gmantele / vollt

Java libraries implementing the IVOA protocol: ADQL, UWS and TAP
http://cdsportal.u-strasbg.fr/taptuto/

ADQL "spell-checker" #104

Closed: gmantele closed this issue 5 years ago

gmantele commented 5 years ago

Add a spell-checker/auto-correction/suggestion function for a whole ADQL query. This function will not be part of the existing parsing function; it would be an additional one. If parsing fails, one could run this "spell-checking" function to fix easy, common mistakes, and then try again to parse and submit the query.

For example, identifiers that are not valid regular identifiers could be detected and automatically double-quoted, especially terms like public and distance.

It would also be possible to suggest corrections for misspelled column and table names.
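The name suggestion mentioned above could, for instance, rely on an edit distance between the misspelled name and the names published by the service (e.g. those listed in TAP_SCHEMA). Below is a minimal sketch under that assumption; the class `NameSuggester`, its methods and the `maxDistance` threshold are hypothetical and not part of ADQL-Lib.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper (not ADQL-Lib API) suggesting the closest known name for a misspelled one. */
final class NameSuggester {

    /** Classic Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the known name closest to the given one (compared case-insensitively),
     * or an empty Optional if no known name is within maxDistance edits.
     */
    static Optional<String> suggest(String name, List<String> knownNames, int maxDistance) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : knownNames) {
            int d = levenshtein(name.toLowerCase(Locale.ROOT), candidate.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }
}
```

For example, `NameSuggester.suggest("raj200", List.of("_raj2000", "_dej2000", "id"), 2)` returns `Optional[_raj2000]`. The threshold keeps the helper from proposing unrelated names when nothing is close.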

gmantele commented 5 years ago

Commit 15cd5944f247b1463bbe6ef71399b937e1e60299 adds a new function to ADQLParser: tryQuickFix(String): String.

It fixes some Unicode confusable characters (typically introduced by copy-pasting from a PDF document) and double-quotes SQL reserved words (e.g. public, date, year, user), ADQL function names used as identifiers (e.g. distance, min, avg, point) and most invalid regular identifiers (e.g. _raj2000, 2mass). The correction of invalid regular identifiers is not perfect (see the commit for more details).
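To illustrate the first part of this fix, here is a minimal sketch of a confusable-character replacement. The class `ConfusableFixSketch` is hypothetical, and its mapping is a small illustrative subset chosen for this example; it is not the table actually used by `tryQuickFix`.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not ADQL-Lib code) of replacing Unicode confusables
 * commonly pasted from PDF documents by their ASCII equivalents.
 */
final class ConfusableFixSketch {

    // Illustrative subset only: confusable character -> ASCII replacement.
    private static final Map<Character, Character> CONFUSABLES = Map.of(
        '\u2212', '-',   // MINUS SIGN
        '\u2013', '-',   // EN DASH
        '\u2018', '\'',  // LEFT SINGLE QUOTATION MARK
        '\u2019', '\'',  // RIGHT SINGLE QUOTATION MARK
        '\u201C', '"',   // LEFT DOUBLE QUOTATION MARK
        '\u201D', '"',   // RIGHT DOUBLE QUOTATION MARK
        '\u00A0', ' ');  // NO-BREAK SPACE

    /** Returns a copy of the query where each known confusable is replaced; other characters are kept. */
    static String replaceConfusables(String query) {
        StringBuilder out = new StringBuilder(query.length());
        for (char c : query.toCharArray()) {
            char fixed = CONFUSABLES.getOrDefault(c, c);
            out.append(fixed);
        }
        return out.toString();
    }
}
```

With this sketch, a constraint pasted as `dec > −5` (with a Unicode minus sign) becomes `dec > -5`, which an ADQL parser can read as a negative number.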

For instance, here is an ADQL query that a user may well want to run, but whose parsing fails immediately because of the leading _ of _raj2000 and _dej2000, the column distance (whose name is reserved for an ADQL function) and public (a reserved SQL word):

SELECT id, _raj2000, _dej2000, distance
FROM public.myTable

Applying the function tryQuickFix(...) will produce the following query:

SELECT id, "_raj2000", "_dej2000", "distance"
FROM "public".myTable

This query should now run, provided the case of the column and schema names is correct. Double-quoted identifiers are case-sensitive in ADQL, and this function does not check the case; users still have to verify it themselves.
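The double-quoting step on the example above can be sketched with a naive word scanner. This is a hypothetical, simplified illustration, not the ADQL-Lib implementation: the keyword and reserved-word lists are tiny illustrative subsets, and the rule "a reserved name immediately followed by `(` is a function call" is an assumption of this sketch. An ADQL regular identifier is taken to be a Latin letter followed by letters, digits or underscores.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified sketch of double-quoting identifiers (not ADQL-Lib code). */
final class QuoteFixSketch {

    // Keywords structuring the query; never quoted (illustrative subset).
    private static final Set<String> KEYWORDS = Set.of("SELECT", "FROM", "WHERE", "AND", "OR",
        "NOT", "AS", "TOP", "ORDER", "BY", "GROUP", "HAVING", "JOIN", "ON", "ASC", "DESC", "DISTINCT");

    // SQL reserved words and ADQL function names often used as names (illustrative subset).
    private static final Set<String> RESERVED = Set.of(
        "PUBLIC", "DATE", "YEAR", "USER", "DISTANCE", "MIN", "MAX", "AVG", "POINT");

    /** ADQL regular identifier: a Latin letter followed by letters, digits or underscores. */
    static boolean isRegularIdentifier(String word) {
        return word.matches("[A-Za-z][A-Za-z0-9_]*");
    }

    static String quoteIdentifiers(String query) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == '\'' || c == '"') {
                // Copy string literals and already delimited identifiers unchanged.
                int end = query.indexOf(c, i + 1);
                end = (end < 0) ? query.length() : end + 1;
                out.append(query, i, end);
                i = end;
            } else if (Character.isLetterOrDigit(c) || c == '_') {
                int start = i;
                while (i < query.length()
                        && (Character.isLetterOrDigit(query.charAt(i)) || query.charAt(i) == '_')) {
                    i++;
                }
                String word = query.substring(start, i);
                out.append(needsQuotes(word, query, i) ? '"' + word + '"' : word);
            } else {
                out.append(c); // punctuation, operators and whitespace are kept as-is
                i++;
            }
        }
        return out.toString();
    }

    private static boolean needsQuotes(String word, String query, int next) {
        String upper = word.toUpperCase(Locale.ROOT);
        if (KEYWORDS.contains(upper) || word.matches("[0-9]+")) {
            return false; // keyword or plain integer literal
        }
        if (RESERVED.contains(upper)) {
            // Leave genuine function calls such as DISTANCE(p1, p2) untouched.
            int j = next;
            while (j < query.length() && Character.isWhitespace(query.charAt(j))) j++;
            return j >= query.length() || query.charAt(j) != '(';
        }
        return !isRegularIdentifier(word);
    }
}
```

Applied to the first query of this comment, `quoteIdentifiers` returns the fixed query shown above. A real implementation needs a proper tokenizer (for numbers such as `1e5`, comments, etc.), which is one reason the correction of invalid identifiers is described as imperfect.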

Next step in TAP-Lib: apply this fix function when parsing a query fails. This will be an optional feature, disabled by default.

gmantele commented 5 years ago

It is now possible to enable the automatic fix of input ADQL queries in TAP-Lib through the configuration file property fix_on_fail. When it is enabled, and only if parsing the input query fails, TAP-Lib tries a quick fix on this query, parses the fixed query and, if that succeeds, runs it.
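The control flow described above can be sketched as follows. This is a hypothetical illustration, not TAP-Lib code: the parser and the fixer are passed in as plain functions (in TAP-Lib the fixer role would presumably be played by `ADQLParser.tryQuickFix`), and parse errors are modelled as unchecked exceptions to keep the sketch self-contained.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the "fix on fail" flow (not TAP-Lib code). */
final class FixOnFailSketch {

    /** What was parsed, which query text will actually be executed, and whether it was fixed. */
    static final class Outcome<T> {
        final T parsed;
        final String executedQuery;
        final boolean fixed;

        Outcome(T parsed, String executedQuery, boolean fixed) {
            this.parsed = parsed;
            this.executedQuery = executedQuery;
            this.fixed = fixed;
        }
    }

    static <T> Outcome<T> parseWithOptionalFix(String query, Function<String, T> parser,
                                               UnaryOperator<String> quickFix, boolean fixOnFail) {
        try {
            // Always try the query exactly as submitted first.
            return new Outcome<>(parser.apply(query), query, false);
        } catch (RuntimeException firstFailure) {
            if (!fixOnFail) {
                throw firstFailure; // feature disabled: report the original error
            }
            String fixedQuery = quickFix.apply(query);
            try {
                // Only a fixed query that parses successfully goes on to execution.
                return new Outcome<>(parser.apply(fixedQuery), fixedQuery, true);
            } catch (RuntimeException secondFailure) {
                // Design choice of this sketch: report the error of the ORIGINAL query.
                firstFailure.addSuppressed(secondFailure);
                throw firstFailure;
            }
        }
    }
}
```

Reporting the original error when the fix does not help seems the least surprising behaviour for users, since the error message then refers to the query they actually wrote; whether TAP-Lib does the same is not stated here.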

When a query is fixed in this way, TAP-Lib logs it and adds an INFO element to the output VOTable. This INFO element, named QUERY_AFTER_AUTO_FIX, is set to the result of the auto-fix, i.e. to the ADQL query actually executed.
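As an illustration of what such an element could look like, here is a minimal sketch that serialises a fixed query as a VOTable INFO element. It assumes the query is stored in the `value` attribute (VOTable INFO elements carry `name` and `value` attributes); TAP-Lib's actual output may differ, and `AutoFixInfoSketch` is a hypothetical name. Since the fixed query contains double quotes, it must be XML-escaped.

```java
/** Hypothetical sketch (not TAP-Lib code) of writing the QUERY_AFTER_AUTO_FIX INFO element. */
final class AutoFixInfoSketch {

    /** Escapes characters that are unsafe inside a double-quoted XML attribute value. */
    static String escapeXmlAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                // Parsers normalise raw line breaks in attributes to spaces; keep them as references.
                case '\n': out.append("&#10;"); break;
                case '\r': out.append("&#13;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    static String queryAfterAutoFixInfo(String fixedQuery) {
        return "<INFO name=\"QUERY_AFTER_AUTO_FIX\" value=\"" + escapeXmlAttribute(fixedQuery) + "\"/>";
    }
}
```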

In any case, this feature is disabled by default: if the property is omitted from the configuration file, no fix is attempted.
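A minimal sketch of this "disabled unless set" behaviour, assuming the configuration file uses Java properties syntax (for instance a line `fix_on_fail = true`) and that the property takes a boolean value. The class and method names are hypothetical, not TAP-Lib's actual configuration code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch (not TAP-Lib code) of reading fix_on_fail with a disabled default. */
final class FixOnFailConfigSketch {

    static boolean isFixOnFailEnabled(Properties config) {
        // A missing property falls back to "false"; Boolean.parseBoolean also treats
        // any value other than "true" (ignoring case) as false.
        return Boolean.parseBoolean(config.getProperty("fix_on_fail", "false").trim());
    }

    /** Parses properties-formatted text, e.g. the content of a configuration file. */
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }
}
```

In this sketch, an empty configuration and a value such as `yes` both leave the feature disabled; only an explicit `true` enables it.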