gmantele / vollt

Java libraries implementing the IVOA protocol: ADQL, UWS and TAP
http://cdsportal.u-strasbg.fr/taptuto/
29 stars 28 forks source link

ADQLParser is not thread safe #128

Open vforchi opened 3 years ago

vforchi commented 3 years ago

I noticed that the ADQLParser is not thread safe: I was reusing the same instance and noticed that the generated queries were incorrect. This is not necessarily a bug, but it should be mentioned in the documentation.

gmantele commented 3 years ago

I am actually quite surprised it is not thread safe currently. It should be, or at least I wanted it to be thread safe. I will have to investigate that.

Do you have lead on what was kept from one call to another with the same instance? I could probably find out, but if you already have an idea, it would spare me time :-).

vforchi commented 3 years ago

No, I just noticed that the generated queries were wrong and that I had a shared object.

gmantele commented 3 years ago

It is good to know that now. The parser has to be adapted anyway in order to support both ADQL-2.0 and -2.1. So, I will certainly look into this issue before releasing this new version of vollt/ADQL.

theresadower commented 3 years ago

We have recently received a report at MAST from Brian van Klaveran at SLAC that some sync TAP queries are throwing errors that look like a parser threading issue. Sync queries to MAST's TAP services directly call my fork of the ADQL library with additional MAST-specific geometry support, but the issue is happening with queries that are not using the additional MAST-specific translation.

From the report:

I am able to reproduce this with the following script (doing so from California, in case latency makes any kind of difference):

#!/bin/bash
curl -s -o test1.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test2.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test3.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test4.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
sleep 2
diff -q test1.table test2.table
diff -q test2.table test3.table
diff -q test3.table test4.table

The errors tend to vary, but it sounds like the service sometimes is processing a garbled query - possibly some kind of thread safety issue with the ADQL parser?
gmantele commented 3 years ago

Very interesting...and very surprising (because I never experienced this error). Thank you @theresadower for this small script. I succeeded to reproduce this issue.

theresadower commented 3 years ago

@gmantele you're welcome! There is still the very real possibility our error is in a different place, but it seemed a useful starting point, and may be able to be successfully tested against other services.

gmantele commented 3 years ago

After a quick look into all of this, I already have a bit of an answer:

gmantele commented 3 years ago

@theresadower , you should also have a stack-trace somewhere in your log file related to such error (i.e. prefixed by ADQL Translation Error). If you can find any and share it with me, it would be very helpful to understand the problem.

brianv0 commented 3 years ago

Just a comment, since the code is in Java, you could wrap the parser entry with an ReentrantLock as a quick fix - somewhere around the parseQuery's or reuse ReInit (to do the lock) in ADQLParser.java

gmantele commented 3 years ago

FYI, I just tried this script against 2 other services using VOLLT/TAP-Lib (with Postgres+PgSphere) and no problem at all. So I suspect something going wrong with your translator or the way it is created/provided....but knowing so few for the moment, I may be wrong. I continue my investigations :-)

theresadower commented 3 years ago

Ah, yes, we are cacheing and reusing parser/translator objects at some level, having assumed thread safety. This was done because building the full schema and geometry and UDF awareness is expensive so I wanted to avoid doing so per query. I should be able to move the level at which we're cacheing these data structures to avoid this.

gmantele commented 3 years ago

Just a comment, since the code is in Java, you could wrap the parser entry with an ReentrantLock as a quick fix - somewhere around the parseQuery's or reuse ReInit (to do the lock) in ADQLParser.java

Good idea. However, there are other functions (parsing functions or not) able to change anything in the class that may be used during the parsing process. Then, it would mean putting locks or synchronized at a lot of different places in the ADQLParser, which could lead to undesired lock side effects. I keep the idea though...it might be helpful somewhere else in the parsing or TAP requests processing.

gmantele commented 3 years ago

Ah, yes, we are cacheing and reusing parser/translator objects at some level, having assumed thread safety. This was done because building the full schema and geometry and UDF awareness is expensive so I wanted to avoid doing so per query. I should be able to move the level at which we're cacheing these data structures to avoid this.

Links between the ADQL items such as the column and table references and their corresponding TAP_SCHEMA counterparts is already done by DBChecker. So, at translation time, you should have all the TAP_SCHEMA metadata you need attached to the element to translate.

About UDF, depending on the way you declare and use them, such information may already be available during the translation as well.

_Spoiler alert: in the coming version of TAP, UDF management will be highly easier as there will be the possibility to give an SQL translation pattern (so a simple string like my_udf_in_sql($1, $2)) in tap.properties which would avoid writing a Java class as required now._

And finally, information like allowed geometries and UDFs are already shared to all threads through ServiceConnection (only one instance for all threads).

Generally speaking, having a link with ServiceConnection will most of the time resolve your problem about accessing TAP_SCHEMA metadata and configuration.

Since this issue is now going a bit off topic, @theresadower , don't hesitate to contact me directly by email if you need help for these modifications.