Open vforchi opened 3 years ago
I am actually quite surprised it is not thread safe currently. It should be, or at least I wanted it to be thread safe. I will have to investigate that.
Do you have lead on what was kept from one call to another with the same instance? I could probably find out, but if you already have an idea, it would spare me time :-).
No, I just noticed that the generated queries were wrong and that I had a shared object.
It is good to know that now. The parser has to be adapted anyway in order to support both ADQL-2.0 and -2.1. So, I will certainly look into this issue before releasing this new version of vollt/ADQL.
We have recently received a report at MAST from Brian van Klaveran at SLAC that some sync TAP queries are throwing errors that look like a parser threading issue. Sync queries to MAST's TAP services directly call my fork of the ADQL library with additional MAST-specific geometry support, but the issue is happening with queries that are not using the additional MAST-specific translation.
From the report:
I am able to reproduce this with the following script (doing so from California, in case latency makes any kind of difference):
#!/bin/bash
curl -s -o test1.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test2.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test3.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test4.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
sleep 2
diff -q test1.table test2.table
diff -q test2.table test3.table
diff -q test3.table test4.table
The errors tend to vary, but it sounds like the service sometimes is processing a garbled query - possibly some kind of thread safety issue with the ADQL parser?
Very interesting...and very surprising (because I never experienced this error). Thank you @theresadower for this small script. I succeeded to reproduce this issue.
@gmantele you're welcome! There is still the very real possibility our error is in a different place, but it seemed a useful starting point, and may be able to be successfully tested against other services.
After a quick look into all of this, I already have a bit of an answer:
Indeed, ADQLParser
is not thread-safe, and on the contrary to what I said before, it can not be otherwise. The parsing needs several variables (managed by JavaCC) which depends on the current parsing state. So that will never be different and I was aware of that while using it for running queries in TAP. @vforchi , as you suggested, I have to write somewhere in the Javadoc + Documentation that ADQLParser
is not thread-safe.
@theresadower , the reported issue is, as I said, very surprising. For each query (synchronous or not), a new query execution context (meaning a new ADQLParser
with a new JDBCConnection
and so ADQLTranslator
) is created and attached to a single Thread. When the query is over, this context is discarded. For the moment, as far as I can imagine, the only way there is a thread concurrency would be if the same translator/connection or parser was provided by the TAP/ADQLFactory. And even though, the ADQLTranslator
should not keep context related data ; instead of ADQLParser
it is supposed to be thread safe. How do you specify the translator to use in your TAP implementation? Do you use a tap.properties
file?
@theresadower , you should also have a stack-trace somewhere in your log file related to such error (i.e. prefixed by ADQL Translation Error
). If you can find any and share it with me, it would be very helpful to understand the problem.
Just a comment, since the code is in Java, you could wrap the parser entry with an ReentrantLock as a quick fix - somewhere around the parseQuery
's or reuse ReInit
(to do the lock) in ADQLParser.java
FYI, I just tried this script against 2 other services using VOLLT/TAP-Lib (with Postgres+PgSphere) and no problem at all. So I suspect something going wrong with your translator or the way it is created/provided....but knowing so few for the moment, I may be wrong. I continue my investigations :-)
Ah, yes, we are cacheing and reusing parser/translator objects at some level, having assumed thread safety. This was done because building the full schema and geometry and UDF awareness is expensive so I wanted to avoid doing so per query. I should be able to move the level at which we're cacheing these data structures to avoid this.
Just a comment, since the code is in Java, you could wrap the parser entry with an ReentrantLock as a quick fix - somewhere around the
parseQuery
's or reuseReInit
(to do the lock) in ADQLParser.java
Good idea. However, there are other functions (parsing functions or not) able to change anything in the class that may be used during the parsing process. Then, it would mean putting locks or synchronized
at a lot of different places in the ADQLParser
, which could lead to undesired lock side effects. I keep the idea though...it might be helpful somewhere else in the parsing or TAP requests processing.
Ah, yes, we are cacheing and reusing parser/translator objects at some level, having assumed thread safety. This was done because building the full schema and geometry and UDF awareness is expensive so I wanted to avoid doing so per query. I should be able to move the level at which we're cacheing these data structures to avoid this.
Links between the ADQL items such as the column and table references and their corresponding TAP_SCHEMA counterparts is already done by DBChecker
. So, at translation time, you should have all the TAP_SCHEMA metadata you need attached to the element to translate.
About UDF, depending on the way you declare and use them, such information may already be available during the translation as well.
_Spoiler alert: in the coming version of TAP, UDF management will be highly easier as there will be the possibility to give an SQL translation pattern (so a simple string like my_udf_in_sql($1, $2)
) in tap.properties
which would avoid writing a Java class as required now._
And finally, information like allowed geometries and UDFs are already shared to all threads through ServiceConnection
(only one instance for all threads).
Generally speaking, having a link with ServiceConnection
will most of the time resolve your problem about accessing TAP_SCHEMA metadata and configuration.
Since this issue is now going a bit off topic, @theresadower , don't hesitate to contact me directly by email if you need help for these modifications.
I noticed that the
ADQLParser
is not thread safe: I was reusing the same instance and noticed that the generated queries were incorrect. This is not necessarily a bug, but it should be mentioned in the documentation.