astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Improved error handling #125

Open keflavich opened 5 years ago

keflavich commented 5 years ago

Some error messages are very opaque. It's not clear whether this is a server-side or client-side issue, but if it's the latter, we should fix them. For example, this error:

DALQueryError: Cannot parse query 'SELECT * FROM user_jsalgado.m45' for job '1558013349747O': 1 unresolved identifiers: m45 [l.1 c.15 - l.1 c.32] !

is actually an authentication error: the requested table was not accessible because it's private and no proper token was provided.
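For illustration, here is a minimal sketch of how such an error surfaces through pyvo; the TAP endpoint is a placeholder rather than the actual service this came from, and authentication is deliberately left out, so the exact wording will vary from service to service.

```python
# Minimal sketch: send the query and see what the pass-through error says.
# The service URL is a placeholder, not the actual service from this report.
import pyvo
from pyvo.dal.exceptions import DALQueryError

service = pyvo.dal.TAPService("https://example.org/tap")
try:
    service.search("SELECT * FROM user_jsalgado.m45")
except DALQueryError as exc:
    # Without a valid token the private table is invisible to the server,
    # so it reports an "unresolved identifier" rather than an
    # authorisation failure.
    print(exc)
```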

It would be helpful to add examples to this issue to illustrate where better error handling is needed.

msdemlei commented 5 years ago

On Thu, May 16, 2019 at 08:03:07AM -0700, Adam Ginsburg wrote:

Some error messages are very opaque. It's not clear whether this is a server-side or client-side issue, but if it's the latter, we should fix them. For example, this error:


DALQueryError: Cannot parse query 'SELECT * FROM user_jsalgado.m45' for job '1558013349747O': 1 unresolved identifiers: m45 [l.1 c.15 - l.1 c.32] !
is actually an authentication error: the requested table was not
accessible because it's private and no proper token was provided.

This is just passing through an error from the service itself. Clearly, this could be more helpful, but there's little pyVO can do about this.

In this particular case, the service operator could be advised to try and catch that this actually is an authorisation problem -- but then again, it might be argued that even letting unauthorised users explore namespaces of authorised ones is an information leak you may want to avoid. Not that that's my position, but, say, jsalgado might object to letting people snoop out whether he's working on m45 or not.

It would be helpful to add examples to this issue to illustrate where better error handling is needed.

Well, I'd say the server could be more helpful by stating "could not locate table 'user_jsalgado.m45'". That's also relatively easy to do for the ADQL -> SQL translator (DaCHS does this; try "select * from foo.bar" on http://dc.g-vo.org/tap). But that should probably be invariant against whether the table is inaccessible because of lacking authorisation or because the table really doesn't exist.

Be that as it may, pyVO can't do better than pass through the messages from the server (though perhaps it might better mark them as server-generated). It's hard enough for the servers to provide reasonable error messages, and they at least have some remote idea of what they have and don't have.
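As a purely hypothetical illustration of what "marking them as server-generated" could look like (the helper name and wording below do not exist in pyvo):

```python
# Hypothetical helper; pyvo has nothing like this today.  It only prefixes
# the service-supplied text so users can tell where the message came from.
from pyvo.dal.exceptions import DALQueryError

def raise_server_reported_error(message, service_url):
    raise DALQueryError(
        f"The service at {service_url} reported: {message!r} "
        "(this text was generated by the server, not by pyVO)")
```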

     -- Markus
keflavich commented 5 years ago

Agreed on all counts: pyVO should report that this is a server side error, and the server should provide a more informative error message.

There was a subtext to this message that I didn't spell out: someone else today (I don't know their GitHub username) mentioned that there were some inadequate error messages in pyvo, but I don't know which ones they meant. I was hoping this issue would prompt some other users to post their concerns.

bsipocz commented 9 months ago

Unfortunately we still see some cryptic, unhelpful error messages; e.g. this one was due to a missing upload file, yet the message complains about the query:

https://github.com/NASA-NAVO/navo-workshop/issues/146

msdemlei commented 9 months ago

On Fri, Dec 15, 2023 at 04:04:09PM -0800, Brigitta Sipőcz wrote:

Unfortunately we still see some cryptic, unhelpful error messages; e.g. this one was due to a missing upload file, yet the message complains about the query:

https://github.com/NASA-NAVO/navo-workshop/issues/146

I'm not sure about this one, but the #146 message is hard to fix unless we take a lot more baggage into pyVO (and then there are other downsides).

The trouble is that the error message comes from the remote service, rather than from us. Passing through the error messages from the remote service certainly is something that can be improved, but if "Error[s] detected in the query preprocessing" is all the service returns, there is little we can do.

To diagnose the error locally, before sending off the query, we would have to do the following:

(1) parse the ADQL,
(2) find out that there are uploaded table(s) referenced,
(3) find out which columns from these are referenced in the query,
(4) make sure these columns actually are in the tables uploaded.

Most of this isn't really hard (in particular, I'd donate my ADQL parser), but it's tricky in so many details; let me just mention that we may need to parse the uploads if people pass in files, and that again may be trouble, at the latest once services support non-VOTable uploads.

Also, there would definitely need to be a way to switch off that apparatus; we don't want to reject queries just because they use some local extension on the server. Me, I've very often been glad that TOPCAT lets you send off queries even if it thinks they're wrong.

Given all that, I think what we ought to do is make sure that all diagnostics coming from the server are easily accessible to our users -- and then perhaps tell them that if they don't know what to do with a message they should contact the service's contact address. Based on the access URL, we should be able to figure that out for registered services (I'm happy to contribute code if you agree).
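A rough sketch of that lookup, assuming a RegTAP-style registry query keyed on the access URL; the registry endpoint and the query are illustrative, not existing pyvo functionality:

```python
# Sketch: find the contact address for a registered service from its access
# URL by querying a RegTAP registry (endpoint chosen for illustration).
import pyvo

regtap = pyvo.dal.TAPService("http://reg.g-vo.org/tap")
access_url = "http://dc.g-vo.org/tap"  # the service the user was talking to

result = regtap.search("""
    SELECT role_name, email
    FROM rr.res_role
    JOIN rr.interface USING (ivoid)
    WHERE base_role = 'contact'
      AND access_url = '{}'""".format(access_url))

for row in result:
    print(row["role_name"], row["email"])
```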

bsipocz commented 9 months ago

(2) find out that there are uploaded table(s) referenced (3) find out which columns from these are referenced in the query (4) make sure these columns actually are in the tables uploaded.

I would only go as far as checking whether the uploaded table exists at all; parsing or interpreting it and the ADQL would be beyond scope, I believe. We should leave that to the service and hope it returns a reasonable error message. I'm surprised that no one actually complained about that; I should have done a deep dive into the traceback.

msdemlei commented 9 months ago

On Mon, Dec 18, 2023 at 10:23:27AM -0800, Brigitta Sipőcz wrote:

I would only go as far as checking whether the uploaded table exists at all, parsing or interpreting it and the ADQL would be

Even that is not altogether trivial. As a 95% solution, we could do re.findall(r"(?i)tap_upload\.([a-z_0-9]+)", query), but that can have false positives (tap_upload in string literals, say). Thanks to DALI 1.2, 3.4.5 ("Resource names must be simple strings made up of alphabetic, numeric, and the underscore characters only and must start with an alphabetic character"), there at least wouldn't be false negatives.
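To make that concrete, a rough sketch of the 95% check, assuming uploads are referenced through the TAP_UPLOAD schema; the function name and error wording are made up for illustration, and something like this would of course need an opt-out so that valid but unusual queries aren't rejected:

```python
# Rough sketch of the regex-based check: collect table names referenced
# under TAP_UPLOAD and compare them with the uploads the user actually
# passed in (a mapping of name -> table or file, as for run_async).
import re

def check_upload_references(query, uploads):
    referenced = set(re.findall(r"(?i)tap_upload\.([a-z_0-9]+)", query))
    provided = {name.lower() for name in uploads}
    missing = sorted(name for name in referenced
                     if name.lower() not in provided)
    if missing:
        raise ValueError(
            "Query references uploaded table(s) that were not supplied: "
            + ", ".join(missing))

# This would raise, because no upload named 'mysources' was given:
try:
    check_upload_references("SELECT TOP 5 * FROM TAP_UPLOAD.mysources",
                            uploads={})
except ValueError as exc:
    print(exc)
```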

So... sounds like a reasonable plan. Volunteers?