astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Bug: Bad table from IPAC #8

Closed: keflavich closed this issue 11 years ago

keflavich commented 11 years ago
import pyvo as vo
services = vo.regsearch(servicetype="sia", waveband='infrared')
url = services[0].accessurl
print(url)
# http://irsa.ipac.caltech.edu/cgi-bin/Atlas/nph-atlas?mission=EIGA&hdr_location=%5CEIGADataPath%5C&SIAP_ACTIVE=1&collection_desc=The+Extended+IRAS+Galaxy+Atlas+%28EIGA%29&
# search around OMC 1:
vo.imagesearch('http://irsa.ipac.caltech.edu/cgi-bin/Atlas/nph-atlas?mission=EIGA&hdr_location=%5CEIGADataPath%5C&SIAP_ACTIVE=1&collection_desc=The+Extended+IRAS+Galaxy+Atlas+%28EIGA%29&', (83.8083, -5.3733), 0.1)

Returns this error:

ERROR: DALFormatError: <class 'astropy.io.votable.exceptions.E19: dal_query:2:0: E19: File does not appear to be a VOTABLE [pyvo.dal.query]
Traceback (most recent call last):
  File "<ipython-input-46-779c7f0981d0>", line 1, in <module>
    vo.imagesearch('http://irsa.ipac.caltech.edu/cgi-bin/Atlas/nph-atlas?mission=EIGA&hdr_location=%5CEIGADataPath%5C&SIAP_ACTIVE=1&collection_desc=The+Extended+IRAS+Galaxy+Atlas+%28EIGA%29&', (83.8083, -5.3733), 0.1)
  File "/Users/adam/repos/pyvo/pyvo/dal/sia.py", line 75, in search
    return service.search(pos, size, format, intersect, verbosity, **keywords)
  File "/Users/adam/repos/pyvo/pyvo/dal/sia.py", line 138, in search
    return q.execute()
  File "/Users/adam/repos/pyvo/pyvo/dal/sia.py", line 383, in execute
    return SIAResults(self.execute_votable(), self.getqueryurl())
  File "/Users/adam/repos/pyvo/pyvo/dal/query.py", line 276, in execute_votable
    protocol=self.protocol, version=self.version)
DALFormatError: <class 'astropy.io.votable.exceptions.E19: dal_query:2:0: E19: File does not appear to be a VOTABLE

The same query worked fine for, e.g., SkyView.

RayPlante commented 11 years ago

As I hope will become clear from the forthcoming manual, failures such as the one shown above are not due to pyvo but to the service, in this case the one at IPAC. By catching exceptions, you can handle such failures. Here is an example in which we query a number of services discovered from the registry, some of which work fine and others not so well:

# Create a catalog of available x-ray images (saved as a CSV file)

import csv

import pyvo as vo

# find archives with x-ray images
archives = vo.regsearch(servicetype='image', waveband='xray')
# position of my favorite source
pos = vo.object2pos('Cas A')

# find images and list them in a file; csv.writer quotes any field that contains a comma
with open('cas-a.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Archive short name", "Archive title", "Image title", "RA", "Dec", "URL"])
    for arch in archives:
        print("searching %s..." % arch.shortname)
        try:
            matches = arch.search(pos=pos, size=0.25)
        except vo.DALAccessError as ex:
            # any failure with this service: report it and move on to the next one
            print("Trouble accessing %s archive (%s)" % (arch.shortname, ex))
            continue
        print("...found %d images" % matches.nrecs)
        for image in matches:
            writer.writerow([arch.shortname, arch.title, image.title,
                             image.ra, image.dec, image.getdataurl()])

Catching vo.DALAccessError captures failures of any sort while querying a particular service, so one bad archive does not stop the loop. When you run this script you should see something like the following:

searching ROSAT SIA...
WARNING: W22: dal_query:3:2: W22: The DEFINITIONS element is deprecated in VOTable 1.1.  Ignoring [astropy.io.votable.exceptions]
WARNING: W01: dal_query:40:16: W01: Array uses commas rather than whitespace [astropy.io.votable.exceptions]
(9 more warnings)
...found 82 images
searching NED(images)...
Trouble accessing NED(images) archive (<class 'astropy.io.votable.exceptions.E19: dal_query:2:0: E19: File does not appear to be a VOTABLE)
searching SkyView...
...found 215 images
searching HEAVENS @ ISDC...
WARNING: W50: dal_query:41:6: W50: Invalid unit string 'bytes' [astropy.io.votable.exceptions]
...found 4 images
searching TGCat SIA...
WARNING: W50: dal_query:20:0: W50: Invalid unit string 'degree' [astropy.io.votable.exceptions]
(4 more warnings)
...found 9 images
searching RASS.25keV [1]...
...found 0 images
searching Chandra...
...found 400 images
searching RASSBCK [1]...
...found 0 images
searching BATSIG [1]...
...found 0 images
searching GRANAT [1]...
...found 0 images
searching HEAO1A [1]...
...found 0 images
searching HRI [1]...
...found 5 images
searching INTEGRALSPI_gc [1]...
...found 0 images
searching INTGAL [1]...
...found 0 images
searching PSPC1 [1]...
...found 5 images
searching PSPC2 [1]...
...found 5 images
searching PSPC6 [1]...
...found 0 images
searching RASSALL [1]...
...found 15 images
searching RXTE [1]...
...found 0 images

You will notice that some returned a few images, some none, some produced warnings about format errors but still returned usable results, and the NED service failed completely.

If you have suggestions for more convenient ways to deal with such failures, do let us know!

hope this helps, Ray

keflavich commented 11 years ago

This is an issue we need to deal with in astroquery too, so it's something I would like to investigate further. Do you have any idea what is being returned incorrectly or why? What is the most straightforward way to find out?

The exception that is raised is OK, as it can be dealt with, but it is not ideal, in that it doesn't tell you how to fix the problem if, say, the data actually exists but the service isn't returning it (or how to find out if the message actually means "no data found").

Is it safe to assume that some queries to, e.g., the NED images archive in your example, will work, but it just happens that this one failed?

RayPlante commented 11 years ago

We have several exceptions that differentiate between the different types of errors:

DALQueryError usually means that the service could return results with the proper inputs. With DALServiceError, the service may just be down temporarily. The other errors indicate a real problem with the service that doesn't depend on the inputs.
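The distinction above suggests a simple policy: retry a DALServiceError a few times, treat a DALQueryError as a sign that the inputs need changing, and give up on anything else. The sketch below uses hypothetical stand-in exception classes with the same names so that it runs without pyvo or a network. It assumes DALAccessError is the common base class, as the catch-all use of vo.DALAccessError in the Cas A example suggests; the retry count and delay are arbitrary.

```python
import time


# Hypothetical stand-ins mirroring the exception names discussed above;
# they are NOT imported from pyvo.
class DALAccessError(Exception):
    """Assumed base class: any failure while talking to a service."""


class DALServiceError(DALAccessError):
    """Service-side trouble (e.g. an HTTP error); may be temporary."""


class DALQueryError(DALAccessError):
    """The service rejected the query; different inputs may succeed."""


def search_with_policy(search, retries=2, delay=1.0):
    """Call search(); retry on DALServiceError, fail fast otherwise.

    Returns a (result, status) tuple where status is one of
    'ok', 'bad-input', 'unavailable' or 'broken'.
    """
    for attempt in range(retries + 1):
        try:
            return search(), "ok"
        except DALServiceError:
            if attempt < retries:
                time.sleep(delay)  # the service may just be down briefly
                continue
            return None, "unavailable"
        except DALQueryError:
            return None, "bad-input"  # change the inputs rather than retrying
        except DALAccessError:
            return None, "broken"  # e.g. the response is not a VOTable
```

The order of the except clauses matters: the specific subclasses must be caught before the base class, otherwise every failure would be reported as "broken".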

One way to deal with such problems that might be more convenient in an interactive context (as opposed to from within a script) is to have an optional parameter to not throw an exception on error but rather some benign response (None or an empty result set). Have you guys had thoughts on this?
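As a minimal sketch of what such an optional parameter could look like: a wrapper that either re-raises or substitutes a benign value. The names `call_with_fallback` and `on_error` and its values are hypothetical, not part of pyvo; `search` stands for any callable that runs a query, and the "empty result set" is modelled as a plain list.

```python
def call_with_fallback(search, on_error="raise", exc_type=Exception):
    """Run search(); on failure, raise, return None, or return an empty list.

    on_error (hypothetical option):
      "raise" -> propagate the exception (script-friendly default)
      "none"  -> return None
      "empty" -> return an empty result set, modelled here as []
    """
    if on_error not in ("raise", "none", "empty"):
        raise ValueError("unknown on_error value: %r" % (on_error,))
    try:
        return search()
    except exc_type:
        if on_error == "raise":
            raise
        return None if on_error == "none" else []
```

Keeping "raise" as the default means a script never silently treats a broken service as "no data found"; the benign values are opt-in for interactive use.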

keflavich commented 11 years ago

So far our exception cases cover "timeout" and that's about it; other exceptions we have treated as "we didn't parse the return properly". I think the "no matches found" case returns an empty table; it may be better to raise an exception for that case.
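Raising on "no matches" lets callers tell an empty answer apart from a broken service. A minimal sketch, where `NoMatchesError` and `require_matches` are hypothetical names used only for illustration and `rows` is any sized result collection:

```python
class NoMatchesError(LookupError):
    """Hypothetical exception: the service answered, but nothing matched."""


def require_matches(rows, description="query"):
    """Return rows unchanged, or raise NoMatchesError if there are none."""
    if len(rows) == 0:
        raise NoMatchesError("%s returned no matches" % description)
    return rows
```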

For the interactive case, though, instead of raising the "This is not a votable" error, you could have an option "return raw result if it did not parse as a table". In astroquery, we're implementing this by having different routines; at the lowest level, it is possible to directly request the raw response (whatever the website spits out in text form), but the default action for, say, get_images is to return FITS objects. query_region returns tables, like pyvo.
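The "return raw result if it did not parse" idea can be sketched with the standard library alone: check whether the payload is XML with a VOTABLE root element and, if not, hand back the text as-is. This root-tag check is an assumed stand-in for a real VOTable parser such as astropy.io.votable, meant only to show the control flow; `parse_or_raw` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_or_raw(payload):
    """Return ("votable", root_element) or ("raw", text).

    payload is the response body as bytes. Only the root tag is checked;
    a real client would hand the bytes to a full VOTable parser.
    """
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        root = None  # not XML at all, e.g. a plain-text error message
    # Tags may carry a namespace, e.g. "{http://www.ivoa.net/xml/VOTable/v1.2}VOTABLE"
    if root is not None and root.tag.rsplit("}", 1)[-1] == "VOTABLE":
        return "votable", root
    return "raw", payload.decode("utf-8", errors="replace")
```

An HTML error page is well-formed enough to parse here but has an `html` root, so it also falls through to the raw branch.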

RayPlante commented 11 years ago

> it is possible to directly request the raw response (whatever the website spits out in text form)

The simple search functions are built on query objects (e.g. pyvo.sia.SIAQuery) that give a bit more control, which is particularly useful for debugging. For example:

import pyvo as vo
nvss = "http://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=nvss&"
query = vo.sia.SIAQuery(nvss)
query.size = 0.2                 # degrees
query.format = 'image/fits'
query.pos = vo.object2pos('m51')

print(query.getqueryurl())
res = query.execute_stream()

query.getqueryurl() returns the actual URL used to make the query, and query.execute_stream() returns a file object for reading the raw results. It raises a DALServiceError if there is an HTTP error, but otherwise you get whatever comes back. query.execute_votable() returns just the astropy VOTable instance (not wrapped in our result class). This is not quite what you suggested, but it is good for debugging.
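To see what a misbehaving service (such as the IPAC endpoint in the original report) actually sent, it is often enough to look at the first few lines of the raw stream. A minimal sketch, assuming only that execute_stream() returns a binary file-like object; `preview_stream` is a hypothetical helper, demonstrated on an in-memory io.BytesIO with invented contents instead of a live query.

```python
import io


def preview_stream(stream, max_lines=5):
    """Return up to max_lines decoded lines from a binary file-like object."""
    lines = []
    for raw in stream:
        # decode leniently: error pages are not always valid UTF-8
        lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
        if len(lines) >= max_lines:
            break
    return lines


# Stand-in for query.execute_stream(): an HTML page instead of a VOTable
# (contents invented for illustration).
fake = io.BytesIO(b"<html>\n<body>\nService error\n</body>\n</html>\n")
for line in preview_stream(fake, max_lines=3):
    print(line)
```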