astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
77 stars 52 forks source link

BUG: unable to search SIA with no constraint but maxrec=0 #519

Open bsipocz opened 9 months ago

bsipocz commented 9 months ago

While I was trying to resolve https://github.com/astropy/astroquery/issues/2940, I run into this bug

In [60]: from pyvo.dal import SIA2Service

In [61]: sia = SIA2Service(baseurl='https://irsa.ipac.caltech.edu/SIA')

In [62]: sia.search(maxrec=0)
---------------------------------------------------------------------------
DALQueryError                             Traceback (most recent call last)
Cell In[62], line 1
----> 1 sia.search(maxrec=0)

File ~/munka/devel/worktrees/pyvo/testings/pyvo/dal/sia2.py:226, in SIA2Service.search(self, pos, band, time, pol, field_of_view, spatial_resolution, spectral_resolving_power, exptime, timeres, publisher_did, facility, collection, instrument, data_type, calib_level, target_name, res_format, maxrec, **kwargs)
    201 def search(self, pos=None, *, band=None, time=None, pol=None,
    202            field_of_view=None, spatial_resolution=None,
    203            spectral_resolving_power=None, exptime=None,
    204            timeres=None, publisher_did=None, facility=None, collection=None,
    205            instrument=None, data_type=None, calib_level=None,
    206            target_name=None, res_format=None, maxrec=None, **kwargs):
    207     """
    208     Performs a SIA2 search against a SIA2 service
    209 
   (...)
    213 
    214     """
    215     return SIA2Query(self.query_ep, pos=pos, band=band,
    216                      time=time, pol=pol,
    217                      field_of_view=field_of_view,
    218                      spatial_resolution=spatial_resolution,
    219                      spectral_resolving_power=spectral_resolving_power,
    220                      exptime=exptime, timeres=timeres,
    221                      publisher_did=publisher_did,
    222                      facility=facility, collection=collection,
    223                      instrument=instrument, data_type=data_type,
    224                      calib_level=calib_level, target_name=target_name,
    225                      res_format=res_format, maxrec=maxrec,
--> 226                      session=self._session, **kwargs).execute()

File ~/munka/devel/worktrees/pyvo/testings/pyvo/dal/sia2.py:471, in SIA2Query.execute(self)
    457 def execute(self):
    458     """
    459     submit the query and return the results as a SIA2Results instance
    460 
   (...)
    469        for errors parsing the VOTable response
    470     """
--> 471     return SIA2Results(self.execute_votable(), url=self.queryurl, session=self._session)

File ~/munka/devel/worktrees/pyvo/testings/pyvo/dal/adhoc.py:111, in AdhocServiceResultsMixin.__init__(self, votable, url, session)
    110 def __init__(self, votable, *, url=None, session=None):
--> 111     super().__init__(votable, url=url, session=session)
    113     self._adhocservices = list(
    114         resource for resource in votable.resources
    115         if resource.type == "meta" and resource.utype == "adhoc:service"
    116     )

File ~/munka/devel/worktrees/pyvo/testings/pyvo/dal/query.py:336, in DALResults.__init__(self, votable, url, session)
    334 self._status = self._findstatus(votable)
    335 if self._status[0].lower() not in ("ok", "overflow"):
--> 336     raise DALQueryError(self._status[1], self._status[0], url)
    338 if self._status[0].lower() == "overflow":
    339     warn("Partial result set. Potential causes MAXREC, async storage space, etc.",
    340          category=DALOverflowWarning)

DALQueryError: UsageFault: BAD_REQUEST: At least one constraint (e.g. pos or collection) must be specified unless maxrec=0.
bsipocz commented 9 months ago

Never mind, problem is between keyboard and chair, the error is in fact raised by the server, and to work around it I should have searched with responseformat to be votable.

bsipocz commented 9 months ago

I'm in fact reopening this, as the maxrec=0 is not handled properly.

Also, while the suggestion for https://github.com/astropy/astroquery/issues/2940 says to do a query with responseformat=votable, I don't see any indication that the custom argument is in-fact propagated back to the query to the server. Even if it does, we need a better and easier way to expose the query for debugging reasons.

bsipocz commented 9 months ago

The immediate issue of not passing maxrec through has been solved, but we still not parse the returned metadata-only votable properly and this a maxrec=0 query returns an empty table rather than the metadata description.

msdemlei commented 9 months ago

On Thu, Feb 15, 2024 at 11:17:07PM -0800, Brigitta Sipőcz wrote:

The immediate issue of not passing maxrec through has been solved, but we still not parse the returned metadata-only votable properly and this a maxrec=0 query returns an empty table rather than the metadata description.

The question is what we're supposed to do with this information.

It was originally envisaged that clients could in this way inspect the schema of the table returned and, if and when we finally get our STC act together, frames and all that. True: that is in competition to VOSI tables (although it's even less clear how STC metadata would be represented there; but then for SIA2 that's an issue only for custom columns, as all relevant frames for standard columns are fixed by the standard).

The net result is that there are two competing mechanisms for something nobody has asked us for (metadata inspection in SIA2), and hence I'd rather not make a call what kind of API we ought to offer for table metadata discovery outside of TAP in general (where we already have it). The one thing I'm sure of is that we'll not ask our users to pass in MAXREC=0 to a normal search call and then do something with the resulting VOTable themselves.

But what should we do when people pass in MAXREC=0? I mean, this could have been just a mistake when someone, for instance, wrote

.search(..., MAXREC=recs_yet_to_pull)

But really: I think returning a normal, empty result set is about the least spurprising behaviour in that case, and so that is what we should be doing.

bsipocz commented 9 months ago

I think returning a normal, empty result set is about the least spurprising behaviour in that case, and so that is what we should be doing.

Yes, I don't mind the empty table as much as not having access to the description (well, I mind it a bit as the standard says it's a special case yet we handle it the very same as any other maxrec). So at minimum, I would still like access to the raw metadata that goes beyond some column metadata (that we kind of parse into the table columns already), even if it only means of printing out the raw votable without parsing it into a table object.

bsipocz commented 9 months ago

.search(..., MAXREC=recs_yet_to_pull)

What do you mean by this, why would recs_yet_to_pull be a 0 integer?

msdemlei commented 9 months ago

On Fri, Feb 16, 2024 at 08:46:41AM -0800, Brigitta Sipőcz wrote:

.search(..., MAXREC=recs_yet_to_pull)

What do you mean by this, why would recs_yet_to_pull be a 0 integer?

That was an attempt to come up with a case where people would pass in MAXREC=0 -- perhaps someone computes maxrec in some way, and that computation might yield 0 in some corner case. Defensive programming on our size might then suggest that the behaviour is continuous (in some sense adapted to the discrete nature of maxrec) with the maxrec!=0 case.

Or it might not, because arguably that might actually hide bugs in the application code.

I'll not make a call on my preference until I have better idea where that zero would come from when. But I'm very sure that we shouldn't design an API where maxrec=0 is a valid parameter that then takes a code path different from maxrec!=0; it either should raise an error or return an empty result.

bsipocz commented 9 months ago

and that computation might yield 0 in some corner case.

I'm not super concerned for this, as we can always point back to the standard that clearly states that 0 is a special case, and so far we don't have full coverage for problems between keyboard and chair.

andamian commented 9 months ago

If the functionality is similar to VOSITables can we implement the API of VOSITables? This way, we won't have problems with the API if a subsequent version of the spec swings on that direction. Just a thought.