astropy / astroquery

Functions and classes to access online data resources. Maintainers: @keflavich and @bsipocz and @ceb8
http://astroquery.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License

RemoteServiceError error using mast.Observations.get_product_list() #2852

Closed aggle closed 10 months ago

aggle commented 11 months ago

I'm currently getting a RemoteServiceError when I try to use mast.Observations.get_product_list(). I'm using astroquery version 0.4.7.dev8893.

The code to reproduce it is:

from astroquery import mast

# Looking for observations of GJ 758
observations = mast.Observations.query_object("GJ 758")

# select 2 obs_ids
obs_ids = observations[observations["project"] == 'JWST']['obs_id'][:2]

# get the associated products from MAST
mast.Observations.get_product_list(obs_ids)

and the traceback is

---------------------------------------------------------------------------
RemoteServiceError                        Traceback (most recent call last)
Cell In[33], line 8
      5 obs_ids = observations[observations["project"] == 'JWST']['obs_id'][:2]
      7 # get the associated products from MAST
----> 8 mast.Observations.get_product_list(obs_ids)

File ~/Projects/miniconda3/envs/mast/lib/python3.11/site-packages/astroquery/utils/class_or_instance.py:25, in class_or_instance.__get__.<locals>.f(*args, **kwds)
     23 def f(*args, **kwds):
     24     if obj is not None:
---> 25         return self.fn(obj, *args, **kwds)
     26     else:
     27         return self.fn(cls, *args, **kwds)

File ~/Projects/miniconda3/envs/mast/lib/python3.11/site-packages/astroquery/utils/process_asyncs.py:29, in async_to_sync.<locals>.create_method.<locals>.newmethod(self, *args, **kwargs)
     27 if kwargs.get('get_query_payload') or kwargs.get('field_help'):
     28     return response
---> 29 result = self._parse_result(response, verbose=verbose)
     30 self.table = result
     31 return result

File ~/Projects/miniconda3/envs/mast/lib/python3.11/site-packages/astroquery/mast/observations.py:68, in ObservationsClass._parse_result(self, responses, verbose)
     50 def _parse_result(self, responses, *, verbose=False):  # Used by the async_to_sync decorator functionality
     51     """
     52     Parse the results of a list of `~requests.Response` objects and returns an `~astropy.table.Table` of results.
     53 
   (...)
     65     response : `~astropy.table.Table`
     66     """
---> 68     return self._portal_api_connection._parse_result(responses, verbose)

File ~/Projects/miniconda3/envs/mast/lib/python3.11/site-packages/astroquery/mast/discovery_portal.py:290, in PortalAPI._parse_result(self, responses, verbose)
    288 # check for error message
    289 if result['status'] == "ERROR":
--> 290     raise RemoteServiceError(result.get('msg', "There was an error with your request."))
    292 result_table = _json_to_table(result, col_config)
    293 result_list.append(result_table)

RemoteServiceError: Error converting data type varchar to bigint.

Originally posted by @aggle in https://github.com/astropy/astroquery/issues/1535#issuecomment-1769057096

aggle commented 11 months ago

Evidently this problem has come up before, e.g. here https://github.com/astropy/astroquery/issues/1535

bsipocz commented 11 months ago

cc @jaymedina

bsipocz commented 11 months ago

Having the same problem popping up again shows that we need much better testing for the mast module.

(Unrelated, but I also see 11 test failures, some of which have been around for a rather long time now (https://github.com/astropy/astroquery/issues/2801); it would be nice to get that number down to zero.)

jaymedina commented 11 months ago

Hi @aggle! The issue here is that get_product_list is not meant to take obs_id values; it expects obsid values, also known as the Product Group ID. See this page for the difference between obs_id and obsid. The gist is that obs_id is mission-specific, while obsid is MAST-specific and is just an arbitrary identifier used in the backend for querying purposes. If you pass a set of obsids instead, you should get this output:

In

from astroquery import mast

# Looking for observations of GJ 758
observations = mast.Observations.query_object("GJ 758")

# select 2 obsid values (the MAST-specific identifier, not the mission obs_id)
obsids = observations[observations["project"] == 'JWST']['obsid'][:2]

# get the associated products from MAST
mast.Observations.get_product_list(obsids)

Out

<Table masked=True length=73>
  obsID   obs_collection dataproduct_type               obs_id               ...   size  parent_obsid dataRights calib_level
   str9        str4            str5                     str34                ...  int64      str9        str6       int64   
--------- -------------- ---------------- ---------------------------------- ... ------- ------------ ---------- -----------
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  406080    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  420480    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  406080    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  420480    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  406080    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  420480    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ... 1284480    152009184     PUBLIC           2
152009184           JWST            image jw01413001001_02101_00001_mirimage ... 1284480    152009184     PUBLIC           2
152009184           JWST            image jw01413001001_02101_00001_mirimage ... 1284480    152009184     PUBLIC           2
152009184           JWST            image jw01413001001_02101_00001_mirimage ...   34560    152009184     PUBLIC           1
      ...            ...              ...                                ... ...     ...          ...        ...         ...

Let me know if this helps and is what you expected!
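One way to spot this mix-up before any request is sent is to check whether the identifiers look like obsid values at all. The following is only a minimal sketch: it assumes that obsid values are always integer-like (which is consistent with the "varchar to bigint" error and with the obsID column above, but is not confirmed anywhere in this thread), and split_obsids is a hypothetical helper, not part of astroquery.

```python
def split_obsids(identifiers):
    """Separate integer-like values (plausible obsid) from everything else.

    Assumption: a MAST obsid is always integer-like, while a mission
    obs_id such as "jw01413001001_02101_00001_mirimage" is not.
    """
    plausible, suspicious = [], []
    for value in identifiers:
        text = str(value).strip()
        # isdigit() is False for empty strings and for mixed text;
        # isascii() excludes non-ASCII digit characters.
        if text.isascii() and text.isdigit():
            plausible.append(text)
        else:
            suspicious.append(text)
    return plausible, suspicious


plausible, suspicious = split_obsids(
    ["152009184", "jw01413001001_02101_00001_mirimage"]
)
print("plausible obsid values:", plausible)
print("probably obs_id values:", suspicious)
```

Anything that ends up in the second list is probably an obs_id and would need to be resolved to an obsid first.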

This would not have been caught in the unit tests because get_product_list isn't supposed to be used with obs_id. It does show, though, that the obsid and obs_id criteria are unnecessarily confusing: either the documentation needs to be updated to emphasize their differences, or we should get rid of obsid as a search criterion altogether. I'll bring this up during our standup tomorrow, as one of our members is heavily involved with the backend database that uses obsid, and we'll work on getting this sorted out @bsipocz. The other broken unit tests are on my radar as well.

Thanks @aggle for bringing this to our attention and for using astroquery.mast!

aggle commented 11 months ago

@jaymedina that answers everything. It's a little confusing, as you say, but I can work with it for now. Thanks very much for your help!

aggle commented 11 months ago

Oops, I pressed "close" by accident :-/

aggle commented 11 months ago

Apologies, my comment was incomplete. This does solve my immediate problem of "how to use get_product_list", but it doesn't help with the larger problem I'm trying to solve, which is: given a set of JWST observations, for which I have the obs_id but not the obsid, how can I use astroquery to find the related products?

In my particular case, I am looking for background observations that are linked to particular science observations. I think that is outside the scope of this ticket, though.

bsipocz commented 11 months ago

If things all work as expected, I would still suggest to document this behaviour with an example in the docs, or at least leave a note about the differences of obs_id and obsid. Maybe even try to catch this particular exception and reraise a more descriptive message in the error?
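As a rough illustration of the "catch and reraise" idea, here is a user-side sketch. It is not how astroquery handles this; call_with_obsid_hint and the hint text are hypothetical, and matching on the "varchar to bigint" substring is an assumption based only on the traceback in this issue.

```python
try:
    from astroquery.exceptions import RemoteServiceError
except ImportError:
    # Stand-in so the sketch still runs without astroquery installed.
    class RemoteServiceError(Exception):
        pass

# Hypothetical hint text, not an astroquery message.
OBSID_HINT = (
    "Hint: get_product_list expects the MAST-specific obsid, "
    "not the mission-specific obs_id."
)


def call_with_obsid_hint(func, *args, **kwargs):
    """Call func and add a hint when the server reports the bigint error."""
    try:
        return func(*args, **kwargs)
    except RemoteServiceError as exc:
        # Assumption: this substring identifies the obs_id/obsid mix-up.
        if "varchar to bigint" in str(exc):
            raise RemoteServiceError(f"{exc} {OBSID_HINT}") from exc
        raise  # unrelated errors pass through unchanged


# Example use (requires network access, so it is left commented out):
# from astroquery import mast
# call_with_obsid_hint(mast.Observations.get_product_list, obs_ids)
```

Chaining with "from exc" keeps the original server message in the traceback, so nothing is hidden from the user.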

jaymedina commented 11 months ago

given a set of JWST observations, for which I have the obs_id but not the obsid, how can I use astroquery to find the related products?

You can use mast.Observations.query_criteria() to search with obs_id. To demonstrate, I'll use one of the obs_ids from your example. Since obs_id is an Observations criterion, you can pass it as a keyword argument in a query_criteria call, like so:

from astroquery import mast

# Given a known `obs_id`, retrieve my observation
obs = mast.Observations.query_criteria(obs_id="jw01413001001_02101_00001_mirimage")

# Get products from observation
prods = mast.Observations.get_product_list(obs)

The output Table assigned to prods will look something like this:

<Table masked=True length=45>
  obsID   obs_collection dataproduct_type               obs_id               ...   size  parent_obsid dataRights calib_level
   str9        str4            str5                     str34                ...  int64      str9        str6       int64   
--------- -------------- ---------------- ---------------------------------- ... ------- ------------ ---------- -----------
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  406080    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  420480    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  406080    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  420480    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  406080    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ...  420480    152009184     PUBLIC           1
152009184           JWST            image jw01413001001_02101_00001_mirimage ... 1284480    152009184     PUBLIC           2
152009184           JWST            image jw01413001001_02101_00001_mirimage ... 1284480    152009184     PUBLIC           2
152009184           JWST            image jw01413001001_02101_00001_mirimage ... 1284480    152009184     PUBLIC           2
152009184           JWST            image jw01413001001_02101_00001_mirimage ...   34560    152009184     PUBLIC           1
      ...            ...              ...                                ... ...     ...          ...        ...         ...
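For several obs_id values at once, the two calls above can be wrapped in a small helper. This is a sketch under assumptions: products_for_obs_ids is a hypothetical name, the service parameter exists only so the function can be exercised without network access, and passing a list as the obs_id criterion relies on query_criteria accepting lists of values.

```python
def products_for_obs_ids(obs_ids, service=None):
    """Resolve mission obs_id values to observations, then list their products.

    service is any object with query_criteria and get_product_list
    methods; it defaults to astroquery.mast.Observations.
    """
    if service is None:
        # Imported lazily so that defining the helper needs no network.
        from astroquery.mast import Observations as service

    # Step 1: look up the observations by their mission-specific obs_id.
    observations = service.query_criteria(obs_id=list(obs_ids))
    if len(observations) == 0:
        raise ValueError(f"No observations found for obs_id values: {list(obs_ids)}")

    # Step 2: pass the observation rows (which carry obsid) on to MAST.
    return service.get_product_list(observations)


# Example use (requires network access, so it is left commented out):
# prods = products_for_obs_ids(["jw01413001001_02101_00001_mirimage"])
```

Passing the observation table straight to get_product_list avoids handling obsid by hand at all.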

query_criteria is very useful and versatile, because it also allows for cone searches. If you'd like to see the list of criteria you can work with, in addition to obs_id, you can run the following command, and it'll give you back a table:

mast.Observations.get_metadata("Observations")

The values under Column Name in that table are the criteria you can work with. For more examples of what astroquery.mast can do, feel free to visit the astroquery section of the mast_notebooks repo: https://github.com/spacetelescope/mast_notebooks/tree/main/notebooks/astroquery

jaymedina commented 11 months ago

If things all work as expected, I would still suggest to document this behaviour with an example in the docs, or at least leave a note about the differences of obs_id and obsid. Maybe even try to catch this particular exception and reraise a more descriptive message in the error?

Sounds good - if it turns out we need to keep the obsid criteria for backend purposes, I'll start a new PR with a few of these changes whipped up.

aggle commented 11 months ago

Thanks everybody for your links and suggestions! This is really helpful.

bsipocz commented 11 months ago

The https://masttest.stsci.edu/api/v0/_c_a_o_mfields.html link is already in the docstring, so adding a really short sentence should be enough, along the lines of e.g. "Please note that obsid is MAST-specific and is not the same as the mission-specific obs_id".