astropy / astroquery

Functions and classes to access online data resources. Maintainers: @keflavich and @bsipocz and @ceb8
http://astroquery.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License

ALMA query returning minimally different duplicate entries for programs #2156

Closed privong closed 1 month ago

privong commented 3 years ago

I am not sure if this is a bug or if I am misunderstanding something about what astroquery is retrieving/returning from the ALMA archive results. In essence, for queries done with astroquery.alma (either on a target name or a position) I am receiving 4x as many rows as the ALMA archive web interface returns.

For example, using the web interface for target 'NGC 7552' returns 3 observations (member ous IDs of: uid://A001/X1320/X62, uid://A001/X133d/X18ca, and uid://A001/X133d/X1871).

However, an astroquery search returns more rows (12), and when considering a single observation ID / member ous ID, the only differences between the rows are in these columns: em_min, em_max, em_res_power, and sensitivity_10kms:

In [1]: from astroquery.alma import Alma

In [2]: res = Alma.query_object('NGC 7552')

In [3]: len(res)
Out[3]: 12

In [4]: p1 = 'uid://A001/X133d/X1871'

In [5]: limobs = res[res['obs_id']==p1]

In [6]: for col in limobs.columns:
   ...:     entries = list(set(limobs[col]))
   ...:     if len(entries) > 1:
   ...:         print(col, "has unique entries")
   ...:         print(limobs[col])
   ...:
em_min has unique entries
        em_min
          m
---------------------
0.0006131220609530008
0.0006112569467662004
0.0006265035056818631
0.0006284629723281648
em_max has unique entries
        em_max
          m
---------------------
0.0006156203061267376
0.0006137399849825776
0.0006291122217032601
0.0006310880662145999
em_res_power has unique entries
   em_res_power
------------------
15584.239401244233
15631.984986636786
15313.520425563484
15265.774840170934
sensitivity_10kms has unique entries
sensitivity_10kms
    mJy / beam
------------------
 506.9967995761073
192.70515542195972
 99.49826361572201
 207.9495367122564

The ALMA archive query reports the smallest of the sensitivity_10kms values.

Are these effectively the values for the different spectral windows within the observation? And is astroquery merely passing along what the ALMA archive reports?

keflavich commented 3 years ago

This seems like an ALMA archive question; I don't think there's any reason astroquery would return something different than what's in the archive, but I find it a little concerning, especially given that this seems to be the same observation reported 4 times.

privong commented 3 years ago

I discussed this a bit with @alipnick and he confirmed that the rows correspond to individual spectral windows. This can be seen in the archive result for the member OUS used above. Hovering over the "Frequency Support" result shows the same sensitivity values as I copied above. Or, following directly from the astroquery example I pasted initially:

In [12]: list(set(limobs['frequency_support']))
Out[12]: ['[475.04..477.02GHz,31250.00kHz,207.9mJy/beam@10km/s,18.5mJy/beam@native, XX YY] U 
[476.53..478.52GHz,31250.00kHz,99.5mJy/beam@10km/s,8.9mJy/beam@native, XX YY] U 
[486.98..488.96GHz,31250.00kHz,507mJy/beam@10km/s,45.7mJy/beam@native, XX YY] U 
[488.47..490.45GHz,31250.00kHz,192.7mJy/beam@10km/s,17.4mJy/beam@native, XX YY]']

It seems the similar, but not duplicate, rows are what is actually in the table being accessed by TAP but that the ALMA web interface to the archive is combining these when presenting the results.

Given that the information that varies among the 4 rows can be reconstructed from the frequency_support information already provided, it might be good for astroquery to return only one row (per unique MOUS?). But that would involve processing the query result after it's received, and I understand if that's not desirable. So I guess the current state is that one row is returned per spectral window.
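For anyone who wants one row per observation in the meantime, here is a minimal sketch of the post-processing described above. It is not part of astroquery: the helper name `collapse_per_observation` is hypothetical, and the merge rules (overall em_min/em_max span, smallest sensitivity_10kms, as the web interface appears to report) are assumptions. It runs on plain dicts using the values pasted earlier in this thread; rows of the table returned by `Alma.query_object` should be usable the same way as long as they support `row['column']` access, but that is untested here.

```python
def collapse_per_observation(rows, key="obs_id"):
    """Merge per-spectral-window rows into one summary dict per observation.

    Hypothetical helper, not an astroquery function. `rows` is any iterable
    of mapping-like objects that support row["column"] access.
    """
    # Group the rows by observation identifier.
    groups = {}
    for row in rows:
        groups.setdefault(str(row[key]), []).append(row)

    collapsed = []
    for obs_id, members in groups.items():
        collapsed.append({
            key: obs_id,
            # Number of rows merged, i.e. number of spectral windows.
            "n_spw": len(members),
            # Assumption: report the full wavelength span covered.
            "em_min": min(float(r["em_min"]) for r in members),
            "em_max": max(float(r["em_max"]) for r in members),
            # Assumption: keep the smallest value, as the web interface does.
            "sensitivity_10kms": min(float(r["sensitivity_10kms"]) for r in members),
        })
    return collapsed


# The four rows for uid://A001/X133d/X1871, values copied from the output above.
rows = [
    {"obs_id": "uid://A001/X133d/X1871", "em_min": 0.0006131220609530008,
     "em_max": 0.0006156203061267376, "sensitivity_10kms": 506.9967995761073},
    {"obs_id": "uid://A001/X133d/X1871", "em_min": 0.0006112569467662004,
     "em_max": 0.0006137399849825776, "sensitivity_10kms": 192.70515542195972},
    {"obs_id": "uid://A001/X133d/X1871", "em_min": 0.0006265035056818631,
     "em_max": 0.0006291122217032601, "sensitivity_10kms": 99.49826361572201},
    {"obs_id": "uid://A001/X133d/X1871", "em_min": 0.0006284629723281648,
     "em_max": 0.0006310880662145999, "sensitivity_10kms": 207.9495367122564},
]

collapsed = collapse_per_observation(rows)
print(collapsed)
```

Columns that are identical across the merged rows (everything except the four listed earlier) could simply be copied from the first row of each group.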

As a further check, I've verified that a query for MOUS uid://A001/X121/X308 returns 9 rows, and the Frequency Support indicates 9 spectral windows.
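This kind of check (row count vs. number of spectral windows) can be scripted by parsing the frequency_support string. The sketch below is only a rough illustration: the regular expression is modelled on the single string quoted above, so the assumed layout (GHz range, kHz channel width, two mJy/beam sensitivities, polarisation products, entries joined by " U ") may not hold for every archive entry, and `parse_frequency_support` is a hypothetical helper, not an astroquery function.

```python
import re

NUM = r"\d+(?:\.\d+)?"

# One spectral-window entry, based on the example string in this thread.
SPW_PATTERN = re.compile(
    r"\[(?P<fmin_ghz>" + NUM + r")\.\.(?P<fmax_ghz>" + NUM + r")GHz,"
    r"(?P<chan_khz>" + NUM + r")kHz,"
    r"(?P<sens_10kms>" + NUM + r")mJy/beam@10km/s,"
    r"(?P<sens_native>" + NUM + r")mJy/beam@native,\s*"
    r"(?P<pol>[^\]]*)\]"
)


def parse_frequency_support(text):
    """Return one dict per spectral window found in a frequency_support string."""
    windows = []
    for match in SPW_PATTERN.finditer(text):
        entry = match.groupdict()
        # Convert every numeric field to float; keep polarisation as text.
        for name in ("fmin_ghz", "fmax_ghz", "chan_khz", "sens_10kms", "sens_native"):
            entry[name] = float(entry[name])
        entry["pol"] = entry["pol"].strip()
        windows.append(entry)
    return windows


# The frequency_support value quoted above, joined onto one line.
support = (
    "[475.04..477.02GHz,31250.00kHz,207.9mJy/beam@10km/s,18.5mJy/beam@native, XX YY] U "
    "[476.53..478.52GHz,31250.00kHz,99.5mJy/beam@10km/s,8.9mJy/beam@native, XX YY] U "
    "[486.98..488.96GHz,31250.00kHz,507mJy/beam@10km/s,45.7mJy/beam@native, XX YY] U "
    "[488.47..490.45GHz,31250.00kHz,192.7mJy/beam@10km/s,17.4mJy/beam@native, XX YY]"
)

windows = parse_frequency_support(support)
print(len(windows), "spectral windows")
print("best 10 km/s sensitivity:", min(w["sens_10kms"] for w in windows), "mJy/beam")
```

Comparing `len(windows)` with the number of rows returned for the same MOUS gives the same consistency check as counting by hand.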

privong commented 2 years ago

@keflavich Any thoughts on what to do with this? I'm okay closing it as "not a bug" if y'all want to avoid doing any processing of query results before giving them to the user.

keflavich commented 2 years ago

I don't think this is a bug, but it's a totally reasonable feature request / feature to add. We provide some tools for "post-processing" the archive-returned values for other archives (e.g., splatalogue), so we could add such a tool to the utils, for example. However, I'd say it's not a huge priority - just something that would be cool to have.

This issue serves as a useful warning, though, so hopefully users confused by getting duplicate entries will hit here.

andamian commented 1 month ago

I've double-checked with ALMA and the design choice was intentional. The two interfaces to the archive are designed for different types of users and use cases. The website is accessed by humans, so the results are presented in a concise format. astroquery.alma is used by machines, for which data repetition is not a big issue (a trade-off between data duplication and code complexity).

As such, this is not a candidate for a feature and I am going to close it. Thanks for the feedback @privong and sorry it took so long to reach this conclusion.