MillionConcepts / pdr

[P]lanetary [D]ata [R]eader - A single function to read all Planetary Data System (PDS) data into Python
Other
60 stars 7 forks source link

pdr.Data missing __contains__ method #57

Closed zackw closed 1 month ago

zackw commented 4 months ago

pdr.Data objects act like dicts to some extent: they have a keys() method [albeit not values() or items()] that returns a list of the data objects they contain, and they support using [] to access any of those data objects by name. Logically, therefore, you ought to be able to use 'name' in data_object as well, but this throws a strange exception:

>>> import pdr
>>> blob = pdr.read('phxao_1001/DATA/PHX_TAU991_090_20090126A.LBL')
>>> sorted(blob.keys())
['HEADER', 'LABEL', 'TABLE']
>>> blob['HEADER'][:64]
'Phoenix opacity measurements for SSI 991-nm solar filter images.'
>>> 'HEADER' in blob
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pandas/core/generic.py", line 1577, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The strange exception is because, when object b does not implement a __contains__ method, a in b falls back to something not entirely unlike

for x in b:
   if x == a:
       return True
return False

and iteration over a pdr.Data object iterates over the values of the contained data objects, not their names, so in the above example it tries to compare the string 'HEADER' to the DataFrame generated by loading the TABLE object.

>>> 'HEADER' == blob['TABLE']
     SSI_PRODUCT_ID  SOLAR_LONGITUDE  ...  ATMOSPHERIC_OPACITY  OPACITY_ERROR
0             False            False  ...                False          False
...
[333 rows x 8 columns]
>>> bool('HEADER' == blob['TABLE'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pandas/core/generic.py", line 1577, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I imagine we are stuck with that for backward compatility's sake, but we can still implement __contains__ to make 'name' in data_object work as expected.

cmillion commented 4 months ago

This seems correct to me. Unless @m-stclair or @Sierra-MC see issues, let's do it.

m-stclair commented 4 months ago

I need to think about it more and discuss. The existence of the keys() and __getitem__() methods are not intended to imply that Data is a Mapping or even a Collection, and it is not intuitive to me that Data.__contains__() should produce this result.

m-stclair commented 4 months ago

I'm even questioning why Data.__iter__() exists, honestly, although it's probably too late to remove it.

m-stclair commented 4 months ago

after discussion we have decided:

  1. Data will be given a __contains__ method aliasing data.keys().__contains__
  2. Data.__iter__() will be deprecated after the introduction of a DeprecationWarning and some months. keeping issue open until these are released.
Sierra-MC commented 1 month ago

Closed with release 1.2.0