audeering / audformat

Format to store media files and annotations
https://audeering.github.io/audformat/

audformat.Database.get() is not always returning the requested scheme #426

Closed: hagenw closed this issue 2 months ago

hagenw commented 2 months ago

For the following example, we need the crema-d dataset:

import audb

db = audb.load("crema-d", version="1.3.0", only_metadata=True, full_path=False)

The dataset has a "sex" scheme that we can request as an additional scheme:

>>> db.get("emotion", additional_schemes=["speaker", "sex"], tables="emotion.categories.test.gold_standard")
                          emotion speaker     sex
file                                             
1048/1048_IEO_NEU_XX.wav  neutral    1048    male
1048/1048_IEO_HAP_LO.wav  neutral    1048    male
1048/1048_IEO_HAP_MD.wav  neutral    1048    male
1048/1048_IEO_HAP_HI.wav  neutral    1048    male
1048/1048_IEO_SAD_LO.wav  neutral    1048    male
...                           ...     ...     ...
1091/1091_WSI_HAP_XX.wav  neutral    1091  female
1091/1091_WSI_SAD_XX.wav  neutral    1091  female
1091/1091_WSI_ANG_XX.wav    anger    1091  female
1091/1091_WSI_FEA_XX.wav  sadness    1091  female
1091/1091_WSI_DIS_XX.wav  neutral    1091  female

[1392 rows x 3 columns]

But when we request the same schemes with a non-existing scheme ("bla") added in between, the "sex" column comes back empty (all NaN):

>>> db.get("emotion", additional_schemes=["speaker", "bla", "sex"], tables="emotion.categories.test.gold_standard")
                          emotion speaker  bla  sex
file                                               
1048/1048_IEO_NEU_XX.wav  neutral    1048  NaN  NaN
1048/1048_IEO_HAP_LO.wav  neutral    1048  NaN  NaN
1048/1048_IEO_HAP_MD.wav  neutral    1048  NaN  NaN
1048/1048_IEO_HAP_HI.wav  neutral    1048  NaN  NaN
1048/1048_IEO_SAD_LO.wav  neutral    1048  NaN  NaN
...                           ...     ...  ...  ...
1091/1091_WSI_HAP_XX.wav  neutral    1091  NaN  NaN
1091/1091_WSI_SAD_XX.wav  neutral    1091  NaN  NaN
1091/1091_WSI_ANG_XX.wav    anger    1091  NaN  NaN
1091/1091_WSI_FEA_XX.wav  sadness    1091  NaN  NaN
1091/1091_WSI_DIS_XX.wav  neutral    1091  NaN  NaN

[1392 rows x 4 columns]
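Until this is fixed, one possible workaround is to drop unknown scheme IDs before calling get(). This is a minimal sketch. It assumes every wanted additional scheme is a key of db.schemes, as "speaker" and "sex" are assumed to be here. get_known_schemes is a hypothetical helper and is not part of audformat:

```python
def get_known_schemes(db, scheme, additional_schemes, **kwargs):
    """Hypothetical helper: call db.get() with only the schemes the database defines.

    Assumes every wanted additional scheme is a key of db.schemes;
    unknown names are dropped instead of being passed on to db.get().
    """
    known = [s for s in additional_schemes if s in db.schemes]
    return db.get(scheme, additional_schemes=known, **kwargs)


# Usage with the crema-d database loaded above:
# df = get_known_schemes(
#     db,
#     "emotion",
#     ["speaker", "bla", "sex"],
#     tables="emotion.categories.test.gold_standard",
# )
```

Note that this silently drops the unknown names. The "bla" column is therefore missing from the result instead of being filled with NaN.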
hagenw commented 2 months ago

If we request the existing schemes first, it works as expected:

>>> db.get("emotion", additional_schemes=["speaker", "sex", "bla"], tables="emotion.categories.test.gold_standard")
                          emotion speaker     sex  bla
file                                                  
1048/1048_IEO_NEU_XX.wav  neutral    1048    male  NaN
1048/1048_IEO_HAP_LO.wav  neutral    1048    male  NaN
1048/1048_IEO_HAP_MD.wav  neutral    1048    male  NaN
1048/1048_IEO_HAP_HI.wav  neutral    1048    male  NaN
1048/1048_IEO_SAD_LO.wav  neutral    1048    male  NaN
...                           ...     ...     ...  ...
1091/1091_WSI_HAP_XX.wav  neutral    1091  female  NaN
1091/1091_WSI_SAD_XX.wav  neutral    1091  female  NaN
1091/1091_WSI_ANG_XX.wav    anger    1091  female  NaN
1091/1091_WSI_FEA_XX.wav  sadness    1091  female  NaN
1091/1091_WSI_DIS_XX.wav  neutral    1091  female  NaN

[1392 rows x 4 columns]

But as soon as a non-existing scheme is requested, every scheme that follows it is set to NaN as well:

>>> db.get("emotion", additional_schemes=["bla", "speaker", "sex"], tables="emotion.categories.test.gold_standard")
                          emotion  bla speaker  sex
file                                               
1048/1048_IEO_NEU_XX.wav  neutral  NaN     NaN  NaN
1048/1048_IEO_HAP_LO.wav  neutral  NaN     NaN  NaN
1048/1048_IEO_HAP_MD.wav  neutral  NaN     NaN  NaN
1048/1048_IEO_HAP_HI.wav  neutral  NaN     NaN  NaN
1048/1048_IEO_SAD_LO.wav  neutral  NaN     NaN  NaN
...                           ...  ...     ...  ...
1091/1091_WSI_HAP_XX.wav  neutral  NaN     NaN  NaN
1091/1091_WSI_SAD_XX.wav  neutral  NaN     NaN  NaN
1091/1091_WSI_ANG_XX.wav    anger  NaN     NaN  NaN
1091/1091_WSI_FEA_XX.wav  sadness  NaN     NaN  NaN
1091/1091_WSI_DIS_XX.wav  neutral  NaN     NaN  NaN

[1392 rows x 4 columns]
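The three calls suggest that once get() reaches a scheme it cannot find, it stops filling the remaining columns. The expected behavior is that each additional scheme is resolved independently, so that the order of additional_schemes does not matter. This is a minimal pure-Python sketch of that expected behavior, not audformat's actual implementation. fill_additional_schemes and the dict-based lookups, which stand in for the database tables, are hypothetical:

```python
import math


def fill_additional_schemes(files, lookups, additional_schemes):
    """Hypothetical sketch: resolve each additional scheme independently.

    files: list of file names (the index of the returned table)
    lookups: dict mapping scheme name -> {file: label}; a scheme that
        does not exist in the database is simply absent from this dict
    """
    columns = {}
    for name in additional_schemes:
        labels = lookups.get(name)
        if labels is None:
            # Unknown scheme: only this column becomes NaN,
            # and the loop continues with the next scheme
            columns[name] = [math.nan] * len(files)
            continue
        columns[name] = [labels.get(f, math.nan) for f in files]
    return columns


files = ["1048/1048_IEO_NEU_XX.wav", "1091/1091_WSI_DIS_XX.wav"]
lookups = {
    "speaker": {files[0]: "1048", files[1]: "1091"},
    "sex": {files[0]: "male", files[1]: "female"},
}
a = fill_additional_schemes(files, lookups, ["speaker", "bla", "sex"])
b = fill_additional_schemes(files, lookups, ["bla", "speaker", "sex"])
print(a["sex"])  # ['male', 'female']
```

With this approach, the "speaker" and "sex" columns are identical in both calls, and only the "bla" column is NaN.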