audeering / audtorch

Utils and data sets for audio and PyTorch
https://audeering.github.io/audtorch/

Enhance basic data sets #61

Closed hagenw closed 4 years ago

hagenw commented 4 years ago

Summary

Improvement of base data classes.

Proposed Changes

  1. Make the root argument optional, since file lists may legitimately contain absolute paths. This also makes the overall interface slightly easier to implement.
  2. Add duration and offset attributes to AudioDataset, so all data sets derived from it get them as well.
  3. Add column_start and column_end to PandasDataset and CsvDataset; their values are translated into the duration and offset attributes.
  4. Allow column_labels=None, in which case '' is returned as the target.
  5. All basic data sets in base.py now take keyword-only arguments, so that adding or reordering arguments later does not break the API.
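
To make the intent of these changes concrete, here is a minimal sketch of how the five points could fit together. It is not the code in base.py: SketchDataset, its rows argument, and the segment method are hypothetical names used only for illustration, and the mapping offset = start, duration = end - start is an assumption about how column_start and column_end are translated.

```python
import os


class SketchDataset:
    """Hypothetical stand-in for a table-backed audio data set."""

    def __init__(self, *, rows, root=None, column_filename='filename',
                 column_labels=None, column_start=None, column_end=None):
        # The bare * makes every argument keyword-only (change 5).
        self.rows = rows  # list of dicts, one per file (illustration only)
        self.root = root  # optional (change 1)
        self.column_filename = column_filename
        self.column_labels = column_labels
        self.column_start = column_start
        self.column_end = column_end

    def __len__(self):
        return len(self.rows)

    def segment(self, index):
        """Return (path, offset, duration, target) for one entry."""
        row = self.rows[index]
        path = row[self.column_filename]
        if self.root is not None:
            # Only prepend root when one was given, so absolute
            # paths in the file list work without it.
            path = os.path.join(self.root, path)
        # Assumed translation of start/end into offset/duration
        # (changes 2 and 3); values are taken to be in seconds.
        offset = 0.0
        if self.column_start is not None:
            offset = float(row[self.column_start])
        duration = None  # None stands for "read until the end of the file"
        if self.column_end is not None:
            duration = float(row[self.column_end]) - offset
        # Without a label column the target is an empty string (change 4).
        target = '' if self.column_labels is None else row[self.column_labels]
        return path, offset, duration, target
```

With a file list that holds absolute paths, `SketchDataset(rows=rows, column_start='start', column_end='end')` needs no root, whereas a positional call such as `SketchDataset(rows)` raises a TypeError.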

Discussion

What I have not changed, although we discussed it before:

  1. The default value of column_filename is still filename.

ATriantafyllopoulos commented 4 years ago

@hagenw I leave it up to you to merge this when ready.

I think we could still switch the default value of column_filename in PandasDataset to file, since we are probably the main users of the package anyway. On the other hand, I don't want to get into the habit of introducing breaking changes without a strong reason. So it's fine for me either way.