audeering / audinterface

Generic interfaces for signal processing
https://audeering.github.io/audinterface/

Add support for reading text files as media files #173

Open hagenw opened 2 months ago

hagenw commented 2 months ago

In audb 1.7.0 we added support for publishing not only audio and video files, but any file format a user would like to publish. This means we should also adjust process_index(), process_file(), process_files(), and process_folder() to support other file types.

The question is how to best support text files:

And how to return the content of a text file: Should it be a text string? Should it be a JSON string?

/cc @maxschmitt

maxschmitt commented 2 months ago

I would go for this one:

Should we pre-define a list of file extension(s), that are then treated as text files?

It would also be great if we could support structured text, as in a JSON file. This is especially useful for dialog datasets with turn-level metadata.

And how to return the content of a text file: Should it be a text string? Should it be a JSON string?

If it is a .txt file, it should be a text string; if it is a .json file, it should be a JSON string. This would imho be both the simplest and the clearest solution.

hagenw commented 2 months ago

Thanks for the feedback, that indeed sounds like a good solution.
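
For illustration, the approach discussed above (a pre-defined list of extensions, a text string for .txt, a JSON string for .json) could be sketched roughly as follows. This is only a sketch under stated assumptions: the helper read_text_file() and the TEXT_EXTENSIONS set are hypothetical names and not part of audinterface, and returning the raw JSON string (rather than a parsed object) is just one possible reading of "JSON string".

```python
import json
from pathlib import Path

# Hypothetical list of extensions treated as text files.
# This is an assumption for illustration, not taken from audinterface.
TEXT_EXTENSIONS = {".txt", ".json"}


def read_text_file(path):
    """Return the content of a text file as a string.

    Hypothetical helper: ``.txt`` files are returned as plain text,
    ``.json`` files are returned as a JSON string
    after checking once that the content parses as JSON.
    """
    path = Path(path)
    ext = path.suffix.lower()
    if ext not in TEXT_EXTENSIONS:
        raise ValueError(f"Unsupported text file extension: '{ext}'")
    content = path.read_text(encoding="utf-8")
    if ext == ".json":
        # Raises json.JSONDecodeError (a ValueError subclass) on invalid JSON
        json.loads(content)
    return content
```

A caller that needs structured access, for example to turn-level metadata, could then pass the returned string to json.loads() itself.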