audeering / audbcards

Data cards for audio datasets
https://audeering.github.io/audbcards/

Support | in dataset description #75

Closed hagenw closed 6 months ago

hagenw commented 6 months ago

Closes #72

This ensures that "|" contained in audbcards.Dataset.description is converted to "\|" when written to a datacard, so that it is not interpreted as an inline substitution. The replacement happens inside audbcards/core/templates/datacard_description.j2, so the output of audbcards.Dataset.description stays unchanged.
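
To illustrate the escaping step outside the template, here is a minimal sketch in plain Python. The helper name `escape_rst_pipes` and the sample description are made up for this example and are not part of audbcards; the actual replacement in this PR is done inside the Jinja2 template.

```python
def escape_rst_pipes(text: str) -> str:
    """Escape every "|" so RST does not treat it as substitution markup.

    Hypothetical helper for illustration only; not part of audbcards.
    """
    return text.replace("|", "\\|")


# Made-up description containing pipe characters
description = "Speech recorded in |quiet| rooms."
print(escape_rst_pipes(description))
# prints: Speech recorded in \|quiet\| rooms.
```

Note that this simple version is not idempotent: applying it to an already escaped string would add a second backslash. That is one reason to escape only at render time and leave the stored description untouched.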

It also fixes the bare_db(), minimal_db(), and medium_db() fixtures so that they no longer load the dataset into the default audb cache, but use the audb_cache fixture we created for that purpose. Previously I thought it would be used automatically in the fixtures as well, since I set autouse=True, but this was not the case. This was done in #76.

hagenw commented 6 months ago

So audb_cache sets the CONFIG vars accordingly, but they must be used in the load as well. Good that this was detected.

Good point, I was wrong here. To fix this issue independently of the actual problem here, I first created: https://github.com/audeering/audbcards/pull/76
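
A rough sketch of the pattern discussed here, assuming pytest: the dataset fixture requests the cache fixture by name and passes the cache folder to the load call, instead of relying on autouse alone. The fixture bodies and the `load_dataset` helper are assumptions made for this illustration; they are not the code from #76, and the real audb_cache fixture also sets the audb config values.

```python
import os

import pytest


def load_dataset(name, cache_root):
    """Hypothetical stand-in for the real load call; returns the target folder."""
    return os.path.join(cache_root, name)


@pytest.fixture(scope="session")
def audb_cache(tmp_path_factory):
    """Provide a temporary cache folder (body assumed for this sketch)."""
    return str(tmp_path_factory.mktemp("cache"))


@pytest.fixture
def bare_db(audb_cache):
    """Request the cache fixture explicitly and hand it to the load call."""
    return load_dataset("bare_db", cache_root=audb_cache)
```

Listing audb_cache as an argument makes the dependency explicit, so the dataset ends up in the temporary folder regardless of how the autouse setup is ordered.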

hagenw commented 6 months ago

I wonder whether we are confident that this will be general enough? Or, put differently, might other special characters (e.g. .) show up in other locations than the description?

I think there are not many characters that have special meaning in RST. I have tested the current code on all our internal datasets and couldn't find any other problems so far. If you would like to add further tests, please open an issue that proposes to create a test dataset that contains lots of strange characters in the description. And also propose to extend the tests beyond the dataset description.

hagenw commented 6 months ago

I have now incorporated #76, and this is again ready for review.

ChristianGeng commented 6 months ago

I wonder whether we are confident that this will be general enough? Or, put differently, might other special characters (e.g. .) show up in other locations than the description?

I think there are not many characters that have special meaning in RST. I have tested the current code on all our internal datasets and couldn't find any other problems so far. If you would like to add further tests, please open an issue that proposes to create a test dataset that contains lots of strange characters in the description. And also propose to extend the tests beyond the dataset description.

Good. Then I would say this is good to go.