I've already posted this in the main repo, but seeing #26 here makes me think this might be the more appropriate place to request it.
When downloading a dataset, users must download the whole set (or a delta), including all sentences and recordings, whether validated or not, even if they only need the validated data. This consumes a lot of bandwidth, time and disk space, and it is not environmentally friendly either.
Offering the option to download only the validated recordings would save a lot of time and make the data accessible to more people. Being able to download only the TSV files would also be a good addition, but this is already addressed in #26.
I don't know how complex this would be to implement, but I feel it would be a very useful quality-of-life feature, so I hope it is taken into consideration.
Thanks for your work on this amazing project in any case!