juice500ml closed this issue 1 month ago.
Hey @juice500ml, thanks a lot for bringing this issue up. My reading of the user agreement is a bit different: "for other purposes" may mean non-research or commercial purposes, which this is not. Besides, the dataset has been available for a while now via Deep Lake and Hugging Face (here), so I'm not sure it's an issue.
Are you by any chance with LDC? We could potentially add a user agreement that users must accept before accessing the dataset, which is common in such cases.
Dear @mikayelh, thanks a lot for the quick response! As I understand it, the following clause is what makes redistributing the data a problem:
Unless explicitly permitted herein, User shall not otherwise publish, retransmit, disclose, display, copy, reproduce or redistribute the LDC Databases to others outside of User’s Research Group.
In this case, I think an everyday user of your project would fall well outside the definition of "User's Research Group", though I might be wrong. Also, I believe the Hugging Face distribution you mentioned is this one, right? https://huggingface.co/datasets/timit_asr In that case, users have to download the data from LDC manually.
My institution (CMU) is part of LDC, but I'm not with LDC myself, so I can't answer those kinds of questions :( Actually, I was also looking for ways to include LDC data in a public project, and that's how I stumbled upon this project.
Got it, @juice500ml! I'll reach out to the contact listed on their website, and we will take down the dataset or add the user agreement if they wish. I'm not sure about this specific dataset, but a large share of our datasets, including this one, were added long ago, and at the time we filtered out the ones with restrictive licenses. So unless this agreement was introduced later, it wouldn't be an issue.
In your specific case, though, it seems you'd be able to use the dataset via Deep Lake without any issue.
Thanks again for letting us know!
I see, hope everything works out! Thanks a lot!
Severity
P1 - Urgent, but non-breaking
Current Behavior
The TIMIT dataset is part of the Linguistic Data Consortium (LDC) catalog, and its license appears to be governed by the LDC Non-Member Agreement, which explicitly states that the User shall have no right to copy, redistribute, or transmit the data. Hosting timit-train and timit-test may therefore breach the license agreement.
Steps to Reproduce
https://datasets.activeloop.ai/docs/ml/datasets/timit-dataset/
Expected/Desired Behavior
Possibly remove TIMIT.
Python Version
No response
OS
No response
IDE
No response
Packages
No response
Additional Context
No response
Possible Solution
No response
Are you willing to submit a PR?