AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.
MIT License

License of Bird-SQL data #30

Closed (duyvuleo closed this issue 1 year ago)

duyvuleo commented 1 year ago

Hi authors,

Thanks for the great work.

I am confused about the data license.

In the paper (https://arxiv.org/pdf/2305.03111.pdf), you mentioned: "The databases in this study are open-source with appropriate license and should be distributed under the CC BY-SA 4.0: https://creativecommons.org/licenses/by-sa/4.0/".

But in the repo, from README.md (https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/bird), I can see "License Notation: BIRD-SQL is constructed and distributed for academic use instead of commercial use. For non-academic purposes of this data, please contact corresponding authors.".

I am wondering what the correct license for the Bird-SQL dataset is. Is commercial use of this dataset possible?

Thanks!

accpatrick commented 1 year ago

Hello, @duyvuleo. We appreciate your interest in our work, and sorry for any confusion. The footnote in our paper refers to the databases from the original sources, which were collected under the proper licenses and can be distributed under the CC BY-SA 4.0 license (I will make this clear in subsequent versions of the paper). However, the complete BIRD dataset, including the text, evidence, SQLs, and processed databases that we created, must be distributed under the CC BY-NC 4.0 license. This is to prevent abuse of large volumes of data, which could lead to inappropriate outcomes.

As we've stated in our GitHub repository, if you urgently require the data for commercial use, please feel free to reach out to the authors. When emailing, it's not necessary to go into the details of your project. We simply need to be reassured that your intended commercial use will be healthy for the community and your users. Sorry for any inconvenience, and thanks for your understanding.

accpatrick commented 1 year ago

Thank you for your question, @hexiaoting! To download BIRD, you can visit the BIRD homepage (https://bird-bench.github.io/). There, you'll find options to download the Train Set or the Dev Set. Simply click on the corresponding links to download the dataset quickly and easily.


Or you could download them via wget in your shell: wget https://bird-bench.oss-cn-beijing.aliyuncs.com/train.zip and wget https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip.
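For anyone who would rather script the download than call wget by hand, here is a minimal Python sketch. It is not code from the BIRD authors. It assumes only the two archive URLs given above; the function names and the `bird_data` output directory are hypothetical choices made for illustration.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base URL copied from the wget commands above.
BASE_URL = "https://bird-bench.oss-cn-beijing.aliyuncs.com"
SPLITS = ("train", "dev")


def archive_url(split: str) -> str:
    """Build the archive URL for one split, e.g. .../dev.zip."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}, expected one of {SPLITS}")
    return f"{BASE_URL}/{split}.zip"


def extract_archive(zip_path: Path, target_dir: Path) -> Path:
    """Unpack a downloaded archive into target_dir and return that directory."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir


def download_split(split: str, dest: str = "bird_data") -> Path:
    """Download one split and unpack it under dest/<split> (hypothetical layout)."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    zip_path = dest_dir / f"{split}.zip"
    # This call needs network access and may take a while for the train set.
    urllib.request.urlretrieve(archive_url(split), str(zip_path))
    return extract_archive(zip_path, dest_dir / split)


# Example call (left commented out so nothing is downloaded on import):
# download_split("dev")
```

The layout of files inside each archive is not described in this thread, so the sketch simply extracts whatever the zip contains.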

hexiaoting commented 1 year ago

@accpatrick I tried to run the fine-tuning code, but it failed while I was preparing the environment. Which version of datasets should I use? When I install transformers 4.9.2, this error occurs:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 2.12.0 requires huggingface-hub<1.0.0,>=0.11.0, but you have huggingface-hub 0.0.12 which is incompatible.

Can you help me?
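As a diagnostic aid for the pip message above, here is a rough Python sketch. It is not part of the repository. It encodes the one constraint quoted in the error (datasets 2.12.0 requires huggingface-hub>=0.11.0,<1.0.0) and lists what is installed in the current environment. The helper names are hypothetical, and the version parsing assumes plain dotted numeric versions such as 0.0.12.

```python
from importlib import metadata


def as_tuple(version: str) -> tuple:
    """Turn '0.0.12' into (0, 0, 12); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def satisfies_hub_requirement(hub_version: str) -> bool:
    """Check the range from the pip message: huggingface-hub>=0.11.0,<1.0.0."""
    return (0, 11, 0) <= as_tuple(hub_version) < (1, 0, 0)


def installed_versions(packages=("datasets", "transformers", "huggingface-hub")):
    """Return {package: version or None} for the current environment."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # package is not installed
    return found


# The version reported in the error falls outside the required range.
print(satisfies_hub_requirement("0.0.12"))
print(installed_versions())
```

The check only restates the conflict that pip reported. Which versions of datasets and transformers the fine-tuning code expects is not stated in this thread.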

huybery commented 1 year ago

@hexiaoting Please open a new issue for this question.