NifTK / NiftyNetModelZoo

[unmaintained] This repository hosts NiftyNet networks pre-trained for specific tasks
http://niftynet.io
Apache License 2.0

Check compatibility of license of each entry with the original dataset license #9

Open tvercaut opened 5 years ago

tvercaut commented 5 years ago

As per #1 CC-BY is chosen as the default licence for the model zoo entries. However, this might not be compatible with the licence of the training dataset that was used to compute the weights.

OASIS, for example, has a permissive CC-BY licence (https://www.oasis-brains.org/#access) but has additional citation requirements, which are currently not quite met in https://github.com/NifTK/NiftyNetModelZoo/tree/5-reorganising-with-lfs/OASIS

We need to check each entry individually.

wyli commented 5 years ago

For OASIS there's an additional license file included in the .tar.gz. For BRATS, it's a few volumes extracted from the original set; I have contacted Spyros, and he agreed that we can host these volumes with a citation to the original papers. I'll double-check the other downloadables...

tvercaut commented 5 years ago

Thanks. Note that it's not only about the data but also about the pre-trained weights, as these might be considered derived work. Not 100% sure about it, but it would be worth looking into.

Re OASIS, for clarity, we could copy (or point to) the OASIS licence in a README file (in line with the discussion in #6).

fepegar commented 5 years ago

@tvercaut, do you have any reference that explains what licenses are needed for machine learning models?

tvercaut commented 5 years ago

That is a complex question and in many cases might depend on the licences under which the training data was released. You will need someone with an actual law background to help navigate these questions, I am afraid.

Even when the training data consists of photographs from, say, ImageNet, Flickr, etc., there are copyright questions. Whether pre-trained weights from there fall under "fair use" (not convinced, but see e.g. https://fairuse.stanford.edu/overview/fair-use/what-is-fair-use/) or whether they fall under "databases/fact compilations" (never really looked into these) or whether I am just fantasising (very plausible but I don't think this has been tested in court yet) is a great question. You will find many Reddit and similar discussions on the topic, e.g.:

In short, we won't have a clear-cut answer unless the licence in the original dataset helps us out...

fepegar commented 5 years ago

Thanks, Tom! I'll take a look.