caltechlibrary / irdm_harvester

Automatically harvest publications for an InvenioRDM repository

Improve license mapping #1

Open tmorrell opened 1 year ago

tmorrell commented 1 year ago

CrossRef sometimes includes a text and data mining license, indicated by "tdm" in the content-version field. We don't want this transferred to InvenioRDM.
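A minimal sketch of that filter, assuming the record is the `message` object from a CrossRef works response, where `license` is a list of entries that each carry a `content-version` key. The function name `drop_tdm_licenses` and the example URLs are hypothetical, not existing irdm_harvester code.

```python
def drop_tdm_licenses(crossref_message):
    """Return the license entries whose content-version is not 'tdm'."""
    licenses = crossref_message.get("license", [])
    return [
        entry
        for entry in licenses
        # Entries without a content-version are kept; only explicit "tdm" is dropped.
        if entry.get("content-version", "").lower() != "tdm"
    ]


# Hypothetical example record with one version-of-record license and one tdm license.
example = {
    "license": [
        {"URL": "https://creativecommons.org/licenses/by/4.0/", "content-version": "vor"},
        {"URL": "https://www.example.org/tdm-license", "content-version": "tdm"},
    ]
}
kept = drop_tdm_licenses(example)
```

Here `kept` holds only the "vor" entry, so the tdm license never reaches the InvenioRDM record.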

rsdoiel commented 1 year ago

Should this go in fix_up, or in the Go code?

tmorrell commented 1 year ago

Yeah, we could put this in a Python script since it's more of a Caltech thing. I'll transfer it over to the irdm_harvester repo, since that's where I'm going to put the DOI Python stuff.

tmorrell commented 1 year ago

After some testing, it makes more sense for doi2rdm to map the license type, and then for the cleanup script to decide whether to keep it. https://github.com/caltechlibrary/irdmtools/issues/30 is for the Go code, and this issue is for the Python code.

We also need a more robust license identification system, since publishers send non-standard license URLs (for example, http:// links, or links that are not the legalcode version).
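One possible approach, sketched here for Creative Commons URLs only: ignore the scheme, a leading `www.`, and any trailing `legalcode` or `deed` segment, then build an identifier from the license type and version. The function name `cc_license_id` is hypothetical, and the `cc-by-4.0` style of identifier is an assumption about the target license vocabulary. Other publisher licenses would need their own lookup table.

```python
import re
from urllib.parse import urlsplit

# Path shape: /licenses/<type>/<version>[/<jurisdiction>][/legalcode|/deed][/]
_CC_PATH = re.compile(
    r"^/licenses/([a-z-]+)/(\d\.\d)(?:/[a-z]+)?(?:/(?:legalcode|deed)(?:\.[a-z-]+)?)?/?$"
)


def cc_license_id(url):
    """Map a Creative Commons license URL variant to an id such as 'cc-by-4.0'.

    Returns None when the URL is not a recognizable Creative Commons license URL.
    Any jurisdiction segment (for example /us/) is ignored in this sketch.
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    if host != "creativecommons.org":
        return None
    match = _CC_PATH.match(parts.path)
    if match is None:
        return None
    kind, version = match.groups()
    return f"cc-{kind}-{version}"
```

With this sketch, `http://creativecommons.org/licenses/by/4.0` and `https://creativecommons.org/licenses/by/4.0/legalcode` both map to the same identifier, while an unrecognized publisher URL returns `None` so the cleanup script can decide what to do with it.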

tmorrell commented 1 year ago

I've fixed the vor issue, but I still need to improve the license URL mapping.