MassBank / MassBank-data

Official repository of open data MassBank records

MassBank Accession / InChIKey CC0 dump for Wikidata #57

Closed: schymane closed this issue 5 years ago

schymane commented 5 years ago

@meier-rene can we get a CC0 dump of MassBank Accession IDs with InChIKey mappings for @egonw to add to Wikidata? He's registering this property now ;-)

egonw commented 5 years ago

Property has been proposed: https://www.wikidata.org/wiki/Wikidata:Property_proposal/MassBank_Accession_ID

meier-rene commented 5 years ago

Accession_to_InChi-Key.txt

This file maps every accession that has a Creative Commons license to its InChIKey. It was generated with the following bash script:

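#!/bin/bash
# Print "<accession> <InChIKey>" for every record in each top-level directory.
for x in *; do
        # Skip the "figure" directory.
        if [ "$x" = "figure" ]; then
                continue
        fi
        if [ -d "$x" ]; then
                # Quote the name and skip the directory if cd fails,
                # so that "cd .." never runs from the wrong place.
                cd "$x" || continue
                        # grep prints "<file>:CH$LINK: INCHIKEY <key>". Keep the file name
                        # and the key, then strip the literal ".txt" extension.
                        grep -R INCHIKEY * | awk -F: '{print $1 $3}' | awk '{print $1" "$3}' | sed 's|\.txt||'
                cd ..
        fi
done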

It will most likely contain more accessions than are currently online, because I accepted some new records this morning.
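
A possible variant of the extraction above, sketched under a few assumptions: it reads the accession from the record's `ACCESSION:` field rather than from the file name, and it keeps only records whose `LICENSE:` field starts with `CC`. The function name `accession_to_inchikey` is hypothetical, and the sketch assumes that each record file carries `ACCESSION:`, `LICENSE:` and `CH$LINK: INCHIKEY` lines in that order. It is not the script that produced the attached file.

```bash
#!/bin/bash
# Hypothetical helper: print "<accession> <InChIKey>" for each record file
# under the given directory whose LICENSE field starts with "CC".
# Assumption: ACCESSION and LICENSE appear before CH$LINK in every record.
accession_to_inchikey() {
        local dir="$1"
        # Collect all .txt records, leaving out anything below a "figure" directory.
        find "$dir" -type f -name '*.txt' ! -path '*/figure/*' -exec awk '
                FNR == 1 { acc = ""; cc = 0 }                   # reset state for each file
                $1 == "ACCESSION:" { acc = $2 }
                $1 == "LICENSE:" && $2 ~ /^CC/ { cc = 1 }       # any Creative Commons license
                $1 == "CH$LINK:" && $2 == "INCHIKEY" && cc && acc != "" { print acc, $3 }
        ' {} + | sort
}
```

Called as `accession_to_inchikey . > Accession_to_InChi-Key.txt` from the repository root, this would write one sorted `accession key` pair per line.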

schymane commented 5 years ago

Thanks for the dump! Re the Literature specs: I created those myself by manually extracting literature data, and @egonw just clarified that we can therefore put a CC0 license on them, as it was my work. Do you want to update these records with CC0 (or CC-BY for consistency?) so that we can include them as well? Likely quicker on your end than mine ... ;-)

egonw commented 5 years ago

Since it's only identifiers, some even argue there is no data, and it cannot even be copyrighted.

But a quick note that I am free to enter the content of the text into Wikidata will do fine.

meier-rene commented 5 years ago

> Re the Literature specs: I created those myself by manually extracting literature data, and @egonw just clarified that we can therefore put a CC0 license on them, as it was my work. Do you want to update these records with CC0 (or CC-BY for consistency?) so that we can include them as well? Likely quicker on your end than mine ... ;-)

Done.

Regarding the dump: Maybe I misunderstood the request. I thought you needed a dump of all accessions that have a Creative Commons license. As I understand it now, the dump itself should be under CC0, which I hereby grant. Regarding the licenses in the repository: after today's changes, only CC licenses are left in the MassBank-data repo, but in most cases they are not CC0.

Here is an updated version of the mapping released under CC0: Accession_to_InChi-Key.txt

egonw commented 5 years ago

I'm on it... here's the script that I will adapt to make Wikidata QuickStatements: https://github.com/egonw/ons-wikidata/blob/master/ExtIdentifiers/comptox.groovy
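
To illustrate the target format: QuickStatements V1 takes tab-separated lines of item, property and a double-quoted string value. The sketch below joins the accession mapping with an InChIKey-to-item lookup and emits one statement per match. It is not a port of the linked Groovy script. The function name `to_quickstatements`, the lookup file, and the property ID passed as the third argument are all placeholders for illustration.

```bash
#!/bin/bash
# Hypothetical helper: emit QuickStatements V1 lines (item<TAB>property<TAB>"value").
#   $1  mapping file, one "<accession> <InChIKey>" pair per line
#   $2  lookup file, one "<InChIKey> <QID>" pair per line (assumed to exist already)
#   $3  Wikidata property ID to use for the accession (placeholder)
to_quickstatements() {
        awk -v prop="$3" '
                NR == FNR { qid[$1] = $2; next }        # first file read: InChIKey -> QID
                $2 in qid { printf "%s\t%s\t\"%s\"\n", qid[$2], prop, $1 }
        ' "$2" "$1"
}
```

Accessions whose InChIKey has no entry in the lookup file are skipped silently, so a real run would likely want to log those for manual review.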

egonw commented 5 years ago

OK, a good amount added. I think this one can be closed now.