MassBank / MassBank-data

Official repository of open data MassBank records

InChIKey and matching DTXSID dump for MassBank #77

Closed schymane closed 5 years ago

schymane commented 5 years ago

@meier-rene are you able to produce a dump file with all InChIKeys in MassBank and, where they have them, the corresponding DTXSIDs? I need all the InChIKeys for one file, and all the DTXSIDs for another. I've browsed and found several variants of such files, but not one containing exactly this information paired. If you have one already that I missed, please point me to it ;-) Thanks!

meier-rene commented 5 years ago

There was no script available to create exactly the information you requested.

inchikey_comptox_report.txt

For later use, here is the script that creates that report:

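#!/bin/bash
# Run from the directory that contains the record subdirectories.
# Prints each unique INCHIKEY line together with its COMPTOX line, if present.
for x in *; do
        # Skip the "figure" directory.
        if [ "$x" = "figure" ]; then
                continue
        fi
        if [ -d "$x" ]; then
                cd "$x" || continue
                for y in *.txt; do
                        # Left unquoted on purpose so both matches end up on one line.
                        echo $(grep INCHIKEY "$y") $(grep COMPTOX "$y")
                done
                cd ..
        fi
done | sed 's/CH\$LINK://g' | sed -r '/^\s*$/d' | sort -u
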
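To get from the paired report to the two separate lists requested at the top of this issue, a minimal sketch could look like the following. It assumes the report contains InChIKeys in the standard 27-character layout and identifiers that start with `DTXSID` followed by digits. The function name `split_report` and the output file names are illustrative and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: split a paired report into two unique, sorted lists.
# Usage: split_report REPORT INCHIKEY_OUT DTXSID_OUT
split_report() {
        local report="$1" keys_out="$2" ids_out="$3"
        # Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
        grep -oE '[A-Z]{14}-[A-Z]{10}-[A-Z]' "$report" | sort -u > "$keys_out"
        # DTXSIDs: the literal prefix followed by digits.
        grep -oE 'DTXSID[0-9]+' "$report" | sort -u > "$ids_out"
}

# Example call (file names are assumptions):
# split_report inchikey_comptox_report.txt inchikeys.txt dtxsids.txt
```

Matching on the identifier patterns rather than on column positions keeps the sketch independent of the exact spacing in the report.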
schymane commented 5 years ago

Thanks @meier-rene - DTXSIDs sent to @ChemConnector to update https://comptox.epa.gov/dashboard/chemical_lists/massbankref

and I'll be using the InChIKeys to update NORMAN-SLE shortly https://www.norman-network.com/nds/SLE/

Thanks for the rapid turnaround ;-)

schymane commented 5 years ago

@meier-rene the CompTox list is now updated with all public DTXSIDs from your dump. Here's the list of non-public entries (390 total) that we should remove; see #68. Once this is done, I think we can close #66, #68 and this issue.

MassBankEU_DTXSIDs_Level6_notPublic_17062019.txt
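Separating the public identifiers from the non-public ones amounts to a set difference between two lists. A minimal sketch, assuming both files hold exactly one DTXSID per line with no trailing whitespace; the function name `drop_nonpublic` and the file names in the example call are illustrative:

```shell
#!/bin/bash
# Hypothetical helper: print the IDs from ALL_IDS that are not listed in NONPUBLIC.
# Usage: drop_nonpublic ALL_IDS NONPUBLIC
drop_nonpublic() {
        local all_ids="$1" nonpublic="$2"
        # -F: fixed strings, -x: whole-line match, -v: invert, -f: read patterns from file
        grep -vxFf "$nonpublic" "$all_ids"
}

# Example call (file names are assumptions):
# drop_nonpublic dtxsids.txt MassBankEU_DTXSIDs_Level6_notPublic_17062019.txt > dtxsids_public.txt
```

The `-x` flag matters here: without it, a short identifier in the non-public list would also remove every longer identifier that contains it.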

meier-rene commented 5 years ago

I removed all mentioned DTXSIDs from MassBank and will recheck all IDs against the updated service. I will close this now. Thank you all!
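For anyone repeating this cleanup, a first step is finding which record files still mention one of the listed identifiers. The sketch below is not the procedure used here, just one possible approach. It assumes a list file with one DTXSID per line and records stored as `*.txt` files under a directory; the function name `records_with_ids` is illustrative.

```shell
#!/bin/bash
# Hypothetical helper: list record files that mention any ID from ID_LIST.
# Usage: records_with_ids ID_LIST RECORD_DIR
records_with_ids() {
        local id_list="$1" record_dir="$2"
        # -r: recurse, -l: print file names only, -w: whole-word match,
        # -F: fixed strings, -f: read patterns from file
        grep -rlwFf "$id_list" --include='*.txt' "$record_dir" | sort
}

# Example call (paths are assumptions):
# records_with_ids MassBankEU_DTXSIDs_Level6_notPublic_17062019.txt .
```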

schymane commented 5 years ago

Thank YOU @meier-rene too! @ChemConnector confirmed that you should no longer be able to retrieve non-public entries via their web services, so this should not happen again. Hopefully all is fixed on all sides now!