Closed sneumann closed 3 years ago
Hi @sneumann and massbank-data contributors,
Happy to help where I can for this.
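If you are interested in the database schema for the library, see here: https://bioconductor.org/packages/devel/bioc/vignettes/msPurity/inst/doc/msPurity-spectral-matching-vignette.html#4_library_database_schema (if desired, I can update it to add new columns, change names, etc.)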
thanks!
On Thu, Nov 29, 2018, 6:56 AM Thomas N Lawson <notifications@github.com> wrote:
Hi @sneumann https://github.com/sneumann and massbank-data contributors,
Happy to help where I can for this.
If you are interested in the database schema for the library see here https://bioconductor.org/packages/devel/bioc/vignettes/msPurity/inst/doc/msPurity-spectral-matching-vignette.html#4_library_database_schema (If desired, I can update to add new columns, change names, etc)
And here is the start of a one-liner to convert, so far without -v volumes and stuff:
docker run --rm -it ubuntu:18.04 sh -c 'apt update ; apt install -y python-pip git ; git clone https://github.com/MassBank/MassBank-data; pip install msp2db; msp2db -msp_pth MassBank-data -name MassBank -source massbank -o /tmp'
Just updated the one-liner to ensure the correct msp regular expressions are used with msp2db (-schema massbank):
docker run --rm -it ubuntu:18.04 sh -c 'apt update ; apt install -y python-pip git ; git clone https://github.com/MassBank/MassBank-data; pip install msp2db; msp2db -msp_pth MassBank-data -name MassBank -source massbank -schema massbank -o /tmp'
Hi all,
I have added the SQLite database of MassBank to the assets of the GitHub release of msp2db. See the file massbank_12122018.db.
Also, I updated the command-line calls to be a bit cleaner:
docker run --rm -it ubuntu:18.04 sh -c 'apt update ; apt install -y python-pip git ; git clone https://github.com/MassBank/MassBank-data; pip install msp2db; msp2db --msp_pth MassBank-data --source massbank --schema massbank --out_pth /tmp/massbank.db'
The release also includes a SQLite database representation of the MoNA MSP files.
I can continue maintaining the databases on the msp2db GitHub for now, but I am happy to change that if we find a better location to store the database files.
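For anyone who downloads one of these database files, here is a minimal sketch for checking which tables and columns it contains before relying on any particular schema. It uses only Python's standard `sqlite3` module and makes no assumption about the msp2db layout. The function name `list_tables` is made up for this illustration.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    con = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in con.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        con.close()
```

Calling `list_tables("massbank_12122018.db")` on a downloaded copy of the file should then show the actual table and column names.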
Hi, this is very much appreciated! Thanks a lot. In future, we plan to store and release derived DBs in different formats (NIST, SQLite, etc.) at MassBank-data, but at the moment it is great to keep it in your repository. Thanks!
@meier-rene, @sneumann and @schymane, maybe we could add some external links for DB download to the README?
Hi, I just checked https://github.com/computational-metabolomics/msp2db/releases/ where the SQLite converted MassBank data is included. We need to decide whether we ping msp2db about every release in https://github.com/MassBank/MassBank-data/releases so they can release updated snapshots. Yours, Steffen
Hi @jorainer, in this issue there are a few pointers to the SQLite database that is in msPurity. Would there be a chance that your developments on SQLite cover the use cases implemented in Birmingham? Or even recycle parts of that? Yours, Steffen
That's a good point @sneumann! I'll have a look at the msPurity database layout (maybe you could point me to the info @Tomnl?). In general, the CompDb database layout is super-simple. I just have the tables compound, msms_spectrum and msms_spectrum_peak, with only very few constraints, to accommodate data from all the various sources.
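To make the three-table idea concrete, here is a rough sketch in Python with an in-memory SQLite database. Only the three table names come from the comment above. All column names, the helpers `build_demo_db` and `peaks_for_compound`, and the inserted numbers are hypothetical placeholders, not the actual CompDb layout.

```python
import sqlite3

# Table names are from the thread; every column below is a hypothetical placeholder.
SCHEMA = """
CREATE TABLE compound (
    compound_id TEXT PRIMARY KEY,
    name        TEXT
);
CREATE TABLE msms_spectrum (
    spectrum_id  INTEGER PRIMARY KEY,
    compound_id  TEXT,      -- deliberately no foreign-key constraint
    precursor_mz REAL
);
CREATE TABLE msms_spectrum_peak (
    spectrum_id INTEGER,
    mz          REAL,
    intensity   REAL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up toy values."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO compound VALUES ('C1', 'demo compound')")
    con.execute("INSERT INTO msms_spectrum VALUES (1, 'C1', 150.0)")
    con.executemany(
        "INSERT INTO msms_spectrum_peak VALUES (?, ?, ?)",
        [(1, 100.0, 100.0), (1, 50.0, 35.0)],
    )
    return con


def peaks_for_compound(con, compound_id):
    """Return (mz, intensity) pairs of all spectra linked to one compound."""
    return con.execute(
        """SELECT p.mz, p.intensity
           FROM msms_spectrum_peak AS p
           JOIN msms_spectrum AS s ON s.spectrum_id = p.spectrum_id
           WHERE s.compound_id = ?
           ORDER BY p.mz""",
        (compound_id,),
    ).fetchall()
```

Keeping peaks in their own long table means spectra with any number of peaks fit without schema changes, which is presumably what makes such a loose layout easy to fill from different sources.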
Hi both,
Are you planning on creating a standard MS/MS format for library spectra in SQL?
I think the databases for msPurity and msp2db follow a similar structure to CompDb, i.e. three main tables: a compound table, a table for the spectrum peaks (e.g. mz, intensity, etc.) and a table for the spectrum as a whole (e.g. precursor mz, fragmentation level, energy, etc.).
For spectral matching I originally made different schemas for the "library" and "query" databases. But they have essentially the same basic structure and can be used interchangeably in msPurity. They just have slightly different table names and additional fields for the query spectra. See the "library" database schema code and the more extensive "query" database schema (that includes XCMS mapping as well).
In hindsight I probably should have followed the schema already developed in mzdb... but perhaps that schema was too complex for what was needed
Thanks Thomas (@Tomnl) for your quick reply!
Actually, I'm not trying to define a standard format. I think that might be too complicated, and we will still need different layouts for different purposes. I think it's better to keep allowing different database layouts, but then maybe have a shared interface to them. Here's where the Spectra package comes into play. That package provides basic MS spectra processing and handling functionality but, more importantly, allows different backends to be used to represent or provide the data. For the user it thus does not matter where the data comes from (see also here for a short tutorial illustrating that). I'm currently implementing, e.g., a backend for MassBank with which one could directly access spectra data from MassBank.
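The "different layouts, shared interface" idea can be sketched roughly as follows. This is a Python illustration of the general pattern only, not the API of the R Spectra package. The class names, the `peaks` method, the `peak` table and its columns are all assumptions made up for this example.

```python
import sqlite3
from typing import Iterator, List, Protocol, Tuple

Peak = Tuple[float, float]  # (mz, intensity)


class SpectraBackend(Protocol):
    """Hypothetical shared interface: yield one peak list per spectrum."""

    def peaks(self) -> Iterator[List[Peak]]: ...


class InMemoryBackend:
    """Backend holding spectra as plain Python lists."""

    def __init__(self, spectra):
        self._spectra = list(spectra)

    def peaks(self):
        yield from self._spectra


class SqliteBackend:
    """Backend reading from an assumed table peak(spectrum_id, mz, intensity)."""

    def __init__(self, con):
        self._con = con

    def peaks(self):
        ids = [row[0] for row in self._con.execute(
            "SELECT DISTINCT spectrum_id FROM peak ORDER BY spectrum_id")]
        for spectrum_id in ids:
            yield self._con.execute(
                "SELECT mz, intensity FROM peak WHERE spectrum_id = ? ORDER BY mz",
                (spectrum_id,),
            ).fetchall()


def base_peaks(backend: SpectraBackend) -> List[float]:
    """Return the m/z of the most intense peak of each spectrum, for any backend."""
    return [max(spectrum, key=lambda p: p[1])[0] for spectrum in backend.peaks()]
```

`base_peaks` only talks to the interface, so the same processing code runs whether the spectra sit in memory or in a database with its own layout.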
Btw - maybe you would like to contribute some of the functionality from msPurity to Spectra? Or add some functionality you found useful in the processing of MS2 data?
The CompDb layout should just facilitate sharing of e.g. public spectra (and compound) databases via e.g. Bioconductor's AnnotationHub (where genome and genetic annotations are already shared).
I think, we solved that with the SQL export.
Hi, @Tomnl has updated his code to convert MassBank records to a sqlite database:
It would be great to distribute snapshots of MassBank-data in such a format. Yours, Steffen