Open jorainer opened 2 years ago
Dear @jorainer, I was wondering if where is the massbank sql data from? Is it the compilation of all spectra as provided on the massbank web page? I am asking this for the sake of confidence in my target identification.
Hey @YangjjMSresearch, you can find the massbank.sql at our GitHub site. It is available beginning with database version 2020.11. We usually announce new releases by Twitter or the MassBank Europe website or you can watch the MassBank data repository.
Best wishes, Tobias
@tsufz Noted with many thanks! I noticed that there are around 80000 spectra founded from the massbank.sql database. From the MoNA website, I see there are 196,159 spectra inside. Is this massbank.sql repository different from the MoNA repository?
Best regards, Junjie
@YangjjMSresearch. I reviewed MassBank of America, They actually provide 73 k MassBank Europe records and in total 175 k GC -and LC-records. MoNA and MassBank hold different datasets. MassBank Europe records are only a part of it among GNPS, HMDB, ReSPECT and others. Thus, the contents of MassBank Europe and MoNA are different. I am also not sure about the update frequency of MoNA.
The structure of the sql files are also different. We provide the dump of our internal database. MoNA provides a dump of their database. @jorainer may explain, it if the MoNA sql files are also usuable.
If you want to use MassBank Europe records only, use our databases, please. They guarantee reproducibility, as we provide versioned releases. In addition, you may use MoNA sql files containing different other libraries as appropriate.
Best wishes, Tobias
I've never looked into MoNa sql files - I was even not aware that they provide their data as SQL. Maybe I have a look into that someday
Hi all, I still think it would be great to serve the SQLite that can be used with RforMassSpectrometry directly, otherwise the users have to do the conversion themselves...
Agree @meowcat ! note that I have one pre-build database here: https://github.com/jorainer/SpectraTutorials/releases/tag/2021.03 . I've also included a super-simple short function to MsBackendMassbank
that extracts the databases tables from the MySQL MassBank and stores it into an SQLite database: https://github.com/rformassspectrometry/MsBackendMassbank/blob/master/inst/scripts/massbank-to-sqlite.R
My other plan is (if I finally find the time) to create such MassBank SQLite versions (maybe as CompDb
databases, because they use the same database layout) for each new MassBank and distribute them via Bioconductor's AnnotationHub
. That would make it super-simple for users to search for and install any MassBank release.
Hi, I made a small converter that takes MassBank dumps and converts to SQLite as well as mzVault formats. It's a docker and has no external requirements.
https://github.com/meowcat/MassBank-convert
Note that the mzVault converter collapses compound information by InChIKey. A further function which isn't working well collapses by 1D inchikey, but is deactivated in the config.
Can we add that to some CI that generates "best-effort community-contributed conversions"?
+1 on "best-effort community-contributed conversions", the OpenMS Team has an mzML converter mentioned in https://github.com/MassBank/MassBank-data/issues/31. Yours, Steffen
For some users a MySQL database might be a little to demanding (see also https://github.com/sneumann/xcms/issues/534). Would be nice to provide the MassBank data in addition as an SQLite database. This would be ~ straight forward:
Convert the MySQL dump to SQLite SQL calls using mysql2sqlite
The only problem is that the
views
are not correctly converted (actually, seems mysql2sqlite ignores them completely). To fix that one could simply insert the views later using:Where the views.sql file is simply a file containing the
create view...
calls from the 01-init-massbank.sql script.With this SQLite database it would be super-easy to use the MassBank data in R:
so, one would have full access to MassBank (cc @tsufz ).