glasgowcompbio / ms2ldaviz

Substructural discovery in untargeted metabolomics data using LDA topic modelling.
http://ms2lda.org
MIT License

Update registration function #152

Closed joewandy closed 3 years ago

joewandy commented 5 years ago
  1. Add example experiments to each new user's account (see add_example_experiments.py, and also #111)
  2. Add message:
We have just created an account for you on http://www.ms2lda.org, where you can explore some previously run data sets and start submitting your own data. You will find the log-in button at the upper right of the page.
Your username is: xxxxx00x
Your temporary password is: XXXXXXX (please change this as soon as possible using the 'Update Profile' tab in the top bar of the website)
[all case-sensitive]
!!Please read the email below, which guides you through your first MS2LDA experiment!!
Please note that your account contains data sets analyzed with standard LDA (described in the paper) as well as data sets that were 'decomposed' onto characterized Mass2Motifs from MassBank and GNPS. Decomposition gives quick insight into the data, provided they contain substructures similar to those captured by these databases.
As for your own data, please take care to select and submit the correct format and to set appropriate filters for RT and for MS1 and MS2 intensities [the defaults are suitable only for Thermo Q-Exactive spectral files]. Including noise does not contribute to substructure discovery and makes the LDA process run much slower. It is therefore very important to check the noise level in your data and adjust the minimum MS2 intensity accordingly. For example, ToF-based machines typically generate spectra with noise levels around 100 a.u., whereas the default is set to 5000 for Q-Exactive spectra.
If possible, we recommend submitting a small subset of the data first to check that things complete as expected.
Submitted experiments are shown as 'pending' until they are finished. Depending on the size of the data and on whether other experiments are running, this can take a few hours or longer. Once finished, the experiment appears under the LDA or Decomposition experiments.
If you upload an MS1 peak list covering one or more files, the extracted MS1-MS2 pairs will be matched to those peaks according to thresholds that you can set. Please check the website for the requirements of the MS1 peak file. One of the experiments in your account contains examples of the MS1 comparisons you can then perform to find Mass2Motifs that contain metabolites discriminating between two groups.
Please also note that the MS2 masses are binned in 0.005 Da bins by default, so the masses displayed for them are no longer 'accurate' masses. There is now an option to choose a different bin size in case that is more appropriate for your data.
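As a rough illustration of why binned masses are no longer 'accurate', the sketch below assigns each fragment m/z to a 0.005 Da bin and reports the bin center. The function name `bin_mz` and the floor-based scheme are assumptions made for this example; the binning actually used by ms2lda.org may differ in detail.

```python
import math


def bin_mz(mz, bin_width=0.005):
    """Return the center of the bin containing mz (hypothetical scheme)."""
    # Assumption: bins start at 0 and are bin_width wide.
    index = math.floor(mz / bin_width)
    return (index + 0.5) * bin_width


# Example fragment masses (made-up values for illustration only).
fragments = [100.0012, 100.0048, 100.0051]
for mz in fragments:
    print(f"{mz:.4f} -> {bin_mz(mz):.4f}")
```

In this sketch the first two masses fall into the same bin and are displayed with the same value, while the third lands in the next bin.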
You can view the experiments that are in your account now; experiments that you upload yourself can also be edited, so you can start annotating the Mass2Motifs from an LDA run. To help you on your way, you can perform motif matching against previously run experiments.
You can find all these functionalities once you click on a finished experiment. A good start is the summary page, where you can see how many spectra are in each Mass2Motif so that you can set a reasonable threshold for the visualization of the network (the minimum degree; if set too high, little data will be displayed).
One of the tabs is called "View Experiment options". Here you can set the thresholds for a fragmented spectrum (document) to belong to a Mass2Motif. In our experience, a probability threshold of 0.1 and an overlap threshold of 0.3 are a good starting point to explore the data. By default, both are set at 0.05. A final note on this is that the MS2LDA model requires all fragmented spectra to be part of at least one Mass2Motif. Therefore, in some cases, fragmented molecules might have a very high probability but a very low overlap with the Mass2Motif. This happens to molecules that have a unique fragmentation spectrum compared to all other spectra in the data set.
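To make the effect of the two thresholds concrete, here is a minimal sketch that keeps only the spectrum-to-motif links passing both cut-offs. The `Link` class, the example values, and the use of `>=` are assumptions for illustration and are not taken from the ms2lda.org code.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A hypothetical spectrum-to-Mass2Motif link with its two scores."""
    document: str
    motif: str
    probability: float
    overlap: float


def links_above_thresholds(links, min_probability=0.1, min_overlap=0.3):
    """Keep links whose probability AND overlap both reach the thresholds."""
    return [
        link for link in links
        if link.probability >= min_probability and link.overlap >= min_overlap
    ]


# Made-up example values.
links = [
    Link("spectrum_1", "motif_12", probability=0.45, overlap=0.60),
    Link("spectrum_2", "motif_12", probability=0.07, overlap=0.40),
    # High probability but very low overlap: the "unique spectrum" case.
    Link("spectrum_3", "motif_30", probability=0.95, overlap=0.02),
]

kept = links_above_thresholds(links)
print([link.document for link in kept])
```

With the suggested starting values only `spectrum_1` is kept; `spectrum_3` illustrates the case of a very high probability but a very low overlap.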
If you also have MS1 information for your fragmented peaks, we generally advise trying an MS2 file only first to check the settings (number of motifs, noise filtering, and quality of the resulting motifs) before providing an MS1 file to match to the MGF. Then, please take care to use the correct format (you can download an example from the ms2lda.org website) and check, for example, that you have provided the correct unit for the RT (either minutes or seconds) in the create experiment form. Finally, we advise using an "identifier" such as SCAN (number), Feature_ID, or ID to base the matching on. You can indicate in the create experiment form which column headings to look for; this usually provides a more streamlined matching.
More information can be found in the user guide (http://ms2lda.org/user_guide/), and the website was recently published in Bioinformatics: https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx582/4158166/Ms2lda-org-web-based-topic-modelling-for
Let us know if you have any questions regarding how to use the website or any ideas for novel features that you would find useful.
If you find any bugs, we would like to hear about them as well, so that we can improve the overall user experience for everyone.
Recently, NEW features were added: you can now delete test experiments with a button from the ms2ldaviz app, and you can make any ms2lda experiment public by toggling an option on the summary page. In addition, integration of MAGMa with MS2LDA resulted in automatically annotated data sets for reference MS/MS data from GNPS and MassBank (experiments 190 and 191), which you can access through your account. The annotated motif sets can be included in your ms2lda experiments through motifdb (ms2lda.org/motifdb). It is now also possible to run MS2LDA from within GNPS after you have run a molecular network; the resulting LDA model can then be uploaded into ms2lda.org. For more information on how to do this and a tutorial, see: https://ccms-ucsd.github.io/GNPSDocumentation/ms2lda/. To see how this could enrich your chemical analysis, please look at our most recent paper in Metabolites: https://www.mdpi.com/2218-1989/9/7/144/htm
Stay tuned for more developments!
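
A minimal sketch of how the message in item 2 could be filled in at registration time, assuming a plain-text template with placeholders for the username and temporary password. The names `WELCOME_TEMPLATE` and `build_welcome_message` are hypothetical and are not taken from the ms2ldaviz code base; only the first lines of the message are reproduced here.

```python
from string import Template

# Assumption: the message is stored as a plain-text template; the remaining
# paragraphs of the message above would be appended to this string.
WELCOME_TEMPLATE = Template(
    "We have just created an account for you on http://www.ms2lda.org\n"
    "Your username is: $username\n"
    "Your temporary password is: $password\n"
    "[all case-sensitive]\n"
)


def build_welcome_message(username, temporary_password):
    """Return the welcome text for a newly registered user (hypothetical helper)."""
    return WELCOME_TEMPLATE.substitute(
        username=username, password=temporary_password
    )


# Example with made-up credentials.
print(build_welcome_message("abcde12f", "Temp-Pass-1"))
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which guards against sending a message with a missing username or password.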
joewandy commented 3 years ago

Done