EMSL-Computing / CoreMS

CoreMS is a comprehensive mass spectrometry software framework
BSD 2-Clause "Simplified" License




CoreMS

CoreMS is a comprehensive mass spectrometry framework for software development and data analysis of small molecules.

Data handling and software development for modern mass spectrometry (MS) is an interdisciplinary endeavor requiring skills in computational science and a deep understanding of MS. To enable scientific software development to keep pace with fast improvements in MS technology, we have developed a Python software framework named CoreMS. The goal of the framework is to provide a fundamental, high-level basis for working with all mass spectrometry data types, allowing custom workflows for data signal processing, annotation, and curation. The data structures were designed with an intuitive, mass spectrometric hierarchical structure, thus allowing organized and easy access to the data and calculations. Moreover, CoreMS supports direct access for almost all vendors’ data formats, allowing for the centralization and automation of all data processing workflows from the raw signal to data annotation and curation.

CoreMS aims to provide


Current Version

2.0.10


Main Developers/Contact


Contributing

As an open source project, CoreMS welcomes contributions of all forms. Before contributing, please see our Dev Guide.


Data formats

Data input formats

Data output formats

Data structure types

In progress data structures


Available features

FT-MS Signal Processing, Calibration, and Molecular Formula Search and Assignment

GC-MS Signal Processing, Calibration, and Compound Identification

High Resolution Mass Spectrum Simulations


Installation

pip install corems
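
After installing, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of CoreMS, and the distribution name `corems` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("corems")
    if version is None:
        print("corems is not installed in this environment")
    else:
        print(f"corems {version} is installed")
```

Comparing the reported version with the one listed under Current Version is a quick way to spot a stale environment.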

By default, the molecular formula database is generated using SQLite.

To use PostgreSQL instead, the easiest way is to start a Docker container:

docker-compose up -d

Thermo Raw File Access:

To open Thermo raw files, an installation of pythonnet is needed:


Docker stack

Another option for using CoreMS is to run the Docker stack, which starts the CoreMS containers.

Molecular Database and Jupyter Notebook Docker Containers

A docker container containing:

If you don't have Docker installed, the easiest way is to install Docker Desktop.

  1. Start the containers using docker-compose (easiest way):

    The file docker-compose-jupyter.yml contains a volume mapping for the tests_data directory, which holds the data provided for testing. To point it at your own data location:

    • locate the volumes section in docker-compose-jupyter.yml:
    volumes:
      - ./tests/tests_data:/home/CoreMS/data
    • change "./tests/tests_data" to your data directory location, keeping the container path unchanged:
    volumes:
      - path_to_your_data_directory:/home/CoreMS/data
    • save the file and then call:
    docker-compose -f docker-compose-jupyter.yml up
  2. Another option is to manually build the containers:

    • Build the corems image:

      docker build -t corems:local .
    • Start the database container:

      docker-compose up -d   
    • Start the Jupyter Notebook:

      docker run --rm -v ./data:/home/CoreMS/data corems:local
    • Open your browser, then copy and paste the URL provided in the terminal: http://localhost:8888/?token=<token>.

    • Open the CoreMS-Tutorial.ipynb
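
A relative host path such as ./data in the volume option may not be accepted by every Docker version, so it can help to resolve the data directory to an absolute path before starting the notebook container. The sketch below only builds and prints the command from step 2; the helper names `volume_mapping` and `jupyter_run_command` are hypothetical, and the container path and image tag are copied from the steps above.

```python
import shlex
from pathlib import Path


def volume_mapping(host_dir, container_dir="/home/CoreMS/data"):
    """Build a 'host:container' volume string with an absolute host path."""
    host_path = Path(host_dir).expanduser().resolve()
    return f"{host_path}:{container_dir}"


def jupyter_run_command(host_dir, image="corems:local"):
    """Return the docker run argument list from step 2, with the volume resolved."""
    return ["docker", "run", "--rm", "-v", volume_mapping(host_dir), image]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(jupyter_run_command("./data")))
```

The printed command can be pasted into a terminal, or the argument list can be passed to `subprocess.run` once it looks right.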


Simple Script Example

More examples can be found under the directories examples/scripts and examples/notebooks.

from corems.transient.input.brukerSolarix import ReadBrukerSolarix
from corems.molecular_id.search.molecularFormulaSearch import SearchMolecularFormulas
from corems.mass_spectrum.output.export import HighResMassSpecExport
from matplotlib import pyplot

file_path = 'tests/tests_data/ftms/ESI_NEG_SRFA.d'

# Instantiate the Bruker Solarix reader with the file path
bruker_reader = ReadBrukerSolarix(file_path)

# Use the reader to instantiate a transient object
bruker_transient_obj = bruker_reader.get_transient()

# Get the transient duration time
T = bruker_transient_obj.transient_time

# Use the transient object to instantiate a mass spectrum object
mass_spectrum_obj = bruker_transient_obj.get_mass_spectrum(plot_result=False, auto_process=True)

# The SearchMolecularFormulas call below does the following:
# - searches monoisotopic molecular formulas for all mass spectral peaks
# - calculates fine isotopic structure based on the monoisotopic molecular formulas found and the current dynamic range
# - searches molecular formulas of the corresponding calculated isotopologues
# - settings are stored in SearchConfig.json and can be changed directly in the file or inside the framework class

SearchMolecularFormulas(mass_spectrum_obj, first_hit=False).run_worker_mass_spectrum()

# Iterate over mass spectral peaks objs within the mass_spectrum_obj
for mspeak in mass_spectrum_obj.sort_by_abundance():

    # If there is at least one molecular formula associated, mspeak evaluates to True
    if mspeak:

        # Get the molecular formula with the highest mass accuracy
        molecular_formula = mspeak.molecular_formula_lowest_error

        # Plot mz and peak height
        pyplot.plot(mspeak.mz_exp, mspeak.abundance, 'o', c='g')

        # Iterate over all molecular formulas associated with the ms peak obj
        for molecular_formula in mspeak:

            # Check if the molecular formula is an isotopologue
            if molecular_formula.is_isotopologue:

                # Access the molecular formula text representation and print
                print(molecular_formula.string)

                # Get 13C atoms count
                print(molecular_formula['13C'])
    else:
        # Get mz and peak height
        print(mspeak.mz_exp, mspeak.abundance)

# Save data
# to a CSV file
mass_spectrum_obj.to_csv("filename")
# to an HDF5 file
mass_spectrum_obj.to_hdf("filename")
# to a pandas DataFrame pickle
mass_spectrum_obj.to_pandas("filename")

# Extract data as a pandas Dataframe
df = mass_spectrum_obj.to_dataframe()
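
The script above selects the molecular formula with the highest mass accuracy for each peak. As a rough illustration of what that criterion means, the sketch below computes the mass error in ppm for a few candidate formulas and picks the one with the smallest absolute error. It is not CoreMS code: all function names and the candidate list are hypothetical, it assumes singly charged deprotonated ions ([M-H]-, as in negative-mode ESI), and the sign convention used for the error (experimental minus calculated) is an assumption.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON_MASS = 1.00727646688  # Da

# Hypothetical candidate formulas with the same nominal mass (122 Da)
CANDIDATES = [
    {"C": 7, "H": 6, "O": 2},
    {"C": 8, "H": 10, "O": 1},
    {"C": 3, "H": 6, "O": 5},
]


def neutral_mass(atoms):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in atoms.items())


def mz_deprotonated(atoms):
    """Calculated m/z of the singly charged [M-H]- ion."""
    return neutral_mass(atoms) - PROTON_MASS


def ppm_error(mz_exp, mz_calc):
    """Mass error in parts per million (experimental minus calculated)."""
    return (mz_exp - mz_calc) / mz_calc * 1e6


def lowest_error_candidate(mz_exp, candidates):
    """Return the candidate whose calculated m/z is closest to the measured one."""
    return min(candidates, key=lambda atoms: abs(ppm_error(mz_exp, mz_deprotonated(atoms))))


if __name__ == "__main__":
    measured_mz = 121.0295  # made-up measurement for illustration
    for atoms in CANDIDATES:
        print(atoms, round(ppm_error(measured_mz, mz_deprotonated(atoms)), 2), "ppm")
    print("best:", lowest_error_candidate(measured_mz, CANDIDATES))
```

In practice a mass-error tolerance is usually applied as well, so that a peak with no candidate inside the tolerance stays unassigned instead of taking the nearest formula.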

UML Diagrams

UML (unified modeling language) diagrams for Direct Infusion FT-MS and GC-MS classes can be found here.


Citing CoreMS

If you use CoreMS in your work, please use the following citation:

Version 2.0.10 Release on GitHub, archived on Zenodo:


Yuri E. Corilo, William R. Kew, Lee Ann McCue (2021, March 27). EMSL-Computing/CoreMS: CoreMS 2.0.1 (Version v2.0.1), as developed on Github. Zenodo. http://doi.org/10.5281/zenodo.4641552



***

This material was prepared as an account of work sponsored by an agency of the
United States Government.  Neither the United States Government nor the United
States Department of Energy, nor Battelle, nor any of their employees, nor any
jurisdiction or organization that has cooperated in the development of these
materials, makes any warranty, express or implied, or assumes any legal
liability or responsibility for the accuracy, completeness, or usefulness of
any information, apparatus, product, software, or process disclosed, or
represents that its use would not infringe privately owned rights.

Reference herein to any specific commercial product, process, or service by
trade name, trademark, manufacturer, or otherwise does not necessarily
constitute or imply its endorsement, recommendation, or favoring by the United
States Government or any agency thereof, or Battelle Memorial Institute. The
views and opinions of authors expressed herein do not necessarily state or
reflect those of the United States Government or any agency thereof.

                 PACIFIC NORTHWEST NATIONAL LABORATORY
                              operated by
                                BATTELLE
                                for the
                   UNITED STATES DEPARTMENT OF ENERGY
                    under Contract DE-AC05-76RL01830