COL-IU / msCRUSH

mass spectrum ClusteRing Using locality Sensitive Hashing
GNU General Public License v3.0

msCRUSH

Introduction

msCRUSH (mass spectrum ClusteRing Using locality Sensitive Hashing) was developed by Lei Wang, Sujun Li and Haixu Tang* to cluster large-scale tandem mass (MS/MS) spectra and then generate high-quality consensus spectra for each cluster of similar MS/MS spectra. The package supports multithreading. msCRUSH accepts multiple MGF files as input (regular expressions are supported), and the spectra may carry multiple charge states or no charge at all. If the input MS/MS spectra span multiple charge states, a separate clustering result file is created for each charge state.
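Because one clustering result file is written per charge state, it can help to see which charge states the input contains before clustering. The sketch below is not part of msCRUSH: the function name charge_tally is illustrative, and it assumes each spectrum declares its charge on a CHARGE= line, as is conventional in MGF files. Spectra without a CHARGE= line are not counted.

```shell
# Print each distinct CHARGE value found in the given MGF files,
# followed by the number of lines carrying that value.
# Assumption: charge is declared per spectrum on a line starting with CHARGE=.
charge_tally() {
    grep -h '^CHARGE=' "$@" | sort | uniq -c | awk '{print $2, $1}'
}
```

For example, charge_tally ../mgf/D01*part*.mgf prints one line per charge value, such as CHARGE=2+ followed by its count.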

Prerequisites

g++ version 5.1.0 or later is required.
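The compiler requirement can be checked before installing with a short script like the one below. This is only a sketch: version_at_least and check_gxx are illustrative helper names, not part of msCRUSH, and the comparison assumes sort -V (version sort, as provided by GNU coreutils) is available.

```shell
# Succeed if dotted version $1 is greater than or equal to dotted version $2.
version_at_least() {
    local have="$1" need="$2"
    # With version sort, the required version sorts first only when have >= need.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Report whether the g++ on PATH meets the 5.1.0 minimum stated above.
check_gxx() {
    local required="5.1.0" found
    if ! command -v g++ >/dev/null 2>&1; then
        echo "g++ not found on PATH"
        return 1
    fi
    # -dumpfullversion gives the full version on newer GCC; -dumpversion covers older ones.
    found="$(g++ -dumpfullversion -dumpversion)"
    if version_at_least "$found" "$required"; then
        echo "g++ $found is new enough (>= $required)"
    else
        echo "g++ $found is too old (need >= $required)"
        return 1
    fi
}

check_gxx || echo "Install or upgrade g++ before running ./install.sh"
```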

Installation

  1. cd to the main directory of msCRUSH.
  2. Run ./install.sh
  3. Two executable files will be placed under the bin directory: mscrush_on_general_charge, which clusters similar spectra, and generate_consensus_spectrum_for_mscrush, which generates consensus spectra.
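After the installation steps above, a quick check that both executables were built might look like this. The function name missing_binaries is illustrative and not part of msCRUSH; the script assumes it is run from the main directory of msCRUSH, so that bin is a relative path.

```shell
# Print the name of each expected executable that is absent from
# (or not executable in) the given bin directory.
missing_binaries() {
    local bin_dir="$1" name
    for name in mscrush_on_general_charge generate_consensus_spectrum_for_mscrush; do
        [ -x "$bin_dir/$name" ] || echo "$name"
    done
}

missing="$(missing_binaries bin)"
if [ -z "$missing" ]; then
    echo "Both msCRUSH executables are present under bin/"
else
    echo "Missing from bin/:"
    echo "$missing"
fi
```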

Cluster similar spectra

Generate consensus spectra

Note that writing consensus spectra (cs) to disk in MGF format can be time consuming, so it is a separate step: if consensus spectra are needed, run the commands below.

  1. cd bin
  2. Usage: ./generate_consensus_spectrum_for_mscrush -c mscrush_cluster(s) -f mgf_file(s) [-t consensus_title] [-p consensus_path_prefix] [-d decimal_place]
  3. Typical example: ./generate_consensus_spectrum_for_mscrush -c ../clusters/clusters-c*.txt -f ../mgf/D01*part*.mgf -d 7 -t CONSENSUS -p ../consensus/consensus
     This produces one consensus spectra file for each clustering file, that is, one per charge state. In this example, that is 5 files with the name prefix consensus under the directory ../consensus.
  4. Description: run ./generate_consensus_spectrum_for_mscrush -h to see the full list of command options.

    • -c, --clusters (required)

      Clustering files produced by msCRUSH.

    • -f, --files (required)

      MGF files.

    • -d, --decimal

      Decimal places for numbers. This parameter is optional. The default value is '3'.

    • -s, --separator

      Delimiter that separates MS2 titles in clusters. This parameter is optional. The default value is '|'.

    • -t, --consensus_title

      Consensus spectrum title prefix. This parameter is optional. The default value is 'CONSENSUS'.

    • -p, --consensus_path_prefix

      Path prefix of the consensus result files to write. This parameter is optional. The default value is 'consensus'.
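Once the consensus files have been written, a simple sanity check is to count the spectra in each output file. The sketch below is not part of msCRUSH: count_spectra and summarize_consensus are illustrative names, and it relies only on the MGF convention that every spectrum starts with a BEGIN IONS line. The prefix passed at the end is the one used in the typical example above.

```shell
# Count spectra in one MGF file by counting its BEGIN IONS lines.
count_spectra() {
    grep -c '^BEGIN IONS' "$1"
}

# Print "<file> <spectrum count>" for every file whose path starts with the prefix.
summarize_consensus() {
    local prefix="$1" f
    for f in "$prefix"*; do
        [ -f "$f" ] || continue   # skip when the glob matched nothing
        echo "$f $(count_spectra "$f")"
    done
}

# Prints nothing if no consensus files exist under this prefix yet.
summarize_consensus ../consensus/consensus
```

With the typical example, this would list the 5 consensus files, one per charge state, each followed by its number of consensus spectra.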

Citation

msCRUSH: Fast Tandem Mass Spectral Clustering Using Locality Sensitive Hashing

Questions

Please contact Lei Wang (wang558@indiana.edu) for assistance.

Acknowledgement

This work was supported by NIH grant 1R01AI108888 and the Indiana University Precision Health Initiative.