bigbio / sdrf-pipelines

A repository to convert SDRF proteomics files into pipelines config files
Apache License 2.0

sdrf-pipelines


The SDRF pipelines provide a set of tools to validate SDRF files and convert them into configuration files for different workflows, such as MSstats, OpenMS and MaxQuant.

Installation

pip install sdrf-pipelines

Validate the SDRF

After installation, validate an SDRF file by executing the following command:

parse_sdrf validate-sdrf --sdrf_file {here_the_path_to_sdrf_file}

Convert to OpenMS: Usage

parse_sdrf convert-openms -s sdrf.tsv

Description:

The experimental settings file contains one row for every raw file. Columns contain relevant parameters such as precursor mass tolerance and modifications. These settings can usually be derived from the SDRF file.

| URI | Filename | FixedModifications | VariableModifications | Label | PrecursorMassTolerance | PrecursorMassToleranceUnit | FragmentMassTolerance | FragmentMassToleranceUnit | DissociationMethod | Enzyme |
|---|---|---|---|---|---|---|---|---|---|---|
| ftp://ftp.pride.ebi.ac.uk/pride/data/archive/XX/PXD324343/A0218_1A_R_FR01.raw | A0218_1A_R_FR01.raw | Acetyl (Protein N-term) | Gln->pyro-glu (Q),Oxidation (M) | label free sample | 10 | ppm | 10 | ppm | HCD | Trypsin |
| ftp://ftp.pride.ebi.ac.uk/pride/data/archive/XX/PXD324343/A0218_1A_R_FR02.raw | A0218_1A_R_FR02.raw | Acetyl (Protein N-term) | Gln->pyro-glu (Q),Oxidation (M) | label free sample | 10 | ppm | 10 | ppm | HCD | Trypsin |
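As an illustration of how such a settings file could be consumed downstream, the following sketch reads it with Python's standard csv module and groups the raw files by enzyme. It assumes the file is tab-separated and uses the column names shown above; the function name `files_by_enzyme` and the inline sample are hypothetical and not part of sdrf-pipelines.

```python
import csv
import io
from collections import defaultdict


def files_by_enzyme(handle):
    """Group raw file names by the value of the Enzyme column."""
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["Enzyme"]].append(row["Filename"])
    return dict(groups)


# Hypothetical in-memory sample mirroring the two rows above.
# Tab separation is an assumption about the file layout.
header = [
    "URI", "Filename", "FixedModifications", "VariableModifications", "Label",
    "PrecursorMassTolerance", "PrecursorMassToleranceUnit",
    "FragmentMassTolerance", "FragmentMassToleranceUnit",
    "DissociationMethod", "Enzyme",
]
base_uri = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/XX/PXD324343/"
rows = [
    [base_uri + name, name, "Acetyl (Protein N-term)",
     "Gln->pyro-glu (Q),Oxidation (M)", "label free sample",
     "10", "ppm", "10", "ppm", "HCD", "Trypsin"]
    for name in ("A0218_1A_R_FR01.raw", "A0218_1A_R_FR02.raw")
]
settings_text = "\n".join("\t".join(fields) for fields in [header] + rows)

grouped = files_by_enzyme(io.StringIO(settings_text))
print(grouped)
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.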

The experimental design file describes how to unambiguously map each quantitative value. Most entries can be derived from the SDRF file. However, the definition of conditions might need manual changes.

Fraction_Group Fraction Spectra_Filepath Label MSstats_Condition MSstats_BioReplicate
1 1 A0218_1A_R_FR01.raw 1 1 1 1
1 2 A0218_1A_R_FR02.raw 1 1 1 1
. . ... . . . .
1 15 A0218_2A_FR15.raw 1 1 1 1
2 1 A0218_2A_FR01.raw 1 2 2 1
. . ... . . . .
. . ... . . . .
10 15 A0218_10A_FR15.raw 1 10 10 1

For details, please see the MSstats documentation
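One way to read "unambiguously" is that no two rows share the same fraction group, fraction and label. The sketch below checks that property with the standard library only. It assumes a tab-separated design file containing at least the Fraction_Group, Fraction, Spectra_Filepath and Label columns; the function name and the sample rows are hypothetical, and this check is not something sdrf-pipelines is known to perform itself.

```python
import csv
import io
from collections import Counter


def find_ambiguous_entries(handle):
    """Return (Fraction_Group, Fraction, Label) keys that occur more than once."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(
        (row["Fraction_Group"], row["Fraction"], row["Label"]) for row in reader
    )
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical sample, reduced to the four columns the check needs.
# The last row deliberately repeats fraction group 1, fraction 2, label 1.
design_rows = [
    ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label"],
    ["1", "1", "A0218_1A_R_FR01.raw", "1"],
    ["1", "2", "A0218_1A_R_FR02.raw", "1"],
    ["1", "2", "duplicate_of_FR02.raw", "1"],
]
design_text = "\n".join("\t".join(row) for row in design_rows)

ambiguous = find_ambiguous_entries(io.StringIO(design_text))
print(ambiguous)
```

An empty result means every row maps to a distinct combination; any returned key points at rows that may need the manual changes mentioned above.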

Convert to MaxQuant: Usage

parse_sdrf convert-maxquant -s sdrf.tsv -f {here_the_path_to_protein_database_file} -m {True or False} -pef {default 0.01} -prf {default 0.01} -t {temporary folder} -r {raw_data_folder} -n {number of threads:default 1} -o1 {parameters(.xml) output file path} -o2 {maxquant experimental design(.txt) output file path}

For example:

parse_sdrf convert-maxquant -s /root/ChengXin/Desktop/sdrf.tsv -f /root/ChengXin/MyProgram/search_spectra/AT/TAIR10_pep_20101214.fasta -r /root/ChengXin/MyProgram/virtuabox/share/raw_data/ -o1 /root/ChengXin/test.xml -o2 /root/ChengXin/test_exp.txt -t /root/ChengXin/MyProgram/virtuabox/share/raw_data/ -pef 0.01 -prf 0.01 -n 4
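Because the command takes many options, it can help to assemble it programmatically. The following is a minimal sketch that builds the argument list from the flags shown in the usage line above. The helper name `build_maxquant_command`, its parameter names and the placeholder paths are hypothetical, and the `-m` option is left out for brevity.

```python
import shlex


def build_maxquant_command(sdrf, fasta, raw_dir, params_out, design_out,
                           temp_dir, threads=1, pef=0.01, prf=0.01):
    """Assemble the argument list for `parse_sdrf convert-maxquant`."""
    return [
        "parse_sdrf", "convert-maxquant",
        "-s", sdrf,            # SDRF input file
        "-f", fasta,           # protein database file
        "-r", raw_dir,         # raw data folder
        "-o1", params_out,     # parameters (.xml) output file
        "-o2", design_out,     # experimental design (.txt) output file
        "-t", temp_dir,        # temporary folder
        "-pef", str(pef),
        "-prf", str(prf),
        "-n", str(threads),    # number of threads
    ]


# Placeholder paths for illustration only.
command = build_maxquant_command(
    sdrf="sdrf.tsv",
    fasta="proteins.fasta",
    raw_dir="raw_data/",
    params_out="mqpar.xml",
    design_out="exp_design.txt",
    temp_dir="tmp/",
    threads=4,
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting problems when paths contain spaces.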

Description

The MaxQuant parameters file mqpar.xml contains the parameters required to run MaxQuant. Some settings can usually be derived from the SDRF file, such as enzyme, fixed modifications, variable modifications, instrument, fraction and label. All other parameters are set to their defaults. The MaxQuant version currently supported by the script is 1.6.10.43.

Some parameters are listed:

For details, please see the MaxQuant documentation

The MaxQuant experimental design file contains the Name, Fraction, Experiment and PTM columns. Most entries can be derived from the SDRF file.

| Name | Fraction | Experiment | PTM |
|---|---|---|---|
| 130402_08.raw | 1 | sample 1_Tr_1 | |
| 130412_08.raw | 1 | sample 2_Tr_1 | |

Convert to MSstats annotation file: Usage

parse_sdrf convert-msstats -s ./testdata/PXD000288.sdrf.tsv -o ./test1.csv

Convert to NormalyzerDE design file: Usage

parse_sdrf convert-normalyzerde -s ./testdata/PXD000288.sdrf.tsv -o ./testPXD000288_design.tsv

Citations