The SDRF pipelines provide a set of tools to validate and convert SDRF files to different workflow configuration files such as MSstats,OpenMS and MaxQuant.
pip install sdrf-pipelines
Then, you can use the tool by executing the following command:
parse_sdrf validate-sdrf --sdrf_file {here_the_path_to_sdrf_file}
parse_sdrf convert-openms -s sdrf.tsv
The experimental settings file contains one row for every raw file. Columns contain relevevant parameters like precursor mass tolerance, modifications etc. These settings can usually be derived from the sdrf file.
URI | Filename | FixedModifications | VariableModifications | Label | PrecursorMassTolerance | PrecursorMassToleranceUnit | FragmentMassTolerance | FragmentMassToleranceUnit | DissociationMethod | Enzyme |
---|---|---|---|---|---|---|---|---|---|---|
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/XX/PXD324343/A0218_1A_R_FR01.raw | A0218_1A_R_FR01.raw | Acetyl (Protein N-term) | Gln->pyro-glu (Q),Oxidation (M) | label free sample | 10 | ppm | 10 | ppm | HCD | Trypsin |
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/XX/PXD324343/A0218_1A_R_FR02.raw | A0218_1A_R_FR02.raw | Acetyl (Protein N-term) | Gln->pyro-glu (Q),Oxidation (M) | label free sample | 10 | ppm | 10 | ppm | HCD | Trypsin |
The experimental design file contains information how to unambiguously map a single quantitative value. Most entries can be derived from the sdrf file. However, definition of conditions might need manual changes.
Fraction_Group | Fraction | Spectra_Filepath | Label | MSstats_Condition | MSstats_BioReplicate | |
---|---|---|---|---|---|---|
1 | 1 | A0218_1A_R_FR01.raw | 1 | 1 | 1 | 1 |
1 | 2 | A0218_1A_R_FR02.raw | 1 | 1 | 1 | 1 |
. | . | ... | . | . | . | . |
1 | 15 | A0218_2A_FR15.raw | 1 | 1 | 1 | 1 |
2 | 1 | A0218_2A_FR01.raw | 1 | 2 | 2 | 1 |
. | . | ... | . | . | . | . |
. | . | ... | . | . | . | . |
10 | 15 | A0218_10A_FR15.raw | 1 | 10 | 10 | 1 |
For details, please see the MSstats documentation
parse_sdrf convert-maxquant -s sdrf.tsv -f {here_the_path_to_protein_database_file} -m {True or False} -pef {default 0.01} -prf {default 0.01} -t {temporary folder} -r {raw_data_folder} -n {number of threads:default 1} -o1 {parameters(.xml) output file path} -o2 {maxquant experimental design(.txt) output file path}
eg.
parse_sdrf convert-maxquant -s /root/ChengXin/Desktop/sdrf.tsv -f /root/ChengXin/MyProgram/search_spectra/AT/TAIR10_pep_20101214.fasta -r /root/ChengXin/MyProgram/virtuabox/share/raw_data/ -o1 /root/ChengXin/test.xml -o2 /root/ChengXin/test_exp.xml -t /root/ChengXin/MyProgram/virtuabox/share/raw_data/ -pef 0.01 -prf 0.01 -n 4
The maxquant parameters file mqpar.xml contains the parameters required for maxquant operation.some settings can usually be derived from the sdrf file such as enzyme, fixed modification, variable modification, instrument, fraction and label etc.Set other parameters as default.The current version of maxquant supported by the script is 1.6.10.43
Some parameters are listed:
For details, please see the MaxQuant documentation
The maxquant experimental design file contains name,Fraction,Experiement and PTM column.Most entries can be derived from the sdrf file.
Name | Fraction | Experiment | PTM |
---|---|---|---|
130402_08.raw | 1 | sample 1_Tr_1 | |
130412_08.raw | 1 | sample 2_Tr_1 |
parse_sdrf convert-msstats -s ./testdata/PXD000288.sdrf.tsv -o ./test1.csv
parse_sdrf convert-normalyzerde -s ./testdata/PXD000288.sdrf.tsv -o ./testPXD000288_design.tsv
-c ["characteristics[spiked compound]"]
(optional)Dai C, Füllgrabe A, Pfeuffer J, Solovyeva EM, Deng J, Moreno P, Kamatchinathan S, Kundu DJ, George N, Fexova S, Grüning B, Föll MC, Griss J, Vaudel M, Audain E, Locard-Paulet M, Turewicz M, Eisenacher M, Uszkoreit J, Van Den Bossche T, Schwämmle V, Webel H, Schulze S, Bouyssié D, Jayaram S, Duggineni VK, Samaras P, Wilhelm M, Choi M, Wang M, Kohlbacher O, Brazma A, Papatheodorou I, Bandeira N, Deutsch EW, Vizcaíno JA, Bai M, Sachsenberg T, Levitsky LI, Perez-Riverol Y. A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun. 2021 Oct 6;12(1):5854. doi: 10.1038/s41467-021-26111-3. PMID: 34615866; PMCID: PMC8494749. Manuscript
Perez-Riverol, Yasset, and European Bioinformatics Community for Mass Spectrometry. "Toward a Sample Metadata Standard in Public Proteomics Repositories." Journal of Proteome Research 19.10 (2020): 3906-3909. Manuscript