A library for proteomics QC reports based on the MultiQC framework. The library generates a QC report for the quantms pipeline. It reads the output of the quantms pipeline from a specified analysis directory and is run as follows:
multiqc {analysis_dir} -o {output_dir}
example: multiqc resources/LFQ -o ./
An example report can be found in multiqc_report.html.
Most of the metrics are computed from the out.mzTab and the *.idXML files, which contain the peptide and protein identifications.
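Because most metrics start from the mzTab file, the following minimal sketch shows one way to split an mzTab file into its protein, peptide and PSM tables using only the Python standard library. It relies on the line prefixes defined by the mzTab 1.0 specification (PRH/PRT, PEH/PEP, PSH/PSM). The function name `read_mztab_tables` and the example content are hypothetical; this is not pmultiqc's own parser.

```python
import csv
import io
from typing import Dict, List

# mzTab header prefixes and the row prefixes they describe (mzTab 1.0 spec).
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM"}


def read_mztab_tables(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Split mzTab text into protein (PRT), peptide (PEP) and PSM tables.

    Each row becomes a dict keyed by the column names of its section header.
    Metadata (MTD) and comment (COM) lines are skipped.
    """
    headers: Dict[str, List[str]] = {}
    tables: Dict[str, List[Dict[str, str]]] = {"PRT": [], "PEP": [], "PSM": []}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # blank separator lines between sections
        prefix, values = fields[0], fields[1:]
        if prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = values
        elif prefix in tables:
            tables[prefix].append(dict(zip(headers[prefix], values)))
    return tables


# Tiny hand-written mzTab fragment (hypothetical content, real prefixes).
example = (
    "MTD\tmzTab-version\t1.0.0\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\texample protein\n"
    "\n"
    "PSH\tsequence\taccession\tcharge\n"
    "PSM\tPEPTIDEK\tP12345\t2\n"
    "PSM\tAAAK\tP12345\t3\n"
)
tables = read_mztab_tables(example)
print(len(tables["PRT"]), len(tables["PSM"]), tables["PSM"][0]["sequence"])
```

Keeping each row as a dict keyed by header name avoids depending on column order, which varies between tools that write mzTab.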
First, we show the experimental design of the project (example: http://bigbio.xyz/pmultiqc/shared-peptides-star-align-stricter-pep-protein-FDR/multiqc_report.html#proteomicslfq_exp_design). This design is translated from the SDRF-Proteomics standard into the OpenMS experimental design configuration.
Pipeline performance overview: shows an overview of the quantms pipeline performance, including:
Summary Table: shows the number of spectra, the percentage of identified spectra, the total peptide count and the total number of identified proteins (including protein groups: if two proteins are identified by the same peptide, both proteins are counted). Example: http://bigbio.xyz/pmultiqc/shared-peptides-star-align-stricter-pep-protein-FDR/multiqc_report.html#proteomicslfq_summary_table
MS1 Information: shows quality control metrics at the MS1 level, including total ion chromatograms (TIC), base peak chromatograms (BPC), the number of MS1 peaks, and general statistics.
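The two overview tables above can be illustrated with a short sketch. It assumes hypothetical, already-parsed inputs: spectrum counts, peptide sequences, protein groups given as lists of accessions, and MS1 spectra given as (retention time, intensities) pairs. A TIC point is the summed intensity of one spectrum and a BPC point is its most intense peak. The function names are illustrative, not pmultiqc's API.

```python
from typing import Dict, Iterable, List, Sequence, Tuple, Union

Trace = List[Tuple[float, float]]


def summary_table(total_spectra: int, identified_spectra: int,
                  peptide_sequences: Iterable[str],
                  protein_groups: Iterable[List[str]]) -> Dict[str, Union[int, float]]:
    """Compute the headline numbers of a summary table.

    protein_groups is a hypothetical input: one list of accessions per group.
    Every accession is counted, so two proteins identified by the same
    peptide contribute two to the protein total.
    """
    proteins = {accession for group in protein_groups for accession in group}
    identified_pct = 100.0 * identified_spectra / total_spectra if total_spectra else 0.0
    return {
        "spectra": total_spectra,
        "% identified spectra": round(identified_pct, 2),
        "peptides": len(set(peptide_sequences)),
        "proteins": len(proteins),
    }


def tic_and_bpc(ms1_spectra: Sequence[Tuple[float, Sequence[float]]]) -> Tuple[Trace, Trace, List[int]]:
    """Build TIC, BPC and MS1 peak-count traces ordered by retention time."""
    tic: Trace = []
    bpc: Trace = []
    peak_counts: List[int] = []
    for rt, intensities in sorted(ms1_spectra, key=lambda spectrum: spectrum[0]):
        tic.append((rt, float(sum(intensities))))                # total ion current
        bpc.append((rt, float(max(intensities, default=0.0))))   # base peak intensity
        peak_counts.append(len(intensities))                     # number of MS1 peaks
    return tic, bpc, peak_counts


row = summary_table(2000, 500, ["PEPTIDEK", "AAAK", "PEPTIDEK"], [["P1", "P2"], ["P3"]])
tic, bpc, peaks = tic_and_bpc([(12.0, [5.0, 20.0, 1.0]), (6.0, [3.0, 4.0])])
print(row)
print(tic, bpc, peaks)
```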
Pipeline Results Statistics: shows the final results of the quantms pipeline, such as the total number of identified peptides and proteins (the data comes from the mzTab and the experimental design file).
Number of peptides per Protein: includes a histogram of the number of peptides per protein: http://bigbio.xyz/pmultiqc/shared-peptides-star-align-stricter-pep-protein-FDR/multiqc_report.html#num_of_pep_per_prot
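The histogram above can be sketched as two counting steps: collect the distinct peptide sequences per protein accession, then count how many proteins have each peptide count. This assumes hypothetical input as (accession, sequence) pairs, for example taken from the mzTab peptide table; it is an illustration, not pmultiqc's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def peptides_per_protein_histogram(pairs: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Map 'number of distinct peptides' -> 'number of proteins with that count'."""
    peptides_by_protein: Dict[str, Set[str]] = {}
    for accession, sequence in pairs:
        # A set ignores repeated rows for the same peptide.
        peptides_by_protein.setdefault(accession, set()).add(sequence)
    counts = Counter(len(peptides) for peptides in peptides_by_protein.values())
    return dict(sorted(counts.items()))


histogram = peptides_per_protein_histogram(
    [("P1", "AAK"), ("P1", "CCK"), ("P2", "AAK"), ("P1", "AAK"), ("P3", "DDK")]
)
print(histogram)
```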
Two tables show the first 500 peptides in the mzTab and the first 500 PSMs. These tables show some of the most relevant peptides and PSMs in the experiment.
A table called Spectra Tracking summarizes the identification results per mzML file.
This section shows the search scores and PEP counts for each search engine in each file, as well as a summary of the consensus PSMs when two or more search engines are used.
The Precursor Charges Distribution shows the distribution of precursor ion charges for the whole experiment, and separately for the identified and unidentified spectra. This information can be used to identify potential ionization problems, such as many 1+ charges from an ESI ionization source or an unexpected distribution of charges. MALDI experiments are expected to contain almost exclusively 1+ charged ions. An unexpected charge distribution may also be caused by specific search engine parameter settings, such as limiting the search to specific ion charges.
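A minimal sketch of the charge distribution, assuming a hypothetical input of (precursor charge, identified?) per MS2 spectrum. The fraction of 1+ precursors computed at the end is one simple way to quantify the ESI concern mentioned above; it is an illustration, not a pmultiqc metric.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def charge_distribution(spectra: Iterable[Tuple[int, bool]]) -> Dict[str, Counter]:
    """Count precursor charges for all, identified and unidentified MS2 spectra."""
    dist: Dict[str, Counter] = {"all": Counter(), "identified": Counter(), "unidentified": Counter()}
    for charge, identified in spectra:
        dist["all"][charge] += 1
        dist["identified" if identified else "unidentified"][charge] += 1
    return dist


dist = charge_distribution([(2, True), (2, False), (3, True), (1, False)])
# Share of singly charged precursors; a high value is unusual for ESI data.
singly_charged_fraction = dist["all"][1] / sum(dist["all"].values())
print(dict(dist["identified"]), singly_charged_fraction)
```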
The Number of Peaks per MS/MS spectrum shows the number of peaks per MS/MS spectrum in a given experiment. Too few peaks can indicate poor fragmentation or a detector fault, whereas a large number of peaks can indicate very noisy spectra. This chart strongly depends on the pre-processing steps performed on the spectra (centroiding, deconvolution, peak picking approach, etc.).
The Peak Intensity Distribution shows the peak intensity in the MS2 spectra for the whole experiment and for the identified spectra. The plot splits the intensity into chunks of 0-10, 10-100, 100-300, ... 6k-10k, >10k.
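The chunking described above amounts to counting values in half-open bins with an open-ended last bin. The sketch below takes the bin edges as a parameter because the intermediate edges of the report are elided in this text; the example uses only the first edges listed (0, 10, 100, 300) for illustration, so its last bin is ">300" rather than the report's ">10k". The same helper could bin the number of peaks per MS/MS spectrum. The function name is hypothetical.

```python
import bisect
from collections import Counter
from typing import Dict, Iterable, List


def bin_values(values: Iterable[float], edges: List[float]) -> Dict[str, int]:
    """Count values in bins [edges[i], edges[i+1]).

    Values >= edges[-1] go to the final ">" bin; values below edges[0] are ignored.
    """
    labels = [f"{lo:g}-{hi:g}" for lo, hi in zip(edges, edges[1:])] + [f">{edges[-1]:g}"]
    counts: Counter = Counter()
    for value in values:
        index = bisect.bisect_right(edges, value) - 1
        if index < 0:
            continue  # below the first edge
        counts[labels[index]] += 1
    # Keep every bin, even empty ones, so all samples share the same x axis.
    return {label: counts[label] for label in labels}


print(bin_values([5, 10, 250, 300, 1e5], [0, 10, 100, 300]))
```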
This histogram shows the frequency of ion intensities across all MS2 spectra in the experiment. It can be filtered to show all, identified, or unidentified spectra. This plot gives a general estimate of the noise level of the spectra. Generally, one should expect a high number of low-intensity noise peaks and a low number of high-intensity signal peaks. A disproportionate number of high-intensity peaks may indicate heavy spectrum pre-filtering or potential experimental problems. In the case of data reuse, this plot can help identify whether the spectra need pre-processing before any downstream analysis. The quality of the identifications is not directly linked to this data, as most search engines perform internal spectrum pre-processing before matching the spectra. Thus, the reported spectra are not necessarily pre-processed, since the search engine may have applied the pre-processing step internally, and this pre-processing is not necessarily reported in the experimental metadata.
The Oversampling Distribution shows the oversampling information. An oversampled 3D-peak is defined as a peak whose peptide ion (same sequence and same charge state) was identified by at least two distinct MS2 spectra in the same raw file. For high-complexity samples, oversampling of individual 3D-peaks automatically leads to undersampling or even omission of other 3D-peaks, reducing the number of identified peptides. Oversampling occurs in low-complexity samples or with long LC gradients, as well as with undersized dynamic exclusion windows in data-dependent acquisition.
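Following the definition above, a peptide ion is keyed by (raw file, sequence, charge), and it is oversampled when two or more MS2 spectra identify it. The sketch below assumes a hypothetical input with one tuple per identified MS2 spectrum, and groups peptide ions into "1", "2" and ">=3" spectra; these bucket labels are an illustrative choice, not necessarily those used in the report.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def oversampling_distribution(psms: Iterable[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """psms: (raw file, peptide sequence, charge) for each identified MS2 spectrum.

    Returns, per raw file, how many peptide ions were identified by 1, 2 or >=3 spectra.
    """
    spectra_per_ion = Counter(psms)  # same file + sequence + charge = same peptide ion
    result: Dict[str, Dict[str, int]] = {}
    for (raw_file, _sequence, _charge), n_spectra in spectra_per_ion.items():
        bucket = str(n_spectra) if n_spectra < 3 else ">=3"
        per_file = result.setdefault(raw_file, {"1": 0, "2": 0, ">=3": 0})
        per_file[bucket] += 1
    return result


distribution = oversampling_distribution([
    ("a.raw", "PEPK", 2), ("a.raw", "PEPK", 2),  # oversampled: 2 spectra
    ("a.raw", "PEPK", 3),                        # different charge, separate ion
    ("a.raw", "AAK", 2), ("a.raw", "AAK", 2), ("a.raw", "AAK", 2),
])
print(distribution)
```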
The Delta Mass plot shows the distribution of the difference between the experimental and the calculated precursor mass of the identified spectra. Mass deltas close to zero reflect more accurate identifications and also indicate that the amino acid modifications and charges have been reported correctly. If the distribution is not centered on zero, the plot can highlight a systematic bias. Other distributions can reflect modifications that are not reported properly. The plot also makes it easy to see the difference between target and decoy identifications.
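A minimal sketch of the delta mass computation, assuming each PSM provides its experimental and calculated precursor m/z (the mzTab PSM columns exp_mass_to_charge and calc_mass_to_charge) plus a decoy flag whose source is left open here. The deltas are in m/z units; multiplying by the charge gives the neutral mass difference in Da. A ppm variant is included for comparison. The function names are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple


def delta_mz(psms: Iterable[Tuple[float, float, bool]]) -> Dict[str, List[float]]:
    """Split (experimental - calculated) precursor m/z deltas into target and decoy lists.

    psms: (exp_mass_to_charge, calc_mass_to_charge, is_decoy) per PSM.
    """
    deltas: Dict[str, List[float]] = {"target": [], "decoy": []}
    for exp_mz, calc_mz, is_decoy in psms:
        deltas["decoy" if is_decoy else "target"].append(exp_mz - calc_mz)
    return deltas


def to_ppm(exp_mz: float, calc_mz: float) -> float:
    """Relative precursor error in parts per million."""
    return (exp_mz - calc_mz) / calc_mz * 1e6


deltas = delta_mz([(500.002, 500.0, False), (600.0, 600.003, False), (700.5, 700.0, True)])
# A target median far from zero would hint at a systematic bias.
print(median(deltas["target"]), deltas["decoy"], to_ppm(500.002, 500.0))
```

Targets should cluster tightly around zero, while decoys, being random matches, typically spread across the whole precursor tolerance window.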
The Peptides Quantification Table shows the quantitative level and distribution of peptides across study variables, runs and peptidoforms. The distribution shows all intensity values in a bar plot, above (blue) and below (red) the average intensity across all samples. All intensities are log values.
The best search score of a peptide is equal to 1 - min(Q.Value) for DIA datasets, and to 1 - min(best_search_engine_score[1]) for DDA datasets, where best_search_engine_score[1] is a column of the mzTab peptide table.
For each study variable (e.g. CT=Mixture;CN=UPS1;QY=0.1fmol), the intensities of the fractions are summarized first, and then the mean intensity is computed over technical replicates and biological replicates separately. Click distribution to switch to bar plots.
The Protein Quantification Table also shows the quantitative level and distribution of proteins across study variables. The distribution shows all intensity values in a bar plot, above (blue) and below (red) the average intensity across all samples. All intensities are log values.
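The quantification tables can be illustrated with a simplified sketch. It assumes hypothetical input rows of (study variable, replicate, fraction, intensity) for one peptide, sums the fractions within each replicate, averages the replicates (collapsing the separate technical/biological averaging into one step, which is a simplification), takes log10 (the log base is an assumption), and splits study variables into above-average (blue) and below-average (red) values. The best search score follows the 1 - min(score) rule stated for the peptide table. Names such as "S1" and "S2" are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def best_search_score(scores: Iterable[float]) -> float:
    """1 - min(score): Q.Value for DIA, best_search_engine_score[1] for DDA."""
    return 1.0 - min(scores)


def study_variable_log_intensity(rows: Iterable[Tuple[str, str, str, float]]) -> Dict[str, float]:
    """Sum fractions per replicate, average replicates, then take log10 per study variable."""
    per_replicate: Dict[Tuple[str, str], float] = defaultdict(float)
    for variable, replicate, _fraction, intensity in rows:
        per_replicate[(variable, replicate)] += intensity
    by_variable: Dict[str, List[float]] = defaultdict(list)
    for (variable, _replicate), total in per_replicate.items():
        by_variable[variable].append(total)
    return {variable: math.log10(mean(totals)) for variable, totals in by_variable.items()}


def split_by_average(log_values: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (above-or-equal average, below average), i.e. the blue and red bars."""
    average = mean(log_values.values())
    above = {k: v for k, v in log_values.items() if v >= average}
    below = {k: v for k, v in log_values.items() if v < average}
    return above, below


log_values = study_variable_log_intensity([
    ("S1", "r1", "f1", 40.0), ("S1", "r1", "f2", 60.0),  # replicate r1: 40 + 60 = 100
    ("S1", "r2", "f1", 100.0),                           # replicate r2: 100
    ("S2", "r1", "f1", 1000.0),
])
above, below = split_by_average(log_values)
print(log_values, above, below, best_search_score([0.01, 0.002, 0.05]))
```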
For each study variable (e.g. CT=Mixture;CN=UPS1;QY=0.1fmol), the protein intensity is summarized from the intensities of its peptides. Click distribution to switch to bar plots.
Note: because the DIA-NN output files differ considerably, some metrics are difficult to calculate for DIA-NN results.
Note: to disable this plugin and use the standard MultiQC functionality, set disable_plugin.
In short, for development, follow these steps:
git clone https://github.com/bigbio/pmultiqc && cd pmultiqc
pip install -r requirements.txt
pip install -e .
cd tests && multiqc resources/LFQ -o ./