bigbio / pmultiqc

A library for QC report based on MultiQC framework
GNU General Public License v3.0
14 stars 9 forks source link

pmultiqc

Python application Upload Python Package

A library for proteomics QC report based on MultiQC framework. The library generates a QC report for the quantms pipeline. The library read the input of the quantms pipeline by specified analysis dir, with the following structure:

Usage

multiqc {analysis_dir} -o {output dir}

example: multiqc resources/LFQ -o ./

parameters

An example report can be found in multiqc_report.html

Most of the metrics are compute based on the out.mzTab and the *.idXML which contains the peptides and protein identifications.

Metrics

General report

Results tables

Two tables are shown to the user with the first 500 peptides in the mzTab and the first 500 PSMs. This tables enable to show some of the most relevant peptide and PSMs in the experiment.

Identification Statistics

A table called Spectra Tracking summarize the Identification results by mzML file. The table capture the following numbers:

Summary of Search Engine Scores

This section contains search scores and PEPs counts for different search engines in different files, and they also contain a summary of the consensus PSMs if two or more search engines are used.

Precursor Charges Distribution

The Precursor Charges Distribution aims to show the distribution of the precursor ion charges for a given whole experiment, but also for the identified spectra and unidentified spectra. This information can be used to identify potential ionization problems including many 1+ charges from an ESI ionization source or an unexpected distribution of charges. MALDI experiments are expected to contain almost exclusively 1+ charged ions. An unexpected charge distribution may furthermore be caused by specific search engine parameter settings such as limiting the search to specific ion charges.

Number of Peaks per MS/MS spectrum

The Number of Peaks per MS/MS spectrum aims to show the number of peaks per MS/MS spectrum in a given experiment. Too few peaks can identify poor fragmentation or a detector fault, as opposed to a large number of peaks representing very noisy spectra. This chart is extensively dependent on the pre-processing steps performed to the spectra (centroiding, deconvolution, peak picking approach, etc).

Peak Intensity Distribution

The Peak Intensity Distribution aims to show the Peak instensity in the MS2 spectra for all the experiment but also for the identified spectra. The plot split the intesity in chunks of 0-10, 10-100, 100-300, ... 6k-10k, >10k.

This is a histogram representing the ion intensity vs. the frequency for all MS2 spectra in a whole given experiment. It is possible to filter the information for all, identified and unidentified spectra. This plot can give a general estimation of the noise level of the spectra. Generally, one should expect to have a high number of low intensity noise peaks with a low number of high intensity signal peaks. A disproportionate number of high signal peaks may indicate heavy spectrum pre-filtering or potential experimental problems. In the case of data reuse this plot can be useful in identifying the requirement for pre-processing of the spectra prior to any downstream analysis. The quality of the identifications is not linked to this data as most search engines perform internal spectrum pre-processing before matching the spectra. Thus, the spectra reported are not necessarily pre-processed since the search engine may have applied the pre-processing step internally. This pre-processing is not necessarily reported in the experimental metadata.

Oversampling Distribution

The [Oversampling Distribution] aims to show the OverSampling information. An oversampled 3D-peak is defined as a peak whose peptide ion (same sequence and same charge state) was identified by at least two distinct MS2 spectra in the same Raw file. For high complexity samples, oversampling of individual 3D-peaks automatically leads to undersampling or even omission of other 3D-peaks, reducing the number of identified peptides. Oversampling occurs in low-complexity samples or long LC gradients, as well as undersized dynamic exclusion windows for data independent acquisitions.

Delta Mass

The Delta Mass aims to show the Peak instensity in the MS2 spectra for all the experiment but also for the identified spectra. The plot split the intesity in chunks of 0-10, 10-100, 100-300, ... 6k-10k, >10k. Mass deltas close to zero reflect more accurate identifications and also that the reporting of the amino acid modifications and charges have been done accurately. This plot can highlight systematic bias if not centered on zero. Other distributions can reflect modifications not being reported properly. Also it is easy to see the different between the target and the decoys identifications.

Peptides Quantification Table

The Peptides Quantification Table aims to show the quantitative level and distribution of peptides in different study variables, run and peptiforms. The distribution show all the intensity values in a bar plot above(blue) and below(red) the average intensity for all the samples. All intensities are log values.

Protein Quantification Table

The Protein Quantification Table also aims to show the quantitative level and distribution of proteins in different study variables. The distribution show all the intensity values in a bar plot above(blue) and below(red) the average intensity for all samples. All intensities are log values.

Note: Because DIA-NN has much difference in output file !!! So some metrics are difficult to calculate

Note: If you want to disable this plugin and use the multiqc function, please set disable_plugin

Development quick start

In short, for development, follow these steps: