ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0

mzML converted from wiff produces different DiaNN results #2703

Open calizilla opened 1 year ago

calizilla commented 1 year ago

Hi,

I have converted wiff files to mzML with this container, using the command below:

singularity run --env WINEDEBUG=-all \
        -B /scratch/:/scratch \
        ${pwiz} wine msconvert \
        ${wiff} \
        --32 \
        --filter "peakPicking vendor msLevel=1-" \
        -o ${outdir} \
        --outfile ${sampleID}.mzML > ${logdir}/${sampleID}.log 2>&1

I have also converted the same samples with the MSConvert GUI using the same parameters, and the resulting mzML files are identical.

When these mzML files are used in DiaNN analysis (either Linux or GUI) the results differ markedly from the same analysis conducted on the GUI with .wiff input. The abundance ratios are much higher from mzML input, the ratios are not consistent between samples for the same protein, and the number of unique genes per sample is lower.

I have submitted this as an issue to DiaNN, but I am wondering whether you could offer some insight into possible causes. Is there also a way I can 'view' the wiff data, so I can compare its values directly with those reported in the converted mzML files?

Many thanks, Cali

chambm commented 1 year ago

ProteoWizard has the SeeMS application that lets you view any format that msconvert can handle, and also add some of the filters that msconvert can apply. The default installation settings even add it to the Windows Explorer context menu when you right-click on files and folders. With SeeMS it should be pretty easy to stack the same spectrum from WIFF and mzML on top of each other, synchronize zooming, and see what's causing the difference.

calizilla commented 1 year ago

Thanks for your swift and perfect reply, what a great tool!

The only difference is the number of data points: in the mzML file (top half of the screenshot below), the counts are much smaller than for the same sample in wiff format (bottom half of the screenshot).

[Screenshot: SeeMS_Barcombe_CSP_MM_AS_12_1_mzML_vs_wiff]

I extracted the values from SeeMS and summed the summable columns to confirm it was just the data points column that differed:

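|      | Data Points      | Base Peak m/z  | Base Peak Intensity | Total Ion Current  | Precursor Info |
|------|------------------|----------------|---------------------|--------------------|----------------|
| mzML | 802,472,849.00   | 101,536,067.10 | 6,138,657,847.00    | 550,390,347,558.67 | 158,799,923.92 |
| wiff | 6,305,120,887.00 | 101,536,067.10 | 6,138,657,847.00    | 550,390,347,558.67 | 158,799,923.92 |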

Proteomics is not my field, could you please advise if this difference in the number of data points is likely to influence results of DIA analysis?

Many thanks, Cali

chambm commented 1 year ago

That's the peak picking: going from profile data to centroid. Maybe DIA-NN run directly on WIFF does peak picking differently?
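
To illustrate why peak picking shrinks the data point count, here is a minimal Python sketch. It is a toy example, not the vendor algorithm that msconvert calls: the synthetic Gaussian peaks, the function names (`gaussian_profile`, `pick_peaks`) and the local-maximum rule are all assumptions made for illustration. It builds a small profile spectrum in which each peak is sampled as many points, then reduces it to one centroid per peak.

```python
import math


def gaussian_profile(centers, heights, sigma=0.01, step=0.002, pad=0.05):
    """Synthesize a toy profile spectrum: every peak is sampled as many points."""
    lo = min(centers) - pad
    n = int(round((max(centers) + pad - lo) / step)) + 1
    mz = [lo + i * step for i in range(n)]
    intensity = [
        sum(h * math.exp(-0.5 * ((x - c) / sigma) ** 2)
            for c, h in zip(centers, heights))
        for x in mz
    ]
    return mz, intensity


def pick_peaks(mz, intensity, threshold=1.0):
    """Naive centroiding (hypothetical rule): keep local maxima above a threshold."""
    peaks = []
    for i in range(1, len(mz) - 1):
        is_apex = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if intensity[i] > threshold and is_apex:
            peaks.append((mz[i], intensity[i]))
    return peaks


# Three well-separated synthetic peaks (made-up m/z values and heights).
mz, intensity = gaussian_profile([500.0, 500.5, 501.0], [1000.0, 400.0, 150.0])
centroids = pick_peaks(mz, intensity)

print(len(mz), "profile points ->", len(centroids), "centroids")
print("base peak intensity (profile): ", max(intensity))
print("base peak intensity (centroid):", max(h for _, h in centroids))
```

In this sketch the number of points drops sharply while the base peak intensity is unchanged, which matches the pattern in the table above. Whether the remaining difference in DIA analysis results comes from how the two peak pickers assign centroid m/z and intensity is something this toy example cannot show.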