compomics / moFF

A modest Feature Finder (moFF) to extract MS1 intensities from Thermo raw file
Apache License 2.0

RAW file and mzML differences #21

Closed caleb-easterly closed 6 years ago

caleb-easterly commented 6 years ago

Hello,

I'm glad to see that mzML can be used as an input. However, I noticed some differences between using mzML and using a RAW file. I ran the following:

python $moffdir/moff.py --inputtsv $moffdir/sample_data/20080311_CPTAC6_07_6A005.txt \
    --inputraw $moffdatadir/20080311_CPTAC6_07_6A005.RAW \
    --tol 10 \
    --output_folder $moffdatadir/outRAW \
    --peptide_summary 1

python $moffdir/moff.py --inputtsv $moffdir/sample_data/20080311_CPTAC6_07_6A005.txt \
    --inputraw $moffdatadir/20080311_CPTAC6_07_6A005.mzML \
    --tol 10 \
    --output_folder $moffdatadir/outMZML \
    --peptide_summary 1

echo "Peptides matched with intensity: using mzML" 
cat $moffdatadir/outMZML/peptide*.tab | wc -l

echo "Peptides matched with intensity: using RAW"
cat $moffdatadir/outRAW/peptide*.tab | wc -l

And the output was:

Peptides matched with intensity: using mzML
3036
Peptides matched with intensity: using RAW
3646

A difference of about 600 peptides is a pretty large proportion of roughly 3600, so I'm concerned about missing many peptides when using larger files. Is this due to something outside of moFF, like the conversion from RAW to mzML? Or am I doing something wrong?

Thanks! I've also attached the log files, and can give you anything else you need.

Attachments: 20080311_CPTAC6_07_6A005__moff_MZML.log, 20080311_CPTAC6_07_6A005__moff_RAW.log
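
To put the gap in proportion, here is a minimal Python sketch. It is not part of moFF. It assumes each `peptide*.tab` file begins with exactly one header line, which `wc -l` would count as a row; `count_data_rows` and `relative_shortfall` are hypothetical helper names.

```python
from pathlib import Path


def count_data_rows(folder, pattern="peptide*.tab"):
    """Count non-empty data rows in every matching file, skipping one header line per file."""
    total = 0
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            lines = [line for line in handle if line.strip()]
        # Assumption: the first non-empty line of each file is a column header.
        total += max(len(lines) - 1, 0)
    return total


def relative_shortfall(reference, other):
    """Fraction of the reference count that is missing from the other count."""
    if reference == 0:
        raise ValueError("reference count must be non-zero")
    return (reference - other) / reference


# Line counts reported above: 3646 (RAW) and 3036 (mzML).
print(f"{relative_shortfall(3646, 3036):.1%}")  # prints 16.7%
```

Calling `count_data_rows` on the `outRAW` and `outMZML` folders would give header-free counts that can be passed to `relative_shortfall` in the same way.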

Maux82 commented 6 years ago

Hi, I will try to reproduce the error. The number of missing peptides is pretty large. I ran a similar test (on another dataset) a few days ago and it was fine.

Did you use the master branch? If you also run moFF with the parameters --rt_w 3 --rt_p 1, do you still see all those missing peptides?

caleb-easterly commented 6 years ago

Hi @Maux82, I just ran it again with the parameters you suggested and got the same results (3036 with MZML and 3646 with RAW). The commands were

python $moffdir/moff.py --inputtsv $moffdir/sample_data/20080311_CPTAC6_07_6A005.txt \
    --inputraw $moffdatadir/20080311_CPTAC6_07_6A005.RAW \
    --tol 10 \
    --rt_w 3 --rt_p 1 \
    --output_folder $moffdatadir/outRAW \
    --peptide_summary 1

python $moffdir/moff.py --inputtsv $moffdir/sample_data/20080311_CPTAC6_07_6A005.txt \
    --inputraw $moffdatadir/20080311_CPTAC6_07_6A005.mzML \
    --tol 10 \
    --rt_w 3 --rt_p 1 \
    --output_folder $moffdatadir/outMZML \
    --peptide_summary 1

Also, the output included the following lines (for RAW):

Collecting moFF result file : 20080311_CPTAC6_07_6A005_moff_result.txt   --> Retrived peptide peaks after filtering:  3889

(for mzML)

Collecting moFF result file : 20080311_CPTAC6_07_6A005_moff_result.txt   --> Retrived peptide peaks after filtering:  3233

Maux82 commented 6 years ago

Hi, I have already run some tests, and the identified peptides are probably not very reliable.
The ones I used here are based on Mascot, but I did not run the search myself. I will change this demo data, maybe using some more reliable search engines but on the same CPTAC dataset.

caleb-easterly commented 6 years ago

Thanks for looking into this. It does seem like it might be a data issue - I did some comparisons with other data - peptides identified with X!Tandem in SearchGUI, and got 2988 matches (RAW) and 2981 matches (mzML), which isn't a big difference.

Maux82 commented 6 years ago

I am going to replace the sample data with the input data that I use for the moFF-GUI. This input data is the result of MS-GF+ and X!Tandem in SearchGUI. On this data, the results should be much more similar between RAW and mzML. Let me know if you can test it.

caleb-easterly commented 6 years ago

Hi,

I tested the new identification files and did find that the results were quite a bit better:

Number of peptides quantified by both: 2400
Number quantified by RAW: 2446
Number quantified by mzML: 2400
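
Counts alone do not show which peptides differ. The following is a minimal sketch of an overlap comparison, assuming each peptide summary file is tab-separated with a header row and a column named `peptide`. That column name is an assumption and should be checked against the actual file; `read_peptides` and `compare_peptide_sets` are hypothetical helpers, not moFF functions.

```python
import csv


def read_peptides(path, column="peptide"):
    """Return the set of non-empty values in `column` of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Assumption: the file has a header row that names the peptide column.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        return {row[column] for row in reader if row[column]}


def compare_peptide_sets(raw_peptides, mzml_peptides):
    """Summarise how two sets of quantified peptides overlap."""
    return {
        "both": len(raw_peptides & mzml_peptides),
        "raw": len(raw_peptides),
        "mzml": len(mzml_peptides),
        "raw_only": sorted(raw_peptides - mzml_peptides),
        "mzml_only": sorted(mzml_peptides - raw_peptides),
    }
```

In the numbers above, the count quantified by both equals the mzML count, so `mzml_only` would be empty and `raw_only` would hold the remaining 46 peptides.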