compomics / ThermoRawFileParser

Thermo RAW file parser that runs on Linux/Mac and all other platforms that support Mono
Apache License 2.0

activation mode information lost after convert #24

Closed yafeng closed 5 years ago

yafeng commented 5 years ago

Hi, I used ThermoRawFileParser to convert raw files to mzML. The output works fine with the search engine. However, when I use IsobaricAnalyzer with -extraction filtering by HCD or CID, it says no spectra passed filtering and gives an empty result. If I disable the filtering, it produces results. Is it possible that the activation mode information is lost during conversion?

You can download any of the raw files from here to test: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD002622

nielshulstaert commented 5 years ago

Hi,

My apologies for the late reply. I've converted one of the raw files, and it seems there is an activation element in a random MS2 spectrum I've checked.

`

`

Does it work when you convert the raw file with msconvert?

Best regards,

Niels

yafeng commented 5 years ago

Hi, True, it has the activation element. I checked IsobaricAnalyzer: the --select_activation filter matches on High-energy collision-induced dissociation, whereas the actual name in the mzML is name="higher energy beam-type collision-induced dissociation". Maybe this is the problem? Was "beam-type" added later, or does it come from the .RAW file?
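
To see which activation terms a converted file actually carries, something like the following Python sketch could help. It is not part of ThermoRawFileParser or OpenMS; the function name `activation_terms` is made up for illustration, and it assumes the usual mzML layout in which `cvParam` elements sit directly inside `activation`. Other parameters stored there (for example the collision energy) will show up in the counts as well.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def activation_terms(mzml_source):
    """Count (accession, name) pairs of cvParams found inside <activation>.

    `mzml_source` is a file name or a binary file object.
    """
    counts = Counter()
    for _, elem in ET.iterparse(mzml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "activation":
            for child in elem:
                if local_name(child.tag) == "cvParam":
                    counts[(child.get("accession"), child.get("name"))] += 1
        elif name == "spectrum":
            # The spectrum has been fully read; free its memory.
            elem.clear()
    return counts
```

Calling `activation_terms("converted.mzML")` (the file name is a placeholder) on the output of each converter and comparing the two counters should show whether they use different accessions for the same spectra.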

nielshulstaert commented 5 years ago

Hi,

I've mapped the HigherEnergyCollisionalDissociation Thermo property to the controlled vocabulary term MS:1002481 (https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMS_1002481). When I convert a RAW file from PXD002622 with msconvert, I notice that it uses MS:1000422 (https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMS_1000422), the parent term of MS:1002481. Would that work for you? I can't find a term named High-energy collision-induced dissociation with the OLS service.

Best regards,

Niels
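
The parent/child relationship mentioned above matters because a reader that compares accessions exactly will not accept a child term when it is looking for the parent. A minimal Python sketch of hierarchy-aware matching follows. It is purely illustrative and does not describe how IsobaricAnalyzer or MSGF+ work internally; `PARENT` and `matches` are hypothetical names, and the table holds only the single relationship stated in this thread rather than the full controlled vocabulary.

```python
# Only the relationship stated in this thread is encoded here. A real reader
# would load the complete is_a hierarchy from the PSI-MS controlled vocabulary.
PARENT = {"MS:1002481": "MS:1000422"}


def matches(accession, wanted, parent=PARENT):
    """Return True if `accession` is `wanted` or a descendant of `wanted`."""
    seen = set()
    while accession is not None and accession not in seen:
        if accession == wanted:
            return True
        seen.add(accession)  # guard against cycles in a malformed table
        accession = parent.get(accession)
    return False
```

With this table, `matches("MS:1002481", "MS:1000422")` is true while the reverse lookup is false, which is the asymmetry an exact string comparison cannot express.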

yafeng commented 5 years ago

OK, maybe it's just a different name used in IsobaricAnalyzer. I also ran some tests, and there are some inconsistencies between ThermoRawFileParser and msconvert. The impact is usually on downstream software, such as peptide identification algorithms and IsobaricAnalyzer. For example, the MSGF+ search engine can read the fragmentation method as written in an msconvert-generated mzML file, but seems unable to do so for a ThermoRawFileParser-converted mzML file. However, you can predefine the fragmentation method in MSGF+ instead of letting it read it from the mzML, which avoids the problem.

yafeng commented 5 years ago

Pasted below is the output printed by MSGF+ when reading a ThermoRawFileParser-converted mzML file. The raw file contains both CID and HCD spectra. It seems activationMethod CID is recognized, but the activationMethod for HCD spectra is not, I think because ThermoRawFileParser uses a different term, MS:1002481. One thing I can test is to replace MS:1002481 with MS:1000422 and see if it is recognized.

Reading spectra...
Skip spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=609 since activationMethod is CID, not HCD
Spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=610 activationMethod is unknown; Using HCD as specified in parameters.
Skip spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=654 since activationMethod is CID, not HCD
Spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=655 activationMethod is unknown; Using HCD as specified in parameters.
Skip spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=722 since activationMethod is CID, not HCD
Spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=723 activationMethod is unknown; Using HCD as specified in parameters.
Skip spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=773 since activationMethod is CID, not HCD
Spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=774 activationMethod is unknown; Using HCD as specified in parameters.
Skip spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=778 since activationMethod is CID, not HCD
Spectrum mzspec=Rui_predpi_mouse_N2A_plain_200ugIPG370-405_fr01.raw: controllerType=0 controllerNumber=1 scan=779 activationMethod is unknown; Using HCD as specified in parameters.

yafeng commented 5 years ago

I did the test, replacing the MS:1002481 used by ThermoRawFileParser with MS:1000422. MSGF+ now recognizes both the CID and HCD activationMethod. So maybe it is correct to use the parent term MS:1000422 instead.
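
For anyone who wants to repeat this kind of test on an already converted file, the replacement could be sketched in Python as below. This is an assumption about one way to do it, not the procedure actually used in this thread. The function name `swap_activation_term` is hypothetical, and the name written for MS:1000422 is the one the current PSI-MS controlled vocabulary is believed to use, so check it against the CV version your tools expect.

```python
import re

OLD_ACCESSION = "MS:1002481"
NEW_ACCESSION = "MS:1000422"
# Assumption: the controlled-vocabulary name of MS:1000422.
NEW_NAME = "beam-type collision-induced dissociation"

_CVPARAM = re.compile(r"<cvParam\b[^>]*>")


def swap_activation_term(mzml_text):
    """Rewrite every cvParam carrying OLD_ACCESSION to the parent term."""

    def fix(match):
        tag = match.group(0)
        if f'accession="{OLD_ACCESSION}"' not in tag:
            return tag  # leave unrelated cvParams untouched
        tag = tag.replace(
            f'accession="{OLD_ACCESSION}"', f'accession="{NEW_ACCESSION}"'
        )
        # Keep the name attribute consistent with the new accession.
        return re.sub(r'\bname="[^"]*"', f'name="{NEW_NAME}"', tag, count=1)

    return _CVPARAM.sub(fix, mzml_text)
```

The function works on the file contents as a string, so a caller would read the mzML, apply it, and write the result to a new file. Because the replacement changes the length of the text, the byte offsets stored in an indexed mzML file would no longer be valid after this edit.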

yafeng commented 5 years ago

I also tested it with IsobaricAnalyzer. Replacing MS:1002481 with MS:1000422 also lets -extraction:select_activation in IsobaricAnalyzer correctly recognize HCD spectra.

ypriverol commented 5 years ago

@nielshulstaert can we replace the CVTerm that we are currently using with the one proposed by @yafeng?

nielshulstaert commented 5 years ago

Hi @yafeng ,

I've released a new version 1.1.2, could you verify that it works?

Thanks again for your input,

Niels

yafeng commented 5 years ago

@nielshulstaert Yes, I tested. Thank you!

nielshulstaert commented 5 years ago

Ok great, thanks for testing the library and let us know if you run into other issues.