compomics / ThermoRawFileParser

Thermo RAW file parser that runs on Linux/Mac and all other platforms that support Mono
Apache License 2.0

Conversion fails for PRM files #163

Closed lazear closed 12 months ago

lazear commented 1 year ago

Hi all,

I was hoping to use TRFP to convert from PRM files to mzML, but I'm getting failures on all of them:

2023-05-17 21:04:11 ERROR Failed finding precursor for 6032
2023-05-17 21:04:11 ERROR Failed finding precursor for 6033
2023-05-17 21:04:11 ERROR Failed finding precursor for 6034
2023-05-17 21:04:11 ERROR Failed finding precursor for 6035
2023-05-17 21:04:11 ERROR Failed finding precursor for 6036
...

Unfortunately I cannot share these raw files. Is there any way to add support for this? Happy to poke around in the codebase and make a contribution if you can point me in the right direction.

I'm using ThermoRawFileParser 1.4.2 with the following arguments:

"ThermoRawFileParser.exe -N -f 1 -g"

lazear commented 1 year ago

For example data, any of the "Patient.raw" files from https://panoramaweb.org/Panorama%20Public/2022/U%20of%20Newcastle%20Trost%20Lab%20-%20SARS-CoV-2%20PRM/project-begin.view?pageId=Raw%20Data can be used.

https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD033790

lazear commented 1 year ago

Actually, it looks like it's still processing the files despite the errors. The failure comes from somewhere else on my end: I am failing the Docker container if there is an "ERROR" entry in the logs. Perhaps this could be a "WARNING" instead?

caetera commented 1 year ago

Hi @lazear ,

I have checked a couple of the example files (from Panorama) and they seem to contain only MS2 scans (no MS1). It is therefore impossible to determine the parent scan for any of them (in fact, there are no parent scans), so the error is produced during processing. In that case the complete <precursorList> structure will be omitted from the output, but the file will otherwise be processed.

TRFP's exit code reports the number of errors (or more critical events) produced during processing, so it will be non-zero in your case (maybe that will help you troubleshoot the Docker failure).

It might be worth demoting this error to a warning, indeed.
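
For a wrapper that fails on any "ERROR" log entry, one option is to tolerate only this specific message. The following is a minimal sketch, assuming the log line layout quoted earlier in this thread (date, time, level, message); `fatal_errors` and `TOLERATED_PREFIXES` are hypothetical names and not part of TRFP.

```python
import re
from typing import Iterable, List

# Assumed layout, based on the lines quoted in this thread:
# "<date> <time> ERROR <message>"
_ERROR_RE = re.compile(r"^\S+ \S+ ERROR (.*)$")

# Hypothetical allow-list: error messages the wrapper treats as non-fatal.
TOLERATED_PREFIXES = ("Failed finding precursor for ",)


def fatal_errors(log_lines: Iterable[str]) -> List[str]:
    """Return the messages of ERROR lines that are not on the allow-list."""
    fatal = []
    for line in log_lines:
        match = _ERROR_RE.match(line.strip())
        if match and not match.group(1).startswith(TOLERATED_PREFIXES):
            fatal.append(match.group(1))
    return fatal
```

A wrapper built this way would fail the container only when `fatal_errors` returns a non-empty list, instead of on any line containing "ERROR".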

lazear commented 1 year ago

Ah, I didn't notice that the <precursorList> will be omitted entirely. It is definitely necessary for processing PRM data (and yes, there will be only MS2 scans): the isolation window target m/z (or selected ion m/z) and the lower/upper offsets are needed.
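
To check whether a converted spectrum actually carries these values, something like the following could be used. It is only a sketch: it takes a single `<spectrum>` element as a string, ignores XML namespaces, and looks up the three isolation window cvParams by their PSI-MS accessions (MS:1000827, MS:1000828, MS:1000829). The function and dictionary names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# PSI-MS accessions for the isolation window terms, mapped to short keys.
ISOLATION_ACCESSIONS = {
    "MS:1000827": "target_mz",
    "MS:1000828": "lower_offset",
    "MS:1000829": "upper_offset",
}


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def isolation_window(spectrum_xml: str) -> Optional[Dict[str, float]]:
    """Return the first isolation window found, or None if there is none."""
    root = ET.fromstring(spectrum_xml)
    for elem in root.iter():
        if local_name(elem.tag) != "isolationWindow":
            continue
        return {
            ISOLATION_ACCESSIONS[param.get("accession")]: float(param.get("value"))
            for param in elem
            if local_name(param.tag) == "cvParam"
            and param.get("accession") in ISOLATION_ACCESSIONS
        }
    return None
```

A result of `None` means the spectrum has no `<isolationWindow>` at all, which is the situation described above when `<precursorList>` is omitted.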

Here is an example of how MSConvert is handling the "Patient1.raw" file:

      <spectrum index="0" id="controllerType=0 controllerNumber=1 scan=1" defaultArrayLength="250">
        <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
        <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
        <cvParam cvRef="MS" accession="MS:1000128" name="profile spectrum" value=""/>
        <cvParam cvRef="MS" accession="MS:1000504" name="base peak m/z" value="543.119696" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
        <cvParam cvRef="MS" accession="MS:1000505" name="base peak intensity" value="76529.718999999997" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
        <cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="1.2158727e05"/>
        <cvParam cvRef="MS" accession="MS:1000528" name="lowest observed m/z" value="74.254416708155" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
        <cvParam cvRef="MS" accession="MS:1000527" name="highest observed m/z" value="1136.426533748271" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
        <scanList count="1">
          <cvParam cvRef="MS" accession="MS:1000795" name="no combination" value=""/>
          <scan>
            <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="1.0016312" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
            <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value="FTMS + p NSI Full ms2 543.2723@hcd23.00 [75.0000-1125.0000]"/>
            <cvParam cvRef="MS" accession="MS:1000616" name="preset scan configuration" value="1"/>
            <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="50.000000745058" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
            <userParam name="[Thermo Trailer Extra]Monoisotopic M/Z:" value="0" type="xsd:float"/>
            <scanWindowList count="1">
              <scanWindow>
                <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="75.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1125.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              </scanWindow>
            </scanWindowList>
          </scan>
        </scanList>
        <precursorList count="1">
          <precursor>
            <isolationWindow>
              <cvParam cvRef="MS" accession="MS:1000827" name="isolation window target m/z" value="543.272338867188" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <cvParam cvRef="MS" accession="MS:1000828" name="isolation window lower offset" value="0.600000023842" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <cvParam cvRef="MS" accession="MS:1000829" name="isolation window upper offset" value="0.600000023842" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <userParam name="ms level" value="1"/>
            </isolationWindow>
            <selectedIonList count="1">
              <selectedIon>
                <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="543.272338867188" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="2"/>
              </selectedIon>
            </selectedIonList>
            <activation>
              <cvParam cvRef="MS" accession="MS:1000422" name="beam-type collision-induced dissociation" value=""/>
              <cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="23.0" unitCvRef="UO" unitAccession="UO:0000266" unitName="electronvolt"/>
            </activation>
          </precursor>
        </precursorList>
        <binaryDataArrayList count="2">
          <binaryDataArray encodedLength="912">
            <cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
            <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
            <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value="" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
            <binary>eJwNyVlsTGEcQHGqXBlLIkGkEpqMNdZaY+f7f0LGGru2CBNJhYSY8WBpyiCWUkHNoGpprR2N4nY09SDhFqU6CKpaZqhWSTCRWHOV8/B7OUd82WodcvEA+XF+9QRN6N/Cr5KxG8WoQ4d4vxqPZdiPm2jE6oGlqgCf0HdQqUrDBTSg1+BStRInUIuuSaUqGcsLwuoyfmBSMKwy8QKJl8NqNYrxB7owrA6gGkP8EbUT1egfiKgMPEXPoxG1ERXofiyi1qMMXY5H1LyNUXURNmZtiqp8/IRrc1SdxDdM3BJVh/EBY9KjanJGTOWhCSlbY6oEnbbF1HpUop8vpo7hB+Zsj6kr+PzQJXMrXFKCbo9csgMfMavSJSYSwi7xoREzH7vkOlY9d8s7pLxwyzPMqHLLXUx46ZYSJFW7pQB9XrnlDBJq3HIY4/96xIKrySOPseCfR2rhbuaVj1jb3Cs2fHFead3CKwdQvTsgw/cE5BC+YvregFxCq8yArMAtJO4LSDpeYcT+gMwvCkoZhl8Nyjl0uhaUnfiOldeD8hyTzaAUo0dxULKxpsaUeiytNaUKs1+bUg71xpSbGBYxpQh9o6bkoetbU85FQ5L4NiQ56PwuJAfRti4kuxD3PiTp+AVPfUi+YFVDSHovtKQIoxZZchvTFlvyDEuSLWnA2hRL4lItyUTHJZbkottxW+pRmGPLhhO2jMu1Jf6kLRXIPmVL6mnaGVta5tnyCNn5vDuGHmsZOoiEMkPvwW+k3TV0FabcM3QIPe8b+gjiyw3tQR2mPjD0DfR6aGg/vHUO3YiU9w5diUn1Dm2id4ND56D9B4f24TvSGh26BhkDnLrDQKc+i5GDnLocqYOd+it8SU7dcYhTX8DooU5dgWXDnLqwTZae3zZLN+F8uyz9H6813ok=</binary>
          </binaryDataArray>
          <binaryDataArray encodedLength="592">
            <cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
            <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
            <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value="" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
            <binary>eJxjYMAP1rZbunA/+eaS/trOVSnMxVV0AofrR4lsFwLaGCbtWOyygJfR9WiZgOuK1y9cBBRqCOrRdZzgYi76yWWHKo+rRetrF6UjXQT1ZJ1KddlmdMtFczWHa6/yB5dKzR6CeiwkW1xu/r3vcjGf1dWn4K3LrVvzCeoR/Z/oMqvvjUvoQylXmRk8roYNawnqOb1rowuzNY/rpBhJ1yO73rskva8kqOe6aJpLl+kHF2YBddeIb1quDLFMrp56/QT1qV697cLOquZa9cXQVe8Nr2uuzQqCeuoO3XI5vdDZdf7+LNd5Fq6uWnbPCOp5wNHv4nGN19Xjn6lrUYiKa7v4BYJ6TnR0uVRe+OrSuUPEtXshs+tGtuUE9ZzRmOCyOeOry7QCQdewg99cjsV2E9RTfWqOy+ZYXtewVWqumrtZXUVTOgjqObjooUvJbR3Xu44GroZZn11mZLgT1KPg3OD8t7/Z9ZRVntuVMyLu4V2l7vPPTHG/xxPvPouB0T3wqIabR42wa6kmizMhs3SytrtcXCjsyjFZw1U/j8v14Z/pBO3//qfOZdWTjy4+K4Rc2WP/ufCsW0xQDwBRtaQw</binary>
          </binaryDataArray>
        </binaryDataArrayList>
      </spectrum>
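
Note that the filter string in the output above ("FTMS + p NSI Full ms2 543.2723@hcd23.00 [75.0000-1125.0000]") already contains the target m/z, so in principle it can be recovered without an MS1 parent scan. A rough sketch of parsing it follows; this is an illustration only, not necessarily how TRFP or MSConvert does it, and `parse_filter` and `PrecursorInfo` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class PrecursorInfo(NamedTuple):
    ms_level: int
    target_mz: float
    activation: str
    energy: float


# Matches the "ms2 543.2723@hcd23.00" part of a filter string.
# Only the first "<m/z>@<activation><energy>" group is captured.
_FILTER_RE = re.compile(r"\bms(\d+)\s+(\d+(?:\.\d+)?)@([a-z]+)(\d+(?:\.\d+)?)")


def parse_filter(filter_string: str) -> Optional[PrecursorInfo]:
    """Extract MS level, target m/z, activation and energy, if present."""
    match = _FILTER_RE.search(filter_string)
    if match is None:
        return None
    return PrecursorInfo(
        ms_level=int(match.group(1)),
        target_mz=float(match.group(2)),
        activation=match.group(3),
        energy=float(match.group(4)),
    )
```

The isolation window offsets are not part of the filter string, so they would have to come from elsewhere in the raw file.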

lazear commented 1 year ago

I was able to add a workaround in the codebase, I will open a PR shortly for review.

caetera commented 1 year ago

Thank you, I will review the PR and schedule it for the next release.

caetera commented 12 months ago

Released in version 1.4.3