glasgowcompbio / vimms

A programmable and modular LC/MS simulator in Python
MIT License

TopN Simulator writing improperly to .mzML file #244

Closed samsonjm closed 1 year ago

samsonjm commented 2 years ago

I'm looking at the example data and example Jupyter notebooks that are provided with the ViMMS download.

In notebook 04. Top-N Simulations, section 5. Compare Results, the first line (loading the simulated data) does not work. Running this code produces the error below:

simulated_input_file = mzml_out
print(count_stuff(simulated_input_file, min_rt, max_rt))
simulated_mzs, simulated_rts, simulated_intensities, simulated_cumsum_ms1, simulated_cumsum_ms2 = count_stuff(
    simulated_input_file, min_rt, max_rt)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_19439/3996044459.py in <module>
      1 import pymzml
      2 simulated_input_file = mzml_out
----> 3 print(count_stuff(simulated_input_file, min_rt, max_rt))
      4 simulated_mzs, simulated_rts, simulated_intensities, simulated_cumsum_ms1, simulated_cumsum_ms2 = count_stuff(
      5     simulated_input_file, min_rt, max_rt)

~/Projects/ViMMS/vimms/vimms/PlotsForPaper.py in count_stuff(input_file, min_rt, max_rt)
    240             elif ms_level == 2:
    241                 try:
--> 242                     selected_precursors = spectrum.selected_precursors
    243                     count_selected_precursors += len(selected_precursors)
    244                     mz = selected_precursors[0]['mz']

~/.local/share/virtualenvs/vimms-_60yp1CX/lib/python3.7/site-packages/pymzml/spec.py in selected_precursors(self)
    937                     ids.append(
    938                         #re.compile(r"SPECTRUM_([0-9]+)$").search(spec_ref).group(1)
--> 939                         regex_patterns.SPECTRUM_ID_PATTERN.search(spec_ref).group(1)
    940                     )
    941                 else:

AttributeError: 'NoneType' object has no attribute 'group'

I've traced this back to the .mzML file that is being loaded in for simulated data.

The issue is in the <spectrumRef ...> portion of the mzML file. The code expects a string that includes "... scan=1", but the simulated mzML file instead has "SPECTRUM_1". The next line of code, using the real_input_file, works as expected.
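
The failure mode can be reproduced without pymzml. The sketch below assumes a pattern of the form `scan=<number>`, based only on the description above. It is a stand-in, not pymzml's actual `SPECTRUM_ID_PATTERN`, and `extract_scan_id` and the example reference strings are hypothetical.

```python
import re

# Assumed stand-in for the pattern applied to spectrumRef values;
# the real regex_patterns.SPECTRUM_ID_PATTERN in pymzml may differ.
ASSUMED_SCAN_PATTERN = re.compile(r"scan=([0-9]+)$")


def extract_scan_id(spec_ref):
    """Return the scan number as a string, or None if the pattern does not match."""
    match = ASSUMED_SCAN_PATTERN.search(spec_ref)
    # search() returns None when nothing matches; calling .group(1) on that
    # None is what raises "'NoneType' object has no attribute 'group'".
    return match.group(1) if match is not None else None


# Illustrative reference strings (not copied from the actual files).
real_style_ref = "controllerType=0 controllerNumber=1 scan=1"
simulated_ref = "SPECTRUM_1"

real_id = extract_scan_id(real_style_ref)
simulated_id = extract_scan_id(simulated_ref)
print(real_id, simulated_id)
```

Checking for `None` before calling `.group(1)` only avoids the crash; the precursor reference in the simulated file would still go unresolved until the writer emits the expected form.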

I've tried to find where the MzmlWriter writes this so I could fix it, but I haven't been able to work out exactly where "SPECTRUM_" comes from. It may be coming from a library rather than from your code.
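
One way to confirm what the writer actually emits is to list the `spectrumRef` values directly from the XML, independent of pymzml. This is a minimal sketch using only the standard library; `list_spectrum_refs` is a hypothetical helper, and the embedded sample is a stripped-down illustration rather than a real ViMMS output file.

```python
import xml.etree.ElementTree as ET


def list_spectrum_refs(mzml_text):
    """Collect the spectrumRef attribute of every <precursor> element."""
    refs = []
    root = ET.fromstring(mzml_text)
    for elem in root.iter():
        # Drop the "{namespace}" prefix ElementTree puts on tag names.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "precursor":
            ref = elem.get("spectrumRef")
            if ref is not None:
                refs.append(ref)
    return refs


# Illustrative fragment only; a real mzML file contains far more metadata.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="sample_run">
    <spectrumList count="1">
      <spectrum index="1" id="scan=2">
        <precursorList count="1">
          <precursor spectrumRef="SPECTRUM_1"/>
        </precursorList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

refs = list_spectrum_refs(SAMPLE)
print(refs)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`. Comparing the values from the simulated file against those from the real input file shows which form each one uses.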

joewandy commented 2 years ago

Thanks for raising this issue @samsonjm. I'll take a look at the problem above as soon as possible.

joewandy commented 1 year ago

This should have been fixed, I think.