Open MNTsnowman opened 1 week ago
I suspect that all of the spectra were skipped:
WARNING: Skipped 25714 spectra with invalid precursor info
You already indicated that you suspected something wrong with the scan headers. Did you modify them in some way?
Normally standard mzML files produced by MSConvert, ThermoRawFileParser, etc. should all work. We do not edit the mzML files or the headers in there at all.
Hi @bittremieux
Yes, I suspect the headers, as my data originates from a timsTOF with the IM engaged. I don't think the IM is to blame, as it is handled in the conversion (see command below). Given that the data is from a timsTOF, I do not think ThermoRawFileParser is used at all.
For info, the CMD command I use to generate the mzML files is something along the lines of this: "C:\Users...\ProteoWizard 3.0.23167.44089af 64-bit\msconvert.exe" --combineIonMobilitySpectra --filter "peakPicking vendor msLevel=1-" --filter "scanSumming precursorTol=0.05 scanTimeTol=5 ionMobilityTol=0.1 sumMs1=0" --filter "titleMaker
Since it skips all the scans and states that the precursor info is invalid, I was wondering what settings you use to generate the scan title; in other words, what is the "titleMaker" part of your conversion command? I hope this makes sense. Also, please let me know if you have other suggestions for what could be wrong. :)
I have limited hands-on experience with timsTOF conversion to mzML, so I don't know how the titleMaker filter should be used. But I'd be surprised if that's the problem. I suspect something about the IM actually.
Can you share the mzML file here so I can have a look at it?
Unfortunately, I'm unable to share a file here. If you have an email address where we could continue the conversation, we could maybe figure something out.
Alternatively, I could try to compare the headers of your demo data with my data.
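For such a comparison, here is a minimal sketch that counts how many MSn spectra in an mzML file lack a selected ion m/z or a charge state. It uses only the Python standard library and the PSI-MS accessions for ms level (MS:1000511), selected ion m/z (MS:1000744), and charge state (MS:1000041). The function name `summarize_precursors` is made up for this example, and it is an assumption, not confirmed anywhere in this thread, that missing precursor m/z or charge is what Casanovo reports as "invalid precursor info".

```python
import xml.etree.ElementTree as ET
from collections import Counter

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
MS_LEVEL = "MS:1000511"         # PSI-MS: ms level
SELECTED_ION_MZ = "MS:1000744"  # PSI-MS: selected ion m/z
CHARGE_STATE = "MS:1000041"     # PSI-MS: charge state


def summarize_precursors(source):
    """Count spectra per MS level and MSn spectra missing precursor fields.

    `source` is a path or a binary file object containing mzML.
    Hypothetical helper for illustration; not part of Casanovo or depthcharge.
    """
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        # Collect every cvParam below this spectrum, keyed by accession.
        params = {
            p.get("accession"): p.get("value")
            for p in elem.iter(MZML_NS + "cvParam")
        }
        level = params.get(MS_LEVEL)
        if level is None:
            counts["unknown_level"] += 1
        elif level == "1":
            counts["ms1"] += 1
        else:
            counts["msn"] += 1
            if SELECTED_ION_MZ not in params:
                counts["msn_without_precursor_mz"] += 1
            if CHARGE_STATE not in params:
                counts["msn_without_charge"] += 1
        elem.clear()  # release the parsed spectrum to keep memory use low
    return counts
```

Running `summarize_precursors(r"Data\mzML\14-2-NM_S4-A1_1_9156.mzML")` on my file and the same call on the demo file would show whether the two differ in these fields.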
You can email me at wout.bittremieux@uantwerpen.be.
Hi Casanovo
This is the first time I'm attempting to use Casanovo. I have tried to follow your guide at: https://casanovo.readthedocs.io/en/latest/getting_started.html
I'm getting the error below. I'm wondering if it could have something to do with the headers of the scans in the mzML files. If this sounds like a possibility, could you please provide the command-line settings you are using to generate the mzML files, and explain how you name and structure the header?
D:...\De Novo>casanovo sequence -m WorkDir\casanovo_massivekb.ckpt -c WorkDir\casanovo_config.yaml Data\mzML\14-2-NM_S4-A1_1_9156.mzML
WARNING: Dataloader multiprocessing is currently not supported on Windows or MacOS; using only a single thread.
Seed set to 454
INFO: Casanovo version 4.2.1
INFO: Sequencing peptides from:
INFO: Data\mzML\14-2-NM_S4-A1_1_9156.mzML
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
INFO: Reading 1 files...
Data\mzML\14-2-NM_S4-A1_1_9156.mzML: 100%|█████████████████████████████████| 27193/27193 [00:32<00:00, 835.91spectra/s]
WARNING: Skipped 25714 spectra with invalid precursor info
Traceback (most recent call last):
  File "C:\Users...\casanovo_env\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users...\casanovo_env\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users...\casanovo_env\Scripts\casanovo.exe\__main__.py", line 7, in <module>
  File "C:\Users...\casanovo_env\lib\site-packages\rich_click\rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "C:\Users...\casanovo_env\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users...\casanovo_env\lib\site-packages\rich_click\rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "C:\Users...\casanovo_env\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users...\casanovo_env\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, ctx.params)
  File "C:\Users...\casanovo_env\lib\site-packages\click\core.py", line 783, in invoke
    return callback(*args, **kwargs)
  File "C:\Users...\casanovo_env\lib\site-packages\casanovo\casanovo.py", line 143, in sequence
    runner.predict(peak_path, output)
  File "C:\Users...\casanovo_env\lib\site-packages\casanovo\denovo\model_runner.py", line 160, in predict
    test_index = self._get_index(peak_path, False, "")
  File "C:\Users...\casanovo_env\lib\site-packages\casanovo\denovo\model_runner.py", line 394, in _get_index
    return Index(index_fname, filenames, valid_charge=valid_charge)
  File "C:\Users...\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py", line 104, in __init__
    self.add_file(ms_file)
  File "C:\Users...\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py", line 195, in add_file
    metadata = self._assemble_metadata(parser)
  File "C:\Users...\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py", line 173, in _assemble_metadata
    metadata["scan_id"] = parser.scan_id
ValueError: could not broadcast input array from shape (0,) into shape (25714,)