Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0

KeyError when running example #178

Closed: PMDekker closed this issue 1 year ago

PMDekker commented 1 year ago

While trying to run the "small example" from the "Getting started" guide, I run into a KeyError. I have no idea what is causing it, though. I have attached the output log.

output.log

bittremieux commented 1 year ago

I don't actually see the error in the log. Is there possibly additional console output?

PMDekker commented 1 year ago

Sorry, I just noticed that indeed not all console output ends up in the .log file. I will run it again and post the additional output.

PMDekker commented 1 year ago

This is all the console output:

Global seed set to 454
2023-04-28 15:52:13,003 INFO [casanovo/MainProcess] casanovo._get_model_weights : Model weights file C:\Users\dekke102\AppData\Local\casanovo\casanovo_massivekb_v3_0_0.ckpt retrieved from local cache
2023-04-28 15:52:13,006 INFO [casanovo/MainProcess] casanovo.main : Casanovo version 3.3.0
2023-04-28 15:52:13,008 DEBUG [casanovo/MainProcess] casanovo.main : mode = denovo
2023-04-28 15:52:13,011 DEBUG [casanovo/MainProcess] casanovo.main : model = C:\Users\dekke102\AppData\Local\casanovo\casanovo_massivekb_v3_0_0.ckpt
2023-04-28 15:52:13,012 DEBUG [casanovo/MainProcess] casanovo.main : peak_path = C:/MyPrograms/MSConvert_output_temp/sample_preprocessed_spectra.mgf
2023-04-28 15:52:13,012 DEBUG [casanovo/MainProcess] casanovo.main : peak_path_val = None
2023-04-28 15:52:13,014 DEBUG [casanovo/MainProcess] casanovo.main : config = default
2023-04-28 15:52:13,014 DEBUG [casanovo/MainProcess] casanovo.main : output = C:\MyPrograms\Casanovo_output\output
2023-04-28 15:52:13,014 DEBUG [casanovo/MainProcess] casanovo.main : random_seed = 454
2023-04-28 15:52:13,014 DEBUG [casanovo/MainProcess] casanovo.main : n_peaks = 150
2023-04-28 15:52:13,014 DEBUG [casanovo/MainProcess] casanovo.main : min_mz = 50.0
2023-04-28 15:52:13,015 DEBUG [casanovo/MainProcess] casanovo.main : max_mz = 2500.0
2023-04-28 15:52:13,015 DEBUG [casanovo/MainProcess] casanovo.main : min_intensity = 0.01
2023-04-28 15:52:13,015 DEBUG [casanovo/MainProcess] casanovo.main : remove_precursor_tol = 2.0
2023-04-28 15:52:13,015 DEBUG [casanovo/MainProcess] casanovo.main : max_charge = 10
2023-04-28 15:52:13,016 DEBUG [casanovo/MainProcess] casanovo.main : precursor_mass_tol = 50.0
2023-04-28 15:52:13,016 DEBUG [casanovo/MainProcess] casanovo.main : isotope_error_range = (0, 1)
2023-04-28 15:52:13,016 DEBUG [casanovo/MainProcess] casanovo.main : min_peptide_len = 6
2023-04-28 15:52:13,016 DEBUG [casanovo/MainProcess] casanovo.main : dim_model = 512
2023-04-28 15:52:13,017 DEBUG [casanovo/MainProcess] casanovo.main : n_head = 8
2023-04-28 15:52:13,017 DEBUG [casanovo/MainProcess] casanovo.main : dim_feedforward = 1024
2023-04-28 15:52:13,018 DEBUG [casanovo/MainProcess] casanovo.main : n_layers = 9
2023-04-28 15:52:13,019 DEBUG [casanovo/MainProcess] casanovo.main : dropout = 0.0
2023-04-28 15:52:13,019 DEBUG [casanovo/MainProcess] casanovo.main : dim_intensity = None
2023-04-28 15:52:13,023 DEBUG [casanovo/MainProcess] casanovo.main : custom_encoder = None
2023-04-28 15:52:13,029 DEBUG [casanovo/MainProcess] casanovo.main : max_length = 100
2023-04-28 15:52:13,030 DEBUG [casanovo/MainProcess] casanovo.main : residues = {'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764, 'V': 99.068414, 'T': 101.04767, 'C+57.021': 160.030649, 'L': 113.084064, 'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578, 'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912, 'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313, 'M+15.995': 147.0354, 'N+0.984': 115.026943, 'Q+0.984': 129.042594, '+42.011': 42.010565, '+43.006': 43.005814, '-17.027': -17.026549, '+43.006-17.027': 25.980265}
2023-04-28 15:52:13,031 DEBUG [casanovo/MainProcess] casanovo.main : n_log = 1
2023-04-28 15:52:13,031 DEBUG [casanovo/MainProcess] casanovo.main : tb_summarywriter = None
2023-04-28 15:52:13,031 DEBUG [casanovo/MainProcess] casanovo.main : warmup_iters = 100000
2023-04-28 15:52:13,032 DEBUG [casanovo/MainProcess] casanovo.main : max_iters = 600000
2023-04-28 15:52:13,032 DEBUG [casanovo/MainProcess] casanovo.main : learning_rate = 0.0005
2023-04-28 15:52:13,033 DEBUG [casanovo/MainProcess] casanovo.main : weight_decay = 1e-05
2023-04-28 15:52:13,033 DEBUG [casanovo/MainProcess] casanovo.main : train_batch_size = 32
2023-04-28 15:52:13,033 DEBUG [casanovo/MainProcess] casanovo.main : predict_batch_size = 1024
2023-04-28 15:52:13,034 DEBUG [casanovo/MainProcess] casanovo.main : n_beams = 5
2023-04-28 15:52:13,034 DEBUG [casanovo/MainProcess] casanovo.main : top_match = 1
2023-04-28 15:52:13,034 DEBUG [casanovo/MainProcess] casanovo.main : logger = None
2023-04-28 15:52:13,035 DEBUG [casanovo/MainProcess] casanovo.main : max_epochs = 30
2023-04-28 15:52:13,035 DEBUG [casanovo/MainProcess] casanovo.main : num_sanity_val_steps = 0
2023-04-28 15:52:13,036 DEBUG [casanovo/MainProcess] casanovo.main : train_from_scratch = True
2023-04-28 15:52:13,040 DEBUG [casanovo/MainProcess] casanovo.main : save_model = True
2023-04-28 15:52:13,040 DEBUG [casanovo/MainProcess] casanovo.main : model_save_folder_path =
2023-04-28 15:52:13,041 DEBUG [casanovo/MainProcess] casanovo.main : save_weights_only = True
2023-04-28 15:52:13,041 DEBUG [casanovo/MainProcess] casanovo.main : every_n_train_steps = 50000
2023-04-28 15:52:13,041 DEBUG [casanovo/MainProcess] casanovo.main : no_gpu = False
2023-04-28 15:52:13,042 DEBUG [casanovo/MainProcess] casanovo.main : n_workers = 0
2023-04-28 15:52:13,042 INFO [casanovo/MainProcess] casanovo.main : Predict peptide sequences with Casanovo.
2023-04-28 15:52:13,200 DEBUG [fsspec.local/MainProcess] local.__init__ : open file: C:/Users/dekke102/AppData/Local/casanovo/casanovo_massivekb_v3_0_0.ckpt
2023-04-28 15:52:13,524 INFO [depthcharge.data.hdf5/MainProcess] hdf5.__init__ : Reading 1 files...
C:\MyPrograms\MSConvert_output_temp\sample_preprocessed_spectra.mgf: 128spectra [00:00, 4124.32spectra/s]
2023-04-28 15:52:13,625 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py:589: LightningDeprecationWarning: The Trainer argument `auto_select_gpus` has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function `pytorch_lightning.accelerators.find_usable_cuda_devices` instead.
  rank_zero_deprecation(

2023-04-28 15:52:13,885 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:224: PossibleUserWarning: The dataloader, predict_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(

Predicting DataLoader 0:   0%|                                                                   | 0/1 [00:00<?, ?it/s]
2023-04-28 15:52:27,969 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\torch\nn\modules\transformer.py:287: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ..\aten\src\ATen\NestedTensorImpl.cpp:179.)
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)

2023-04-28 15:54:05,585 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\torch\nn\functional.py:4999: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
  warnings.warn(

2023-04-28 15:54:05,706 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\torch\nn\modules\activation.py:1144: UserWarning: Converting mask without torch.bool dtype to bool; this will negatively affect performance. Prefer to use a boolean mask directly. (Triggered internally at ..\aten\src\ATen\native\transformers\attention.cpp:152.)
  return torch._native_multi_head_attention(

Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████| 1/1 [30:20<00:00, 1820.87s/it]
Traceback (most recent call last):
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\Scripts\casanovo.exe\__main__.py", line 7, in <module>
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\casanovo\casanovo.py", line 166, in main
    writer.save()
  File "C:\MyPrograms\Anaconda\envs\casanovo_env\lib\site-packages\casanovo\data\ms_io.py", line 203, in save
    f"ms_run[{self._run_map[psm[1][0]]}]:{psm[1][1]}",
KeyError: 'C:\\MyPrograms\\MSConvert_output_temp\\sample_preprocessed_spectra.mgf'

bittremieux commented 1 year ago

Thank you for providing the full output. I think this is a known bug that was recently fixed (#168) but is not yet part of a Casanovo release.

The problem is most likely caused by running Casanovo with a full file path, while the code internally used only file names. Can you try running it again from the directory where the MGF file is located, using --peak_path=sample_preprocessed_spectra.mgf, and see whether that works?
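The path/file-name mismatch described above can be illustrated with a small Python sketch. This is a hypothetical reconstruction, not Casanovo's actual `ms_io.py` code. `build_run_map`, `spectra_ref`, and `spectra_ref_normalized` are illustrative names, and the `"index=0"` spectrum reference is an assumed format. The sketch assumes the writer keys its run map by bare file name while the lookup uses the full path recorded for each spectrum, which is consistent with the `KeyError` in the traceback above.

```python
import ntpath  # Windows path rules, so the demo behaves the same on any OS


def build_run_map(peak_paths):
    """Hypothetical: map each peak file to a 1-based ms_run number, keyed by file name only."""
    return {ntpath.basename(p): i for i, p in enumerate(peak_paths, start=1)}


def spectra_ref(run_map, spectrum_file, spectrum_index):
    """Look up the run number using the path exactly as recorded for the spectrum."""
    return f"ms_run[{run_map[spectrum_file]}]:{spectrum_index}"


def spectra_ref_normalized(run_map, spectrum_file, spectrum_index):
    """Same lookup, but reduce the path to its file name first so both sides agree."""
    return f"ms_run[{run_map[ntpath.basename(spectrum_file)]}]:{spectrum_index}"


full_path = r"C:\MyPrograms\MSConvert_output_temp\sample_preprocessed_spectra.mgf"
run_map = build_run_map([full_path])  # keyed by 'sample_preprocessed_spectra.mgf'

try:
    # Full path as the lookup key: not in the map, so this raises KeyError.
    spectra_ref(run_map, full_path, "index=0")
except KeyError as err:
    print("KeyError:", err)

# Workaround from the thread: run from the file's directory with a bare file name.
print(spectra_ref(run_map, "sample_preprocessed_spectra.mgf", "index=0"))  # ms_run[1]:index=0

# Code-side alternative: normalize the lookup key to the same form as the map keys.
print(spectra_ref_normalized(run_map, full_path, "index=0"))  # ms_run[1]:index=0
```

The point of the sketch is that whichever form is chosen (bare file name or absolute path), the key used when building the map and the key used when looking it up must be normalized the same way.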

PMDekker commented 1 year ago

Thanks, that was indeed the issue. It works now!