Closed kostrouc closed 2 years ago
Hmm, somehow the default config file packaged with Casanovo might be corrupted. Can you share the contents of this file here: /Users/myname/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/casanovo/config.yaml?
It won't let me upload .yaml here. Here is the file contents:
(casanovo_env) myname@myip Downloads % cat config.yaml /###
random_seed: 454
n_peaks: 150 min_mz: 50.0 max_mz: 2500.0 min_intensity: 0.01 remove_precursor_tol: 2.0 # Da max_charge: 10 precursor_mass_tol: 50 # ppm isotope_error_range: [0, 1]
dim_model: 512 n_head: 8 dim_feedforward: 1024 n_layers: 9 dropout: 0.0 dim_intensity: custom_encoder: max_length: 100 residues: "G": 57.021464 "A": 71.037114 "S": 87.032028 "P": 97.052764 "V": 99.068414 "T": 101.047670 "C+57.021": 160.030649 # 103.009185 + 57.021464 "L": 113.084064 "I": 113.084064 "N": 114.042927 "D": 115.026943 "Q": 128.058578 "K": 128.094963 "E": 129.042593 "M": 131.040485 "H": 137.058912 "F": 147.068414 "R": 156.101111 "Y": 163.063329 "W": 186.079313
"M+15.995": 147.035400 # Met oxidation: 131.040485 + 15.994915 "N+0.984": 115.026943 # Asn deamidation: 114.042927 + 0.984016 "Q+0.984": 129.042594 # Gln deamidation: 128.058578 + 0.984016
"+42.011": 42.010565 # Acetylation "+43.006": 43.005814 # Carbamylation "-17.027": -17.026549 # NH3 loss "+43.006-17.027": 25.980265 n_log: 1 tb_summarywriter: warmup_iters: 100_000 max_iters: 600_000 learning_rate: 5e-4 weight_decay: 1e-5
train_batch_size: 32 predict_batch_size: 1024
logger: max_epochs: 30 num_sanity_val_steps: 0
train_from_scratch: True
save_model: True model_save_folder_path: "" save_weights_only: True every_n_train_steps: 50_000
Can you put the YAML content in a code block (three backticks) so I can see the exact formatting?
/###
# Casanovo configuration.
# Blank entries are interpreted as "None"
###
# Random seed to ensure reproducible results.
random_seed: 454
# Spectrum processing options.
n_peaks: 150
min_mz: 50.0
max_mz: 2500.0
min_intensity: 0.01
remove_precursor_tol: 2.0 # Da
max_charge: 10
precursor_mass_tol: 50 # ppm
isotope_error_range: [0, 1]
# Model architecture options.
dim_model: 512
n_head: 8
dim_feedforward: 1024
n_layers: 9
dropout: 0.0
dim_intensity:
custom_encoder:
max_length: 100
residues:
  "G": 57.021464
  "A": 71.037114
  "S": 87.032028
  "P": 97.052764
  "V": 99.068414
  "T": 101.047670
  "C+57.021": 160.030649 # 103.009185 + 57.021464
  "L": 113.084064
  "I": 113.084064
  "N": 114.042927
  "D": 115.026943
  "Q": 128.058578
  "K": 128.094963
  "E": 129.042593
  "M": 131.040485
  "H": 137.058912
  "F": 147.068414
  "R": 156.101111
  "Y": 163.063329
  "W": 186.079313
  # Amino acid modifications.
  "M+15.995": 147.035400 # Met oxidation: 131.040485 + 15.994915
  "N+0.984": 115.026943 # Asn deamidation: 114.042927 + 0.984016
  "Q+0.984": 129.042594 # Gln deamidation: 128.058578 + 0.984016
  # N-terminal modifications.
  "+42.011": 42.010565 # Acetylation
  "+43.006": 43.005814 # Carbamylation
  "-17.027": -17.026549 # NH3 loss
  "+43.006-17.027": 25.980265
n_log: 1
tb_summarywriter:
warmup_iters: 100_000
max_iters: 600_000
learning_rate: 5e-4
weight_decay: 1e-5
# Training/inference options.
train_batch_size: 32
predict_batch_size: 1024
logger:
max_epochs: 30
num_sanity_val_steps: 0
train_from_scratch: True
save_model: True
model_save_folder_path: ""
save_weights_only: True
every_n_train_steps: 50_000
I think the problem is that leading forward slash (/), which renders the file invalid YAML. Did you modify this file yourself? The default config.yaml that's packaged with Casanovo should not normally include the /.
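To illustrate the repair suggested above: with the slash in place, the first line (/###) is no longer a comment, so it is read as content and the key/value lines after it no longer parse as a mapping. Below is a minimal sketch in plain Python that strips such a stray slash from the config text. The helper name strip_stray_leading_slash is hypothetical and is not part of Casanovo; loading the file with a real YAML parser (for example PyYAML's yaml.safe_load) would be the more thorough check.

```python
def strip_stray_leading_slash(config_text: str) -> str:
    """Drop a single '/' that directly precedes the opening '#' comment line.

    Hypothetical helper for illustration only; it is not part of Casanovo.
    """
    if config_text.startswith("/#"):
        return config_text[1:]
    return config_text


# A shortened stand-in for the config.yaml contents posted above.
broken = "/###\n# Casanovo configuration.\n###\nrandom_seed: 454\n"
fixed = strip_stray_leading_slash(broken)

# The first line is a plain comment again.
print(fixed.splitlines()[0])
```

Text that does not start with the stray slash is returned unchanged, so running the helper on an already correct file should be harmless.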
I copied the config.yaml text from the folder here on GitHub. I'm not sure why the / was added. After I removed it, I now get an error about a missing cpu_affinity attribute.
(casanovo_env) katherineostrouchov@myip casanovo % casanovo --mode=denovo --peak_path=sample_preprocessed_spectra.mgf
Traceback (most recent call last):
  File "/Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/bin/casanovo", line 8, in <module>
    sys.exit(main())
  File "/Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/casanovo/casanovo.py", line 166, in main
    config["n_workers"] = len(psutil.Process().cpu_affinity())
AttributeError: 'Process' object has no attribute 'cpu_affinity'
This was indeed a macOS-specific issue. It was recently fixed, but the fix is not in the release on PyPI yet. To get the latest functionality, I recommend installing from GitHub for the time being:
pip uninstall casanovo && pip install git+https://github.com/Noble-Lab/casanovo.git
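For background on the AttributeError: psutil documents Process.cpu_affinity() as available on Linux, Windows and FreeBSD only, so the attribute does not exist on macOS. The sketch below shows one way to guard this kind of platform-specific call, using the standard library's analogous os.sched_getaffinity with a fallback to os.cpu_count. It is an illustration under that assumption, not necessarily how the fix in Casanovo is implemented, and the function name count_usable_cpus is hypothetical.

```python
import os


def count_usable_cpus() -> int:
    """Return the number of CPUs this process may use, with a portable fallback.

    Hypothetical helper for illustration only; it is not part of Casanovo.
    """
    try:
        # Only defined on some platforms (e.g. Linux); missing on macOS.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fall back to the total CPU count; os.cpu_count() may return None.
        return os.cpu_count() or 1


print(count_usable_cpus())
```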
The script was successful this time. However, a warning was passed regarding num_workers in the DataLoader init from pytorch. I'm not sure where this script is located and how to update it.
(casanovo_env) katherineostrouchov@myip casanovo % export PYTORCH_ENABLE_MPS_FALLBACK=1
(casanovo_env) katherineostrouchov@myip casanovo % casanovo --mode=denovo --peak_path=sample_preprocessed_spectra.mgf
2022-11-03 13:39:43,864 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : /Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/pytorch_lightning/utilities/seed.py:48: LightningDeprecationWarning: `pytorch_lightning.utilities.seed.seed_everything` has been deprecated in v1.8.0 and will be removed in v1.10.0. Please use `lightning_lite.utilities.seed.seed_everything` instead.
rank_zero_deprecation(
Global seed set to 454
2022-11-03 13:39:43,869 INFO [casanovo/MainProcess] casanovo._get_model_weights : Model weights file /Users/katherineostrouchov/Library/Caches/casanovo/casanovo_massivekb_v3_0_0.ckpt retrieved from local cache
2022-11-03 13:39:43,870 INFO [casanovo/MainProcess] casanovo.main : Casanovo version 3.0.1.dev4+gf3696ca
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : mode = denovo
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : model = /Users/katherineostrouchov/Library/Caches/casanovo/casanovo_massivekb_v3_0_0.ckpt
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : peak_path = sample_preprocessed_spectra.mgf
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : peak_path_val = None
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : config = /Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/casanovo/config.yaml
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : output = /Users/katherineostrouchov/Library/CloudStorage/OneDrive-UniversityofTennessee/IDEXX/casanovo/casanovo_20221103133943
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : random_seed = 454
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : n_peaks = 150
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : min_mz = 50.0
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : max_mz = 2500.0
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : min_intensity = 0.01
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : remove_precursor_tol = 2.0
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : max_charge = 10
2022-11-03 13:39:43,870 DEBUG [casanovo/MainProcess] casanovo.main : precursor_mass_tol = 50.0
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : isotope_error_range = (0, 1)
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : dim_model = 512
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : n_head = 8
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : dim_feedforward = 1024
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : n_layers = 9
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : dropout = 0.0
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : dim_intensity = None
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : custom_encoder = None
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : max_length = 100
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : residues = {'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764, 'V': 99.068414, 'T': 101.04767, 'C+57.021': 160.030649, 'L': 113.084064, 'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578, 'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912, 'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313, 'M+15.995': 147.0354, 'N+0.984': 115.026943, 'Q+0.984': 129.042594, '+42.011': 42.010565, '+43.006': 43.005814, '-17.027': -17.026549, '+43.006-17.027': 25.980265}
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : n_log = 1
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : tb_summarywriter = None
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : warmup_iters = 100000
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : max_iters = 600000
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : learning_rate = 0.0005
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : weight_decay = 1e-05
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : train_batch_size = 32
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : predict_batch_size = 1024
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : logger = None
2022-11-03 13:39:43,871 DEBUG [casanovo/MainProcess] casanovo.main : max_epochs = 30
2022-11-03 13:39:43,872 DEBUG [casanovo/MainProcess] casanovo.main : num_sanity_val_steps = 0
2022-11-03 13:39:43,872 DEBUG [casanovo/MainProcess] casanovo.main : train_from_scratch = True
2022-11-03 13:39:43,872 DEBUG [casanovo/MainProcess] casanovo.main : save_model = True
2022-11-03 13:39:43,872 DEBUG [casanovo/MainProcess] casanovo.main : model_save_folder_path =
2022-11-03 13:39:43,872 DEBUG [casanovo/MainProcess] casanovo.main : save_weights_only = True
2022-11-03 13:39:43,872 DEBUG [casanovo/MainProcess] casanovo.main : every_n_train_steps = 50000
2022-11-03 13:39:43,872 DEBUG [casanovo/MainProcess] casanovo.main : n_workers = 0
2022-11-03 13:39:43,872 INFO [casanovo/MainProcess] casanovo.main : Predict peptide sequences with Casanovo.
2022-11-03 13:39:43,996 DEBUG [fsspec.local/MainProcess] local.__init__ : open file: /Users/katherineostrouchov/Library/Caches/casanovo/casanovo_massivekb_v3_0_0.ckpt
2022-11-03 13:39:44,324 INFO [depthcharge.data.hdf5/MainProcess] hdf5.__init__ : Reading 1 files...
sample_preprocessed_spectra.mgf: 128spectra [00:00, 3760.92spectra/s]
2022-11-03 13:39:44,473 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : /Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, predict_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Predicting DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]2022-11-03 13:39:52,672 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : /Users/katherineostrouchov/opt/anaconda3/envs/casanovo_env/lib/python3.8/site-packages/torch/nn/modules/transformer.py:276: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/NestedTensorImpl.cpp:177.)
output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Predicting DataLoader 0: 100%|██████████████| 1/1 [06:00<00:00, 360.20s/it]
I'm glad you got it to work!
There are indeed a few warnings, but these can be ignored as they don't affect the correct functioning of Casanovo. In particular, on macOS we are restricted to a single thread for the data loader due to incompatibilities between Apple's M1 chip and multiprocessing. Consequently, Casanovo might run a bit slower, but it will still work correctly.
In general, for optimal performance, we recommend running on Linux and using a GPU (or multiple).
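As a rough sketch of the macOS restriction described above: the log reports n_workers = 0 on this machine, which for a PyTorch DataLoader means the data is loaded in the main process. A platform check along the following lines would produce that behaviour. The function pick_num_workers and its arguments are hypothetical and only illustrate the idea; Casanovo's actual logic may differ.

```python
import os
import platform
from typing import Optional


def pick_num_workers(system_name: Optional[str] = None,
                     cpu_count: Optional[int] = None) -> int:
    """Choose a data loader worker count: 0 on macOS, else one per CPU.

    Hypothetical helper for illustration only; it is not part of Casanovo.
    """
    if system_name is None:
        system_name = platform.system()  # "Darwin" on macOS
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # 0 workers means the data is loaded in the main process.
    return 0 if system_name == "Darwin" else cpu_count


print(pick_num_workers("Darwin", 8))
print(pick_num_workers("Linux", 8))
```

With 0 workers the PyTorch Lightning warning about num_workers is expected on macOS and, as noted above, can be ignored.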
Hello, I am unable to get the example "casanovo --mode=denovo --peak_path=[PATH_TO]/sample_preprocessed_spectra.mgf" to run successfully. I am not sure what needs to be changed. Any suggestions would be appreciated.
"casanovo --help" returns the correct information. The sample_preprocessed_spectra.mgf and config.yaml files were saved. I'm not sure what else needs to be set up for denovo to function properly.
Thank you
error.txt