PominovaMS / denovo_benchmarks


Unable to run Casanovo non-tryptic model locally #22

Status: Open. BioGeek opened this issue 2 weeks ago.

BioGeek commented 2 weeks ago

I added a nontryptic column to dataset_tags.tsv to test whether the dataset tags are parsed correctly:

dataset                             proteome                                                                  nontryptic
9_species_human                     human_uniprotkb_proteome_UP000005640_2024_05_16.fasta                     1
9_species_apis_mellifera            apis_mellifera_uniprotkb_proteome_UP000005203_2024_09_11.fasta            0
9_species_saccharomyces_cerevisiae  saccharomyces_cerevisiae_uniprotkb_proteome_UP000002311_2024_09_11.fasta  0
9_species_solanum_lycopersicum      solanum_lycopersicum_uniprotkb_proteome_UP000004994_2024_09_11.fasta      0
9_species_methanosarcina_mazei      methanosarcina_mazei_go1_uniprotkb_proteome_UP000000595_2024_09_11.fasta  0
9_species_bacillus_subtilis         bacillus_subtilis_168_uniprotkb_proteome_UP000001570_2024_09_11.fasta     0
9_species_vigna_mungo               vigna_radiata_uniprotkb_taxonomy_id_157791_2024_09_11.fasta               0
9_species_mus_musculus              mus_musculus_uniprotkb_proteome_UP000000589_2024_09_11.fasta              0
human_multiprotease_ptm_trypsin     human_uniprotkb_proteome_UP000005640_2024_05_16.fasta                     0
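Copy-pasting can silently turn tabs into spaces, so it is worth confirming that the real file is tab-delimited before debugging further. Below is a minimal Python sketch of the lookup, assuming the column names shown in the table and assuming that a missing or blank flag means tryptic. The helpers read_dataset_tags and is_nontryptic are hypothetical and are not the benchmark's own parser.

```python
import csv
import io

# Hypothetical helpers for illustration; the benchmark's own tag parsing may differ.


def read_dataset_tags(tsv_text: str) -> dict:
    """Parse dataset_tags.tsv-style text into {dataset_name: row_dict}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["dataset"]: row for row in reader}


def is_nontryptic(tags: dict, dataset: str) -> bool:
    """Return True only if the dataset's nontryptic column is exactly "1".

    Assumption: a missing dataset, missing column, or blank value means tryptic.
    """
    value = tags.get(dataset, {}).get("nontryptic") or "0"
    return value.strip() == "1"


SAMPLE = (
    "dataset\tproteome\tnontryptic\n"
    "9_species_human\thuman_uniprotkb_proteome_UP000005640_2024_05_16.fasta\t1\n"
    "9_species_mus_musculus\tmus_musculus_uniprotkb_proteome_UP000000589_2024_09_11.fasta\t0\n"
)

tags = read_dataset_tags(SAMPLE)
print(is_nontryptic(tags, "9_species_human"))         # True
print(is_nontryptic(tags, "9_species_mus_musculus"))  # False
```

If the header line on disk is space-separated, csv.DictReader sees a single column and row["dataset"] raises a KeyError. This makes a delimiter problem easy to spot.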

However, when I run the benchmark locally with ./run.sh ./sample_data/9_species_human, Casanovo fails with the following error:

Recalculate all algorithm outputs: false
Processing dataset: 9_species_human (./sample_data/9_species_human)
./sample_data/9_species_human/mgf/151009_exo4_1.mgf
./outputs/9_species_human/casanovo_output.csv
Processing algorithm: casanovo
RUN ALGORITHM
Using non-tryptic model.
Seed set to 454
INFO: Casanovo version 4.2.1
INFO: Sequencing peptides from:
INFO:   9_species_human/151009_exo4_1.mgf
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
  File "/usr/local/bin/casanovo", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/rich_click/rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/casanovo/casanovo.py", line 143, in sequence
    runner.predict(peak_path, output)
  File "/usr/local/lib/python3.10/site-packages/casanovo/denovo/model_runner.py", line 157, in predict
    self.initialize_model(train=False)
  File "/usr/local/lib/python3.10/site-packages/casanovo/denovo/model_runner.py", line 277, in initialize_model
    self.model = Spec2Pep.load_from_checkpoint(
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/utilities/model_helpers.py", line 125, in wrapper
    return self.method(cls, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1582, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/core/saving.py", line 63, in _load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=map_location)
  File "/usr/local/lib/python3.10/site-packages/lightning/fabric/utilities/cloud_io.py", line 60, in _load
    return torch.load(
  File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 1114, in load
    return _legacy_load(
  File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 1338, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: could not find MARK
Traceback (most recent call last):
  File "/algo/./output_mapper.py", line 143, in <module>
    output_data = output_mapper.format_output(output_data)
  File "/algo/base/output_mapper.py", line 117, in format_output
    output_data[["sequence", "aa_scores"]] = output_data.apply(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
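The first traceback ends in torch.load's legacy (non-zip) reader, where Python's unpickler raises "could not find MARK". This suggests the non-tryptic checkpoint file is neither a zip-format nor a valid pickle-format PyTorch file. One possible cause is an incomplete download or a text/HTML page saved under the checkpoint's file name, but that is an assumption the log does not confirm. Below is a rough, stdlib-only sketch for inspecting the file. describe_checkpoint is a hypothetical helper, and the checkpoint path does not appear in the log.

```python
import zipfile
from pathlib import Path


def describe_checkpoint(path: str) -> str:
    """Roughly classify a file before handing it to torch.load.

    Heuristic only: it looks at the file size and leading bytes, not the contents.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    if zipfile.is_zipfile(p):
        # Recent PyTorch versions write a zip archive by default in torch.save.
        return "zip archive"
    with p.open("rb") as fh:
        head = fh.read(64)
    if head.startswith(b"PK\x03\x04"):
        # Zip signature present but no readable central directory: likely truncated.
        return "truncated zip"
    if head.startswith(b"\x80"):
        # PROTO opcode of pickle protocol 2+: plausibly a legacy-format file.
        return "pickle stream"
    try:
        text = head.decode("ascii")
    except UnicodeDecodeError:
        return "unknown binary"
    # Plain text, e.g. an HTML error page or a Git LFS pointer file.
    return "text: " + text.splitlines()[0][:40]
```

For example, if describe_checkpoint("/path/to/nontryptic.ckpt") returns a "text: ..." result (placeholder path; substitute the one the container uses), the problem lies in how the weights were fetched rather than in Casanovo itself.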
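The output_mapper.py traceback most likely follows from the Casanovo failure rather than being a separate bug. Casanovo exited before writing predictions, so the mapper had no usable rows to split into sequence and aa_scores columns. This is an inference from the order of the two tracebacks, not something the log proves. Below is a hypothetical guard a wrapper script could call before output mapping, so that the real failure is reported instead of the pandas error. It assumes a plain-text output file whose first non-blank line is a header; has_predictions and ensure_predictions are illustrative names.

```python
from pathlib import Path


def has_predictions(path: str) -> bool:
    """Return True if path exists and has at least one non-blank line after its header.

    Assumption: the first non-blank line is a header row.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open(encoding="utf-8", errors="replace") as fh:
        non_blank = (line for line in fh if line.strip())
        next(non_blank, None)  # skip the header line, if any
        return next(non_blank, None) is not None


def ensure_predictions(path: str, algorithm: str) -> None:
    """Raise a readable error instead of letting output mapping fail later."""
    if not has_predictions(path):
        raise RuntimeError(
            f"{algorithm}: no predictions in {path}; "
            "the algorithm step probably failed, see its log above"
        )
```

For example, calling ensure_predictions("./outputs/9_species_human/casanovo_output.csv", "casanovo") right after the algorithm step would surface the checkpoint problem directly. The log does not show which file the mapper actually reads, so treat that path as illustrative.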