Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0

Using HDF5 file as train/val dataset leads to index out of bound error #311

Closed: cfmelend closed this issue 6 months ago

cfmelend commented 6 months ago

I'm attempting to train Casanovo from the latest dev branch using preprocessed HDF5 files generated by depthcharge. However, when I use these as inputs, I run into an indexing error during training or validation when the dataloaders access the HDF5 files to retrieve a new batch. This is similar to an error I've observed in my own experiments when manually creating depthcharge SpectrumIndex objects without first removing a previously written HDF5 file (the same data appears to be written twice). I've attached the two HDF5 files along with the config file I'm using: test_data.val.hdf5.txt, test_data.train.hdf5.txt, test.yaml.txt

```
Epoch 0: 3%|▎ | 1/32 [00:27<14:18, 0.04it/s, v_num=1]
Traceback (most recent call last):
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/bin/casanovo", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/proj/casanovo/casanovo/casanovo.py", line 221, in train
    runner.train(train_peak_path, validation_peak_path)
  File "/net/noble/vol1/home/cfmelend/proj/casanovo/casanovo/denovo/model_runner.py", line 108, in train
    self.trainer.fit(
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 543, in fit
    call._call_and_handle_interrupt(
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 579, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 986, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1032, in _run_stage
    self.fit_loop.run()
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
    self.advance()
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 138, in run
    self.advance(data_fetcher)
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 204, in advance
    batch, _, __ = next(data_fetcher)
                   ^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/loops/fetchers.py", line 133, in __next__
    batch = super().__next__()
            ^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/loops/fetchers.py", line 60, in __next__
    batch = next(self.iterator)
            ^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/utilities/combined_loader.py", line 341, in __next__
    out = next(self._iterator)
          ^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/lightning/pytorch/utilities/combined_loader.py", line 78, in __next__
    out[i] = next(self.iterators[i])
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/torch/_utils.py", line 722, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
           ^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/net/noble/vol1/home/cfmelend/proj/casanovo/casanovo/data/datasets.py", line 263, in __getitem__
    ) = self.index[idx]
        ~~~~~~~~~~^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/depthcharge/data/hdf5.py", line 335, in __getitem__
    return self.get_spectrum(idx)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/depthcharge/data/hdf5.py", line 480, in get_spectrum
    spec_info = super().get_spectrum(idx)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_main/lib/python3.11/site-packages/depthcharge/data/hdf5.py", line 272, in get_spectrum
    start_offset = offsets[0]
                   ~~~~~~~^^^
IndexError: index 0 is out of bounds for axis 0 with size 0
```

bittremieux commented 6 months ago

I don't think that this is a Casanovo/DepthCharge bug, but rather that for some reason your HDF5 files are corrupted.

Until we add proper support for reusing spectrum indexes, rather than the current hacky approach of quickly copying them over, doing this is at the user's own risk.
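For anyone generating index files by hand in the meantime, here is a minimal defensive sketch. It assumes the failure mode described in the original report: a new index being written on top of a previously written HDF5 file. The helper `fresh_index_path` is hypothetical and not part of Casanovo or depthcharge. It refuses to reuse an existing file unless asked to, and deletes the file when `overwrite=True`, so whatever builds the index starts from an empty path.

```python
from pathlib import Path


def fresh_index_path(index_path: str, overwrite: bool = False) -> Path:
    """Return `index_path` as a Path, guaranteeing no stale file exists there.

    Hypothetical helper: call it before constructing a spectrum index so that
    new data is never written into a previously written HDF5 file.
    """
    path = Path(index_path)
    if path.exists():
        if not overwrite:
            # Fail loudly instead of silently reusing an old index file.
            raise FileExistsError(
                f"{path} already exists; pass overwrite=True to rebuild it"
            )
        # Remove the old file so the index is written from scratch.
        path.unlink()
    return path
```

For example, calling `fresh_index_path("train.hdf5", overwrite=True)` right before building the training index ensures the builder never sees leftovers from an earlier run.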

wsnoble commented 6 months ago

@bittremieux, any advice on how Carlo can check whether his HDF5 file is corrupted?
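One generic first check, offered as a sketch rather than official advice, has two parts. First, opening the file with h5py typically raises an `OSError` if it is truncated or not valid HDF5. Second, walking every dataset shows whether any array is empty, which is exactly the condition `offsets[0]` failed on in the traceback above. The sketch assumes h5py is installed and makes no assumptions about depthcharge's internal group or dataset names. An empty dataset is a hint worth investigating, not proof of corruption.

```python
from typing import Iterable, List, Optional, Tuple

# h5py reports shape () for scalar datasets and None for null-dataspace ones.
Shape = Optional[Tuple[int, ...]]


def find_empty_datasets(shapes: Iterable[Tuple[str, Shape]]) -> List[str]:
    """Return the names of datasets whose first axis has length 0."""
    # `shape` is falsy for both () and None, so those are skipped.
    return [name for name, shape in shapes if shape and shape[0] == 0]


def dataset_shapes(path: str) -> List[Tuple[str, Shape]]:
    """Collect (name, shape) for every dataset in the HDF5 file at `path`."""
    # Imported here so find_empty_datasets() works even without h5py installed.
    import h5py

    shapes: List[Tuple[str, Shape]] = []

    def visit(name: str, obj) -> None:
        if isinstance(obj, h5py.Dataset):
            shapes.append((name, obj.shape))
        # Returning None tells visititems() to keep walking the file.

    # Opening typically raises OSError for truncated or non-HDF5 files.
    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return shapes
```

Usage would look like `find_empty_datasets(dataset_shapes("train.hdf5"))`, which returns the paths of any empty arrays inside the file. Splitting the pure filtering step from the h5py walk keeps the logic easy to test without a real index file.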

tuanle618 commented 4 months ago

I also ran into this issue when creating a new HDF5 index with depthcharge v0.2.3 for the non-enzymatic dataset (https://doi.org/doi:10.25345/C5KS6JG0W; see also https://casanovo.readthedocs.io/en/latest/faq.html#training-casanovo). How can I fix this, @bittremieux, @wsnoble?

bittremieux commented 4 months ago

Can you provide some more details, including your log file, system information, etc.?

This bug occurred only when Casanovo was used in a non-standard way, so we need more details to understand what's happening.
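When sharing system information, a short script can report just the packages most relevant to this traceback alongside the Python and OS versions, rather than only a full `conda list`. This is a sketch: the distribution names in `DEFAULT_PACKAGES` are an assumption about what matters here, not a list the maintainers requested. It uses only the standard library (`importlib.metadata`, `platform`).

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed to be the packages most relevant to this HDF5/DataLoader traceback.
DEFAULT_PACKAGES = ("casanovo", "depthcharge-ms", "h5py", "numpy", "torch", "lightning")


def package_versions(names: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


def system_report(names: Iterable[str] = DEFAULT_PACKAGES) -> str:
    """Build a short, copy-pasteable summary of the Python environment."""
    lines = [
        f"Python {platform.python_version()} ({sys.executable})",
        f"Platform {platform.platform()}",
    ]
    for name, ver in package_versions(names).items():
        lines.append(f"{name}: {ver if ver is not None else 'not installed'}")
    return "\n".join(lines)
```

Running `print(system_report())` in the environment used for training produces a few lines that can be pasted directly into the issue.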

tuanle618 commented 4 months ago

Hi @bittremieux, I am using the data downloaded from https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=73b8074734104372bc5a441e8dc4447e

and a fork of the Casanovo repository: https://github.com/tuanle618/casanovo

My installed libraries are the following:

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
absl-py                   2.1.0                    pypi_0    pypi
aiohttp                   3.9.5                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
alsa-lib                  1.2.11               hd590300_1    conda-forge
anyio                     4.3.0              pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4                    pypi_0    pypi
argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0          py310h2372a71_4    conda-forge
arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.3                    pypi_0    pypi
attr                      2.5.1                h166bdaf_1    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
babel                     2.14.0             pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.12.3             pyha770c72_0    conda-forge
blas                      2.116                       mkl    conda-forge
blas-devel                3.9.0            16_linux64_mkl    conda-forge
bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
blessed                   1.20.0                   pypi_0    pypi
brotli                    1.1.0                hd590300_1    conda-forge
brotli-bin                1.1.0                hd590300_1    conda-forge
brotli-python             1.1.0           py310hc6cd4ac_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cairo                     1.18.0               h3faef2a_0    conda-forge
casanovo                  0.1.dev330+g42882f8.d20240522          pypi_0    pypi
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py310h2fee648_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
click                     8.1.7                    pypi_0    pypi
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
comm                      0.2.2              pyhd8ed1ab_0    conda-forge
contourpy                 1.2.1           py310hd41b1e2_0    conda-forge
cryptography              42.0.7                   pypi_0    pypi
cuda-cudart               11.8.89                       0    nvidia
cuda-cupti                11.8.87                       0    nvidia
cuda-libraries            11.8.0                        0    nvidia
cuda-nvrtc                11.8.89                       0    nvidia
cuda-nvtx                 11.8.86                       0    nvidia
cuda-runtime              11.8.0                        0    nvidia
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
debugpy                   1.8.1           py310hc6cd4ac_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
deprecated                1.2.14                   pypi_0    pypi
depthcharge-ms            0.2.3                    pypi_0    pypi
dftd4                     3.6.0                hf49bc11_0    conda-forge
einops                    0.8.0                    pypi_0    pypi
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
expat                     2.6.2                h59595ed_0    conda-forge
fastobo                   0.12.3                   pypi_0    pypi
filelock                  3.14.0             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_2    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.51.0          py310h2372a71_0    conda-forge
fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2024.5.0           pyhff2d567_0    conda-forge
gettext                   0.22.5               h59595ed_2    conda-forge
gettext-tools             0.22.5               h59595ed_2    conda-forge
glib                      2.80.2               hf974151_0    conda-forge
glib-tools                2.80.2               hb6ce0ca_0    conda-forge
gmp                       6.3.0                h59595ed_1    conda-forge
gmpy2                     2.1.5           py310hc7909c9_1    conda-forge
gpustat                   1.1.1                    pypi_0    pypi
graphite2                 1.3.13            h59595ed_1003    conda-forge
grpcio                    1.64.0                   pypi_0    pypi
gst-plugins-base          1.24.3               h9ad1361_0    conda-forge
gstreamer                 1.24.3               haf2f30d_0    conda-forge
h11                       0.14.0             pyhd8ed1ab_0    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
h5py                      3.11.0                   pypi_0    pypi
harfbuzz                  8.5.0                hfac3d4d_0    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
httpcore                  1.0.5              pyhd8ed1ab_0    conda-forge
httpx                     0.27.0             pyhd8ed1ab_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
importlib-metadata        7.1.0              pyha770c72_0    conda-forge
importlib_metadata        7.1.0                hd8ed1ab_0    conda-forge
importlib_resources       6.4.0              pyhd8ed1ab_0    conda-forge
ipykernel                 6.29.3             pyhd33586a_0    conda-forge
ipython                   8.24.0             pyh707e725_0    conda-forge
ipywidgets                8.1.2              pyhd8ed1ab_1    conda-forge
isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
joblib                    1.4.2                    pypi_0    pypi
json5                     0.9.25             pyhd8ed1ab_0    conda-forge
jsonpointer               2.4             py310hff52083_3    conda-forge
jsonschema                4.22.0             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
jsonschema-with-format-nongpl 4.22.0             pyhd8ed1ab_0    conda-forge
jupyter                   1.0.0             pyhd8ed1ab_10    conda-forge
jupyter-lsp               2.2.5              pyhd8ed1ab_0    conda-forge
jupyter_client            8.6.1              pyhd8ed1ab_0    conda-forge
jupyter_console           6.6.3              pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.2           py310hff52083_0    conda-forge
jupyter_events            0.10.0             pyhd8ed1ab_0    conda-forge
jupyter_server            2.14.0             pyhd8ed1ab_0    conda-forge
jupyter_server_terminals  0.5.3              pyhd8ed1ab_0    conda-forge
jupyterlab                4.2.0              pyhd8ed1ab_1    conda-forge
jupyterlab_pygments       0.3.0              pyhd8ed1ab_1    conda-forge
jupyterlab_server         2.27.1             pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        3.0.10             pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5           py310hd41b1e2_1    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lark                      1.1.9                    pypi_0    pypi
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 h55db66e_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libasprintf               0.22.5               h661eb56_2    conda-forge
libasprintf-devel         0.22.5               h661eb56_2    conda-forge
libblas                   3.9.0            16_linux64_mkl    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcap                    2.69                 h0f662aa_0    conda-forge
libcblas                  3.9.0            16_linux64_mkl    conda-forge
libclang-cpp15            15.0.7          default_h127d8a8_5    conda-forge
libclang13                18.1.5          default_h5d6823c_0    conda-forge
libcublas                 11.11.3.6                     0    nvidia
libcufft                  10.9.0.58                     0    nvidia
libcufile                 1.9.1.3                       0    nvidia
libcups                   2.3.3                h4637d8d_4    conda-forge
libcurand                 10.3.5.147                    0    nvidia
libcusolver               11.4.1.48                     0    nvidia
libcusparse               11.7.5.86                     0    nvidia
libdeflate                1.20                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.3                h59595ed_0    conda-forge
libgcc-ng                 13.2.0               h77fa898_7    conda-forge
libgcrypt                 1.10.3               hd590300_0    conda-forge
libgettextpo              0.22.5               h59595ed_2    conda-forge
libgettextpo-devel        0.22.5               h59595ed_2    conda-forge
libgfortran-ng            13.2.0               h69a702a_7    conda-forge
libgfortran5              13.2.0               hca663fb_7    conda-forge
libglib                   2.80.2               hf974151_0    conda-forge
libgpg-error              1.49                 h4f305b6_0    conda-forge
libhwloc                  2.10.0          default_h5622ce7_1001    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0            16_linux64_mkl    conda-forge
liblapacke                3.9.0            16_linux64_mkl    conda-forge
libllvm15                 15.0.7               hb3ce162_4    conda-forge
libllvm18                 18.1.5               hb77312f_0    conda-forge
libnpp                    11.8.0.86                     0    nvidia
libnsl                    2.0.1                hd590300_0    conda-forge
libnvjpeg                 11.9.0.86                     0    nvidia
libogg                    1.3.4                h7f98852_1    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libpq                     16.3                 ha72fbe1_0    conda-forge
libsndfile                1.2.2                hc60ed4a_1    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libsqlite                 3.45.3               h2797004_0    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_7    conda-forge
libsystemd0               255                  h3516f8a_1    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxkbcommon              1.7.0                h662e7e4_0    conda-forge
libxml2                   2.12.7               hc051c1a_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
lightning                 2.2.4                    pypi_0    pypi
lightning-utilities       0.11.2             pyhd8ed1ab_0    conda-forge
llvm-openmp               15.0.7               h0cdce71_0    conda-forge
llvmlite                  0.42.0                   pypi_0    pypi
lxml                      5.2.2                    pypi_0    pypi
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markdown                  3.6                      pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.5           py310h2372a71_0    conda-forge
matplotlib                3.8.4           py310hff52083_2    conda-forge
matplotlib-base           3.8.4           py310hef631a5_2    conda-forge
matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
mctc-lib                  0.3.1                h74f4db8_0    conda-forge
mdurl                     0.1.2                    pypi_0    pypi
mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
mkl                       2022.1.0           h84fe81f_915    conda-forge
mkl-devel                 2022.1.0           ha770c72_916    conda-forge
mkl-include               2022.1.0           h84fe81f_915    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.1                h9458935_1    conda-forge
mpg123                    1.32.6               h59595ed_0    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
multidict                 6.0.5                    pypi_0    pypi
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.3.0                hf1915f5_4    conda-forge
mysql-libs                8.3.0                hca2cd23_4    conda-forge
natsort                   8.4.0                    pypi_0    pypi
nbclient                  0.10.0             pyhd8ed1ab_0    conda-forge
nbconvert                 7.16.4               hd8ed1ab_0    conda-forge
nbconvert-core            7.16.4             pyhd8ed1ab_0    conda-forge
nbconvert-pandoc          7.16.4               hd8ed1ab_0    conda-forge
nbformat                  5.10.4             pyhd8ed1ab_0    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
networkx                  3.3                pyhd8ed1ab_1    conda-forge
notebook                  7.2.0              pyhd8ed1ab_0    conda-forge
notebook-shim             0.2.4              pyhd8ed1ab_0    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.100                hca3bf56_0    conda-forge
numba                     0.59.1                   pypi_0    pypi
numpy                     1.26.4          py310hb13e2d6_0    conda-forge
nvidia-ml-py              12.550.52                pypi_0    pypi
openjpeg                  2.5.2                h488ebb8_0    conda-forge
openssl                   3.3.0                h4ab18f5_2    conda-forge
overrides                 7.7.0              pyhd8ed1ab_0    conda-forge
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pandas                    2.2.2           py310hf9f9076_1    conda-forge
pandoc                    3.2                  ha770c72_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
patsy                     0.5.6              pyhd8ed1ab_0    conda-forge
pcre2                     10.43                hcad00b1_0    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    10.3.0          py310hf73ecf8_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pixman                    0.43.2               h59595ed_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
platformdirs              4.2.2              pyhd8ed1ab_0    conda-forge
ply                       3.11               pyhd8ed1ab_2    conda-forge
prometheus_client         0.20.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.42             pyha770c72_0    conda-forge
prompt_toolkit            3.0.42               hd8ed1ab_0    conda-forge
protobuf                  5.26.1                   pypi_0    pypi
psutil                    5.9.8           py310h2372a71_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pulseaudio-client         17.0                 hb77b528_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pygithub                  2.3.0                    pypi_0    pypi
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pyjwt                     2.8.0                    pypi_0    pypi
pynacl                    1.5.0                    pypi_0    pypi
pyparsing                 3.1.2              pyhd8ed1ab_0    conda-forge
pyqt                      5.15.9          py310h04931ad_5    conda-forge
pyqt5-sip                 12.12.2         py310hc6cd4ac_5    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pyteomics                 4.7.2                    pypi_0    pypi
python                    3.10.14         hd12c33a_0_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.19.1             pyhd8ed1ab_0    conda-forge
python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
python-tzdata             2024.1             pyhd8ed1ab_0    conda-forge
python_abi                3.10                    4_cp310    conda-forge
pytorch                   2.3.0           py3.10_cuda11.8_cudnn8.7.0_0    pytorch
pytorch-cuda              11.8                 h7e8668a_5    pytorch
pytorch-lightning         2.2.0              pyhd8ed1ab_0    conda-forge
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.1           py310h2372a71_1    conda-forge
pyzmq                     26.0.3          py310h6883aea_0    conda-forge
qt-main                   5.15.8              hc9dc06e_21    conda-forge
qtconsole-base            5.5.2              pyha770c72_0    conda-forge
qtpy                      2.4.1              pyhd8ed1ab_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
referencing               0.35.1             pyhd8ed1ab_0    conda-forge
requests                  2.32.1             pyhd8ed1ab_0    conda-forge
rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
rich                      13.7.1                   pypi_0    pypi
rich-click                1.8.2                    pypi_0    pypi
rmsd                      1.5.1                    pypi_0    pypi
rpds-py                   0.18.1          py310he421c4c_0    conda-forge
scikit-learn              1.4.2                    pypi_0    pypi
scipy                     1.13.0          py310h93e2701_1    conda-forge
seaborn                   0.13.2               hd8ed1ab_2    conda-forge
seaborn-base              0.13.2             pyhd8ed1ab_2    conda-forge
send2trash                1.8.3              pyh0d859eb_0    conda-forge
setuptools                69.5.1             pyhd8ed1ab_0    conda-forge
simple-dftd3              1.0.0                hd59d2e7_0    conda-forge
sip                       6.7.12          py310hc6cd4ac_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sniffio                   1.3.1              pyhd8ed1ab_0    conda-forge
soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
spectrum-utils            0.4.2                    pypi_0    pypi
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
statsmodels               0.14.2          py310h261611a_0    conda-forge
sympy                     1.12            pypyh9d50eac_103    conda-forge
tbb                       2021.12.0            h297d8ca_1    conda-forge
tblite                    0.3.0                hf49bc11_1    conda-forge
tensorboard               2.16.2                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
terminado                 0.18.1             pyh0d859eb_0    conda-forge
threadpoolctl             3.5.0                    pypi_0    pypi
tinycss2                  1.3.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
toml-f                    0.4.2                hd8f1df9_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
torch-ema                 0.3                      pypi_0    pypi
torchmetrics              1.4.0.post0        pyhd8ed1ab_0    conda-forge
torchtriton               2.3.0                     py310    pytorch
tornado                   6.4             py310h2372a71_0    conda-forge
tqdm                      4.66.4             pyhd8ed1ab_0    conda-forge
traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
types-python-dateutil     2.9.0.20240316     pyhd8ed1ab_0    conda-forge
typing-extensions         4.11.0               hd8ed1ab_0    conda-forge
typing_extensions         4.11.0             pyha770c72_0    conda-forge
typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
unicodedata2              15.1.0          py310h2372a71_0    conda-forge
uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
webcolors                 1.13               pyhd8ed1ab_0    conda-forge
webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
websocket-client          1.8.0              pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.3                    pypi_0    pypi
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
widgetsnbextension        4.0.10             pyhd8ed1ab_0    conda-forge
wrapt                     1.16.0                   pypi_0    pypi
xcb-util                  0.4.0                hd590300_1    conda-forge
xcb-util-image            0.4.0                h8ee46fc_1    conda-forge
xcb-util-keysyms          0.4.0                h8ee46fc_1    conda-forge
xcb-util-renderutil       0.3.9                hd590300_1    conda-forge
xcb-util-wm               0.4.1                h8ee46fc_1    conda-forge
xkeyboard-config          2.41                 hd590300_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.9                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xf86vidmodeproto     2.3.1             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xtb                       6.6.1                hf49bc11_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yarl                      1.9.4                    pypi_0    pypi
zeromq                    4.3.5                h75354e8_4    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

When running the command casanovo train path/to/trainpeaks.hdf5 -p path/to/valpeaks.hdf5 --config casanovo/casanovo/config.yaml

I get the following log output:

2024-05-22 20:33:58,405 INFO [casanovo/MainProcess] casanovo.setup_model : Casanovo version 0.1.dev330+g42882f8.d20240522
2024-05-22 20:33:58,405 DEBUG [casanovo/MainProcess] casanovo.setup_model : model = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : config = casanovo/casanovo/config.yaml
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : output = somepath/casanovo_20240522203358
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : precursor_mass_tol = 50.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : isotope_error_range = (0, 1)
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : min_peptide_len = 6
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : predict_batch_size = 1024
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_beams = 1
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : top_match = 1
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : accelerator = auto
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : devices = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : random_seed = 454
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_log = 1
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : tb_summarywriter = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : save_top_k = 5
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : model_save_folder_path =
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : val_check_interval = 50000
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_peaks = 150
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : min_mz = 50.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_mz = 2500.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : min_intensity = 0.01
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : remove_precursor_tol = 2.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_charge = 10
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dim_model = 64
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_head = 8
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dim_feedforward = 64
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_layers = 5
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dropout = 0.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dim_intensity = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_length = 100
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : warmup_iters = 100000
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : cosine_schedule_period_iters = 600000
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : learning_rate = 0.0005
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : weight_decay = 1e-05
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : train_label_smoothing = 0.01
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : train_batch_size = 128
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_epochs = 5
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : num_sanity_val_steps = 0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : calculate_precision = False
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : residues = {'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764, 'V': 99.068414, 'T': 101.04767, 'C+57.021': 160.030649, 'L': 113.084064, 'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578, 'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912, 'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313, 'M+15.995': 147.0354, 'N+0.984': 115.026943, 'Q+0.984': 129.042594, '+42.011': 42.010565, '+43.006': 43.005814, '-17.027': -17.026549, '+43.006-17.027': 25.980265, 'C': 160.030649, 'M[15.99]': 147.0354, 'N[0.98]': 115.026943, 'Q[0.98]': 129.042594}
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_workers = 10
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train : Training a model from:
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train :   somepath/trainpeaks.hdf5
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train : Using the following validation files:
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train :   somepath/valpeaks.hdf5
2024-05-22 20:34:00,489 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : somepath/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:653: Checkpoint directory  exists and is not empty.

2024-05-22 20:34:00,489 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : somepath/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:653: Checkpoint directory  exists and is not empty.

I ran this on Linux (CentOS/RHEL/Fedora), and I also get the error on my MacBook M2. Could it be the dataset? Even if I just initialise the dataloader and iterate over it, I get the error:

  File "somepath/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "somepath/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "somepath/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "somepath/lib/python3.10/site-packages/casanovo/data/datasets.py", line 263, in __getitem__
    ) = self.index[idx]
  File "somepath/lib/python3.10/site-packages/depthcharge/data/hdf5.py", line 335, in __getitem__
    return self.get_spectrum(idx)
  File "somepath//lib/python3.10/site-packages/depthcharge/data/hdf5.py", line 480, in get_spectrum
    spec_info = super().get_spectrum(idx)
  File "somepath/lib/python3.10/site-packages/depthcharge/data/hdf5.py", line 272, in get_spectrum
    start_offset = offsets[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
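Because the error also appears when simply iterating the dataloader, one way to narrow it down is to read every item in the main process, without DataLoader workers, and record which indices fail. This is a minimal sketch: `find_bad_indices` is a hypothetical helper, and it assumes `dataset` is the object passed to the DataLoader, which must support `len()` and integer indexing (the default sampler already requires both).

```python
def find_bad_indices(dataset, stop_after=None):
    """Return (index, message) pairs for items that raise IndexError.

    Items are read one at a time in the current process, so a failure
    is attributed to a specific index instead of a DataLoader worker.
    Set stop_after to end the scan after that many failures.
    """
    bad = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except IndexError as err:
            bad.append((idx, str(err)))
            if stop_after is not None and len(bad) >= stop_after:
                break
    return bad
```

The last frame of the traceback reads `offsets[0]` from an empty array. If the failing indices form one contiguous range, that may point at a single part of the HDF5 index with no stored spectra, rather than at the training code.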
bittremieux commented 4 months ago

How did you create those trainpeaks.hdf5 and valpeaks.hdf5 files? Casanovo doesn't export them for users by default.

Additionally, the listed Casanovo version is 0.1.dev330+g42882f8.d20240522, which is probably because you installed from source rather than from the official PyPI package. Please use the latest Casanovo version, 4.2.0, instead to avoid issues due to an uncontrolled setup.

tuanle618 commented 4 months ago

I created the trainpeaks.hdf5 and valpeaks.hdf5 files using AnnotatedSpectrumIndex from depthcharge.data (depthcharge version 0.2.3). I think this should be fine, since Casanovo also creates the index that way? As ms_data_files, I pass the path where all the .mgf files are stored.
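For comparison, here is a minimal sketch of building an annotated index from scratch out of a directory of MGF files. It assumes depthcharge 0.2.x, where the constructor is, as far as I can tell, called as `AnnotatedSpectrumIndex(index_path, ms_data_files, valid_charge=...)`, similar to how Casanovo builds its temporary index; please verify the signature against your installed version. `collect_mgf_files` and `build_annotated_index` are hypothetical helper names. The sketch expands the directory into explicit `.mgf` paths and deletes any existing index file before writing, so nothing is appended to stale data.

```python
from pathlib import Path


def collect_mgf_files(directory):
    """Return the sorted .mgf file paths in a directory (non-recursive)."""
    return sorted(str(p) for p in Path(directory).glob("*.mgf"))


def build_annotated_index(index_path, mgf_dir, max_charge=10):
    """Build a fresh depthcharge index (assumes the depthcharge 0.2.x API)."""
    # Imported lazily so collect_mgf_files works without depthcharge installed.
    import numpy as np
    from depthcharge.data import AnnotatedSpectrumIndex

    index_path = Path(index_path)
    # Remove a previous index so spectra are not written into it a second time.
    if index_path.exists():
        index_path.unlink()
    files = collect_mgf_files(mgf_dir)
    if not files:
        raise FileNotFoundError(f"No .mgf files found in {mgf_dir}")
    # Charges 1..max_charge; max_charge = 10 matches the config in the log above.
    valid_charge = np.arange(1, max_charge + 1)
    return AnnotatedSpectrumIndex(str(index_path), files, valid_charge=valid_charge)
```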

EDIT: When running casanovo train with the paths pointing to the .mgf files for training and validation, one training epoch runs without errors. That seems really weird, since Casanovo basically calls the same AnnotatedSpectrumIndex internally here and here. The only difference is that no .hdf5 file is saved and loaded. Could it be a bug when saving the .hdf5 peak files?

bittremieux commented 4 months ago

Unfortunately, this non-standard use is very hard for us to debug. As Casanovo works correctly, we will not investigate this further. If you think that there is a specific bug with creating the index, you can open an issue on the DepthCharge repository. However, considering that we haven't encountered this issue with other uses of DepthCharge, in Casanovo and several other projects, I recommend that you first double-check your own code.

tuanle618 commented 4 months ago

I understand. It's fine. I will try to debug it and see whether the error persists with the newest depthcharge version. I just thought it would make sense to save the .hdf5 files once and reuse them, instead of reprocessing the data every time one wants to run a training.
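The build-once idea can be sketched as a small wrapper that reuses an index file only if a previous build finished, and otherwise removes any leftover file and rebuilds from scratch. This is a hypothetical helper: `build_once`, the `.complete` marker file, and the `build`/`open_existing` callbacks are assumptions, not Casanovo or depthcharge API. With depthcharge, `build` would construct the index from explicit MGF paths and `open_existing` would presumably open the saved file without passing any new input files.

```python
from pathlib import Path


def build_once(index_path, build, open_existing):
    """Build an index file once and reuse it on later runs.

    build(path) must create the file from scratch; open_existing(path)
    must open it without adding data. A sidecar marker is written only
    after build() returns, so an interrupted build is never reused.
    """
    index_path = Path(index_path)
    marker = index_path.with_name(index_path.name + ".complete")
    if index_path.exists() and marker.exists():
        return open_existing(index_path)
    # Start clean so a partial or stale file is never appended to.
    index_path.unlink(missing_ok=True)
    marker.unlink(missing_ok=True)
    result = build(index_path)
    marker.touch()
    return result
```

Deleting before rebuilding matters here because, as noted at the top of this issue, writing into an existing index file without removing it first appeared to store the same data twice.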

Thank you!