cfmelend closed this issue 6 months ago
I don't think that this is a Casanovo/DepthCharge bug, but rather that for some reason your HDF5 files are corrupted.
Until we add proper support for re-using spectrum indexes, rather than the hacky approach of quickly copying the index over, doing this is at the user's own risk.
@bittremieux any advice on how Carlo can check whether his hdf5 file is corrupted?
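One way to approach the question of checking an HDF5 file is sketched below. This is only a minimal sketch, not something proposed in the thread: `has_hdf5_signature` and `summarize_hdf5` are hypothetical helper names, the first checks just the standard 8-byte HDF5 signature at offset 0, and the second assumes h5py is installed and simply tries to read one element from every dataset.

```python
# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file starts with the HDF5 signature.

    Only offset 0 is checked; files written with a user block place the
    signature at a later offset, which this sketch ignores.
    """
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def summarize_hdf5(path):
    """Map every dataset name to its shape, reading one element from each.

    Requires h5py. An exception raised here points at an object that
    cannot be read back from disk.
    """
    import h5py  # imported lazily so the signature check works without it

    summary = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            summary[name] = obj.shape
            # Force an actual read for non-scalar, non-empty datasets.
            if obj.shape and obj.shape[0] > 0:
                _ = obj[0]

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return summary
```

Datasets reported with a first dimension of 0, or groups that are missing datasets present in a known-good file, would be worth a closer look.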
I also faced this issue when creating a new hdf5 index using depthcharge v0.2.3 for the non-enzymatic dataset (https://doi.org/doi:10.25345/C5KS6JG0W) (https://casanovo.readthedocs.io/en/latest/faq.html#training-casanovo). How can I fix this, @bittremieux, @wsnoble?
Can you provide some more details, including your log file, system information, etc.?
This bug occurred only when Casanovo was used in a non-standard way, so we need more details to understand what's happening.
Hi @bittremieux, I am using the data downloaded from https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=73b8074734104372bc5a441e8dc4447e
and a fork of the casanovo repository here: https://github.com/tuanle618/casanovo
The installed libraries are the following:
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
absl-py 2.1.0 pypi_0 pypi
aiohttp 3.9.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
alsa-lib 1.2.11 hd590300_1 conda-forge
anyio 4.3.0 pyhd8ed1ab_0 conda-forge
appdirs 1.4.4 pypi_0 pypi
argon2-cffi 23.1.0 pyhd8ed1ab_0 conda-forge
argon2-cffi-bindings 21.2.0 py310h2372a71_4 conda-forge
arrow 1.3.0 pyhd8ed1ab_0 conda-forge
asttokens 2.4.1 pyhd8ed1ab_0 conda-forge
async-lru 2.0.4 pyhd8ed1ab_0 conda-forge
async-timeout 4.0.3 pypi_0 pypi
attr 2.5.1 h166bdaf_1 conda-forge
attrs 23.2.0 pyh71513ae_0 conda-forge
babel 2.14.0 pyhd8ed1ab_0 conda-forge
beautifulsoup4 4.12.3 pyha770c72_0 conda-forge
blas 2.116 mkl conda-forge
blas-devel 3.9.0 16_linux64_mkl conda-forge
bleach 6.1.0 pyhd8ed1ab_0 conda-forge
blessed 1.20.0 pypi_0 pypi
brotli 1.1.0 hd590300_1 conda-forge
brotli-bin 1.1.0 hd590300_1 conda-forge
brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cairo 1.18.0 h3faef2a_0 conda-forge
casanovo 0.1.dev330+g42882f8.d20240522 pypi_0 pypi
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py310h2fee648_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
comm 0.2.2 pyhd8ed1ab_0 conda-forge
contourpy 1.2.1 py310hd41b1e2_0 conda-forge
cryptography 42.0.7 pypi_0 pypi
cuda-cudart 11.8.89 0 nvidia
cuda-cupti 11.8.87 0 nvidia
cuda-libraries 11.8.0 0 nvidia
cuda-nvrtc 11.8.89 0 nvidia
cuda-nvtx 11.8.86 0 nvidia
cuda-runtime 11.8.0 0 nvidia
cycler 0.12.1 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
debugpy 1.8.1 py310hc6cd4ac_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
deprecated 1.2.14 pypi_0 pypi
depthcharge-ms 0.2.3 pypi_0 pypi
dftd4 3.6.0 hf49bc11_0 conda-forge
einops 0.8.0 pypi_0 pypi
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge
executing 2.0.1 pyhd8ed1ab_0 conda-forge
expat 2.6.2 h59595ed_0 conda-forge
fastobo 0.12.3 pypi_0 pypi
filelock 3.14.0 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_2 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.51.0 py310h2372a71_0 conda-forge
fqdn 1.5.1 pyhd8ed1ab_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.5.0 pyhff2d567_0 conda-forge
gettext 0.22.5 h59595ed_2 conda-forge
gettext-tools 0.22.5 h59595ed_2 conda-forge
glib 2.80.2 hf974151_0 conda-forge
glib-tools 2.80.2 hb6ce0ca_0 conda-forge
gmp 6.3.0 h59595ed_1 conda-forge
gmpy2 2.1.5 py310hc7909c9_1 conda-forge
gpustat 1.1.1 pypi_0 pypi
graphite2 1.3.13 h59595ed_1003 conda-forge
grpcio 1.64.0 pypi_0 pypi
gst-plugins-base 1.24.3 h9ad1361_0 conda-forge
gstreamer 1.24.3 haf2f30d_0 conda-forge
h11 0.14.0 pyhd8ed1ab_0 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
h5py 3.11.0 pypi_0 pypi
harfbuzz 8.5.0 hfac3d4d_0 conda-forge
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
httpcore 1.0.5 pyhd8ed1ab_0 conda-forge
httpx 0.27.0 pyhd8ed1ab_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
icu 73.2 h59595ed_0 conda-forge
idna 3.7 pyhd8ed1ab_0 conda-forge
importlib-metadata 7.1.0 pyha770c72_0 conda-forge
importlib_metadata 7.1.0 hd8ed1ab_0 conda-forge
importlib_resources 6.4.0 pyhd8ed1ab_0 conda-forge
ipykernel 6.29.3 pyhd33586a_0 conda-forge
ipython 8.24.0 pyh707e725_0 conda-forge
ipywidgets 8.1.2 pyhd8ed1ab_1 conda-forge
isoduration 20.11.0 pyhd8ed1ab_0 conda-forge
jedi 0.19.1 pyhd8ed1ab_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
joblib 1.4.2 pypi_0 pypi
json5 0.9.25 pyhd8ed1ab_0 conda-forge
jsonpointer 2.4 py310hff52083_3 conda-forge
jsonschema 4.22.0 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2023.12.1 pyhd8ed1ab_0 conda-forge
jsonschema-with-format-nongpl 4.22.0 pyhd8ed1ab_0 conda-forge
jupyter 1.0.0 pyhd8ed1ab_10 conda-forge
jupyter-lsp 2.2.5 pyhd8ed1ab_0 conda-forge
jupyter_client 8.6.1 pyhd8ed1ab_0 conda-forge
jupyter_console 6.6.3 pyhd8ed1ab_0 conda-forge
jupyter_core 5.7.2 py310hff52083_0 conda-forge
jupyter_events 0.10.0 pyhd8ed1ab_0 conda-forge
jupyter_server 2.14.0 pyhd8ed1ab_0 conda-forge
jupyter_server_terminals 0.5.3 pyhd8ed1ab_0 conda-forge
jupyterlab 4.2.0 pyhd8ed1ab_1 conda-forge
jupyterlab_pygments 0.3.0 pyhd8ed1ab_1 conda-forge
jupyterlab_server 2.27.1 pyhd8ed1ab_0 conda-forge
jupyterlab_widgets 3.0.10 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.5 py310hd41b1e2_1 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lark 1.1.9 pypi_0 pypi
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.40 h55db66e_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libasprintf 0.22.5 h661eb56_2 conda-forge
libasprintf-devel 0.22.5 h661eb56_2 conda-forge
libblas 3.9.0 16_linux64_mkl conda-forge
libbrotlicommon 1.1.0 hd590300_1 conda-forge
libbrotlidec 1.1.0 hd590300_1 conda-forge
libbrotlienc 1.1.0 hd590300_1 conda-forge
libcap 2.69 h0f662aa_0 conda-forge
libcblas 3.9.0 16_linux64_mkl conda-forge
libclang-cpp15 15.0.7 default_h127d8a8_5 conda-forge
libclang13 18.1.5 default_h5d6823c_0 conda-forge
libcublas 11.11.3.6 0 nvidia
libcufft 10.9.0.58 0 nvidia
libcufile 1.9.1.3 0 nvidia
libcups 2.3.3 h4637d8d_4 conda-forge
libcurand 10.3.5.147 0 nvidia
libcusolver 11.4.1.48 0 nvidia
libcusparse 11.7.5.86 0 nvidia
libdeflate 1.20 hd590300_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libflac 1.4.3 h59595ed_0 conda-forge
libgcc-ng 13.2.0 h77fa898_7 conda-forge
libgcrypt 1.10.3 hd590300_0 conda-forge
libgettextpo 0.22.5 h59595ed_2 conda-forge
libgettextpo-devel 0.22.5 h59595ed_2 conda-forge
libgfortran-ng 13.2.0 h69a702a_7 conda-forge
libgfortran5 13.2.0 hca663fb_7 conda-forge
libglib 2.80.2 hf974151_0 conda-forge
libgpg-error 1.49 h4f305b6_0 conda-forge
libhwloc 2.10.0 default_h5622ce7_1001 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 16_linux64_mkl conda-forge
liblapacke 3.9.0 16_linux64_mkl conda-forge
libllvm15 15.0.7 hb3ce162_4 conda-forge
libllvm18 18.1.5 hb77312f_0 conda-forge
libnpp 11.8.0.86 0 nvidia
libnsl 2.0.1 hd590300_0 conda-forge
libnvjpeg 11.9.0.86 0 nvidia
libogg 1.3.4 h7f98852_1 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libpq 16.3 ha72fbe1_0 conda-forge
libsndfile 1.2.2 hc60ed4a_1 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsqlite 3.45.3 h2797004_0 conda-forge
libstdcxx-ng 13.2.0 hc0a3c3a_7 conda-forge
libsystemd0 255 h3516f8a_1 conda-forge
libtiff 4.6.0 h1dd3fc0_3 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxkbcommon 1.7.0 h662e7e4_0 conda-forge
libxml2 2.12.7 hc051c1a_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
lightning 2.2.4 pypi_0 pypi
lightning-utilities 0.11.2 pyhd8ed1ab_0 conda-forge
llvm-openmp 15.0.7 h0cdce71_0 conda-forge
llvmlite 0.42.0 pypi_0 pypi
lxml 5.2.2 pypi_0 pypi
lz4-c 1.9.4 hcb278e6_0 conda-forge
markdown 3.6 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.5 py310h2372a71_0 conda-forge
matplotlib 3.8.4 py310hff52083_2 conda-forge
matplotlib-base 3.8.4 py310hef631a5_2 conda-forge
matplotlib-inline 0.1.7 pyhd8ed1ab_0 conda-forge
mctc-lib 0.3.1 h74f4db8_0 conda-forge
mdurl 0.1.2 pypi_0 pypi
mistune 3.0.2 pyhd8ed1ab_0 conda-forge
mkl 2022.1.0 h84fe81f_915 conda-forge
mkl-devel 2022.1.0 ha770c72_916 conda-forge
mkl-include 2022.1.0 h84fe81f_915 conda-forge
mpc 1.3.1 hfe3b2da_0 conda-forge
mpfr 4.2.1 h9458935_1 conda-forge
mpg123 1.32.6 h59595ed_0 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
multidict 6.0.5 pypi_0 pypi
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.3.0 hf1915f5_4 conda-forge
mysql-libs 8.3.0 hca2cd23_4 conda-forge
natsort 8.4.0 pypi_0 pypi
nbclient 0.10.0 pyhd8ed1ab_0 conda-forge
nbconvert 7.16.4 hd8ed1ab_0 conda-forge
nbconvert-core 7.16.4 pyhd8ed1ab_0 conda-forge
nbconvert-pandoc 7.16.4 hd8ed1ab_0 conda-forge
nbformat 5.10.4 pyhd8ed1ab_0 conda-forge
ncurses 6.5 h59595ed_0 conda-forge
nest-asyncio 1.6.0 pyhd8ed1ab_0 conda-forge
networkx 3.3 pyhd8ed1ab_1 conda-forge
notebook 7.2.0 pyhd8ed1ab_0 conda-forge
notebook-shim 0.2.4 pyhd8ed1ab_0 conda-forge
nspr 4.35 h27087fc_0 conda-forge
nss 3.100 hca3bf56_0 conda-forge
numba 0.59.1 pypi_0 pypi
numpy 1.26.4 py310hb13e2d6_0 conda-forge
nvidia-ml-py 12.550.52 pypi_0 pypi
openjpeg 2.5.2 h488ebb8_0 conda-forge
openssl 3.3.0 h4ab18f5_2 conda-forge
overrides 7.7.0 pyhd8ed1ab_0 conda-forge
packaging 24.0 pyhd8ed1ab_0 conda-forge
pandas 2.2.2 py310hf9f9076_1 conda-forge
pandoc 3.2 ha770c72_0 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
parso 0.8.4 pyhd8ed1ab_0 conda-forge
patsy 0.5.6 pyhd8ed1ab_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
pexpect 4.9.0 pyhd8ed1ab_0 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 10.3.0 py310hf73ecf8_0 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pixman 0.43.2 h59595ed_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge
platformdirs 4.2.2 pyhd8ed1ab_0 conda-forge
ply 3.11 pyhd8ed1ab_2 conda-forge
prometheus_client 0.20.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.42 pyha770c72_0 conda-forge
prompt_toolkit 3.0.42 hd8ed1ab_0 conda-forge
protobuf 5.26.1 pypi_0 pypi
psutil 5.9.8 py310h2372a71_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pulseaudio-client 17.0 hb77b528_0 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pygithub 2.3.0 pypi_0 pypi
pygments 2.18.0 pyhd8ed1ab_0 conda-forge
pyjwt 2.8.0 pypi_0 pypi
pynacl 1.5.0 pypi_0 pypi
pyparsing 3.1.2 pyhd8ed1ab_0 conda-forge
pyqt 5.15.9 py310h04931ad_5 conda-forge
pyqt5-sip 12.12.2 py310hc6cd4ac_5 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pyteomics 4.7.2 pypi_0 pypi
python 3.10.14 hd12c33a_0_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-fastjsonschema 2.19.1 pyhd8ed1ab_0 conda-forge
python-json-logger 2.0.7 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.10 4_cp310 conda-forge
pytorch 2.3.0 py3.10_cuda11.8_cudnn8.7.0_0 pytorch
pytorch-cuda 11.8 h7e8668a_5 pytorch
pytorch-lightning 2.2.0 pyhd8ed1ab_0 conda-forge
pytorch-mutex 1.0 cuda pytorch
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.1 py310h2372a71_1 conda-forge
pyzmq 26.0.3 py310h6883aea_0 conda-forge
qt-main 5.15.8 hc9dc06e_21 conda-forge
qtconsole-base 5.5.2 pyha770c72_0 conda-forge
qtpy 2.4.1 pyhd8ed1ab_0 conda-forge
readline 8.2 h8228510_1 conda-forge
referencing 0.35.1 pyhd8ed1ab_0 conda-forge
requests 2.32.1 pyhd8ed1ab_0 conda-forge
rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge
rfc3986-validator 0.1.1 pyh9f0ad1d_0 conda-forge
rich 13.7.1 pypi_0 pypi
rich-click 1.8.2 pypi_0 pypi
rmsd 1.5.1 pypi_0 pypi
rpds-py 0.18.1 py310he421c4c_0 conda-forge
scikit-learn 1.4.2 pypi_0 pypi
scipy 1.13.0 py310h93e2701_1 conda-forge
seaborn 0.13.2 hd8ed1ab_2 conda-forge
seaborn-base 0.13.2 pyhd8ed1ab_2 conda-forge
send2trash 1.8.3 pyh0d859eb_0 conda-forge
setuptools 69.5.1 pyhd8ed1ab_0 conda-forge
simple-dftd3 1.0.0 hd59d2e7_0 conda-forge
sip 6.7.12 py310hc6cd4ac_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sniffio 1.3.1 pyhd8ed1ab_0 conda-forge
soupsieve 2.5 pyhd8ed1ab_1 conda-forge
spectrum-utils 0.4.2 pypi_0 pypi
stack_data 0.6.2 pyhd8ed1ab_0 conda-forge
statsmodels 0.14.2 py310h261611a_0 conda-forge
sympy 1.12 pypyh9d50eac_103 conda-forge
tbb 2021.12.0 h297d8ca_1 conda-forge
tblite 0.3.0 hf49bc11_1 conda-forge
tensorboard 2.16.2 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
terminado 0.18.1 pyh0d859eb_0 conda-forge
threadpoolctl 3.5.0 pypi_0 pypi
tinycss2 1.3.0 pyhd8ed1ab_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
toml 0.10.2 pyhd8ed1ab_0 conda-forge
toml-f 0.4.2 hd8f1df9_0 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
torch-ema 0.3 pypi_0 pypi
torchmetrics 1.4.0.post0 pyhd8ed1ab_0 conda-forge
torchtriton 2.3.0 py310 pytorch
tornado 6.4 py310h2372a71_0 conda-forge
tqdm 4.66.4 pyhd8ed1ab_0 conda-forge
traitlets 5.14.3 pyhd8ed1ab_0 conda-forge
types-python-dateutil 2.9.0.20240316 pyhd8ed1ab_0 conda-forge
typing-extensions 4.11.0 hd8ed1ab_0 conda-forge
typing_extensions 4.11.0 pyha770c72_0 conda-forge
typing_utils 0.1.0 pyhd8ed1ab_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
unicodedata2 15.1.0 py310h2372a71_0 conda-forge
uri-template 1.3.0 pyhd8ed1ab_0 conda-forge
urllib3 2.2.1 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge
webcolors 1.13 pyhd8ed1ab_0 conda-forge
webencodings 0.5.1 pyhd8ed1ab_2 conda-forge
websocket-client 1.8.0 pyhd8ed1ab_0 conda-forge
werkzeug 3.0.3 pypi_0 pypi
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
widgetsnbextension 4.0.10 pyhd8ed1ab_0 conda-forge
wrapt 1.16.0 pypi_0 pypi
xcb-util 0.4.0 hd590300_1 conda-forge
xcb-util-image 0.4.0 h8ee46fc_1 conda-forge
xcb-util-keysyms 0.4.0 h8ee46fc_1 conda-forge
xcb-util-renderutil 0.3.9 hd590300_1 conda-forge
xcb-util-wm 0.4.1 h8ee46fc_1 conda-forge
xkeyboard-config 2.41 hd590300_0 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.9 h8ee46fc_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xf86vidmodeproto 2.3.1 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xtb 6.6.1 hf49bc11_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.9.4 pypi_0 pypi
zeromq 4.3.5 h75354e8_4 conda-forge
zipp 3.17.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
When running the command
casanovo train path/to/trainpeaks.hdf5 -p path/to/valpeaks.hdf5 --config casanovo/casanovo/config.yaml
I get the following log file:
2024-05-22 20:33:58,405 INFO [casanovo/MainProcess] casanovo.setup_model : Casanovo version 0.1.dev330+g42882f8.d20240522
2024-05-22 20:33:58,405 DEBUG [casanovo/MainProcess] casanovo.setup_model : model = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : config = casanovo/casanovo/config.yaml
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : output = somepath/casanovo_20240522203358
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : precursor_mass_tol = 50.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : isotope_error_range = (0, 1)
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : min_peptide_len = 6
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : predict_batch_size = 1024
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_beams = 1
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : top_match = 1
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : accelerator = auto
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : devices = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : random_seed = 454
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_log = 1
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : tb_summarywriter = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : save_top_k = 5
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : model_save_folder_path =
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : val_check_interval = 50000
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_peaks = 150
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : min_mz = 50.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_mz = 2500.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : min_intensity = 0.01
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : remove_precursor_tol = 2.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_charge = 10
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dim_model = 64
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_head = 8
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dim_feedforward = 64
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_layers = 5
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dropout = 0.0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : dim_intensity = None
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_length = 100
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : warmup_iters = 100000
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : cosine_schedule_period_iters = 600000
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : learning_rate = 0.0005
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : weight_decay = 1e-05
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : train_label_smoothing = 0.01
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : train_batch_size = 128
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : max_epochs = 5
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : num_sanity_val_steps = 0
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : calculate_precision = False
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : residues = {'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764, 'V': 99.068414, 'T': 101.04767, 'C+57.021': 160.030649, 'L': 113.084064, 'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578, 'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912, 'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313, 'M+15.995': 147.0354, 'N+0.984': 115.026943, 'Q+0.984': 129.042594, '+42.011': 42.010565, '+43.006': 43.005814, '-17.027': -17.026549, '+43.006-17.027': 25.980265, 'C': 160.030649, 'M[15.99]': 147.0354, 'N[0.98]': 115.026943, 'Q[0.98]': 129.042594}
2024-05-22 20:33:58,406 DEBUG [casanovo/MainProcess] casanovo.setup_model : n_workers = 10
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train : Training a model from:
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train : somepath/trainpeaks.hdf5
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train : Using the following validation files:
2024-05-22 20:33:58,408 INFO [casanovo/MainProcess] casanovo.train : somepath/valpeaks.hdf5
2024-05-22 20:34:00,489 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : somepath/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:653: Checkpoint directory exists and is not empty.
2024-05-22 20:34:00,489 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : somepath/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:653: Checkpoint directory exists and is not empty.
I ran this on Linux (centos rhel fedora), but I also get the error on my MacBook M2. Could it be the dataset? Even if I just initialise the dataloader and iterate over it, I get the error:
File "somepath/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "somepath/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "somepath/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "somepath/lib/python3.10/site-packages/casanovo/data/datasets.py", line 263, in __getitem__
) = self.index[idx]
File "somepath/lib/python3.10/site-packages/depthcharge/data/hdf5.py", line 335, in __getitem__
return self.get_spectrum(idx)
File "somepath//lib/python3.10/site-packages/depthcharge/data/hdf5.py", line 480, in get_spectrum
spec_info = super().get_spectrum(idx)
File "somepath/lib/python3.10/site-packages/depthcharge/data/hdf5.py", line 272, in get_spectrum
start_offset = offsets[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
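To narrow down which items trigger an error like this, the index can be iterated directly, outside the DataLoader workers, while recording every position that fails. The following is a minimal sketch under stated assumptions: `find_failing_indices` and `FakeIndex` are hypothetical names, and `FakeIndex` is only a plain-Python stand-in that mimics the failure mode (a slice past the end of the stored offsets is silently empty, and taking its first element then raises). It is not depthcharge's actual code.

```python
def find_failing_indices(dataset, limit=None):
    """Return (index, message) pairs for items that raise IndexError.

    `dataset` can be any object with __len__ and integer __getitem__.
    """
    failures = []
    total = len(dataset) if limit is None else min(limit, len(dataset))
    for idx in range(total):
        try:
            dataset[idx]
        except IndexError as err:
            failures.append((idx, str(err)))
    return failures


class FakeIndex:
    """Hypothetical stand-in: claims more spectra than it has offsets for."""

    def __init__(self, offsets, n_spectra):
        self.offsets = offsets
        self.n_spectra = n_spectra

    def __len__(self):
        return self.n_spectra

    def __getitem__(self, idx):
        # Slicing past the end returns an empty list without complaint;
        # the error only appears when the first element is requested.
        window = self.offsets[idx : idx + 2]
        return window[0]


if __name__ == "__main__":
    index = FakeIndex(offsets=[0, 3, 7], n_spectra=5)
    print([idx for idx, _ in find_failing_indices(index)])
```

If the failing positions all sit at the end of the index, that would hint that the stored spectrum count and the stored offsets disagree, rather than that individual spectra are damaged.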
How did you create those trainpeaks.hdf5 and valpeaks.hdf5 files? Because Casanovo doesn't export those for the users by default.
Additionally, the listed Casanovo version is 0.1.dev330+g42882f8.d20240522, which is probably because you installed from source rather than the official PyPI package. Please use the latest Casanovo version 4.2.0 instead to avoid issues due to an uncontrolled setup.
I've created the trainpeaks.hdf5 and valpeaks.hdf5 files using the AnnotatedSpectrumIndex from depthcharge.data, version 0.2.3. I think this should be fine, since casanovo also creates the index that way? As ms_data_files I pass the path where all the .mgf files are stored.
EDIT: when executing casanovo train with the paths pointing to the .mgf files for training and validation, one training epoch is possible. That seems really weird, since casanovo basically calls the same AnnotatedSpectrumIndex internally here and here. The only difference is that no .hdf5 is saved and loaded. Could it be a bug when saving the .hdf5 peak files?
Unfortunately this non-standard use is very hard for us to debug. As Casanovo works correctly, we will not investigate this further. If you think that there is a specific bug with creating the index, you can open an issue on the DepthCharge repository. However, considering that we haven't encountered this issue with other usages of DepthCharge, in Casanovo and several other projects, I recommend that you first double-check your own code.
I understand. It's fine. I will try to debug this and see whether the error still occurs after moving to the newest depthcharge version. I just thought it would make sense to save the .hdf5 once and re-use it, instead of processing the data every time one wants to run a training.
Thank you!
I'm attempting to train Casanovo using the latest dev branch with preprocessed hdf5 files generated by depthcharge. However, when using these as inputs, I appear to run into an indexing error during either training or validation when the dataloaders attempt to access the hdf5 to retrieve a new batch. This is similar to an error I've observed while manually creating depthcharge SpectrumIndex objects during my own experiments without removing a previously written hdf5 file (the same data appears to be written twice). I've attached the two hdf5 files along with the config file I'm using to this post. test_data.val.hdf5.txt test_data.train.hdf5.txt test.yaml.txt
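Given the observation that data appears to be written twice when an older hdf5 file is left in place, one cautious workaround is to make sure no stale file exists before building a new index. This is a minimal sketch, assuming the leftover file really is the cause, which the thread does not confirm. `prepare_index_path` is a hypothetical helper, and the index construction itself is left as a comment because its exact arguments are an assumption.

```python
from pathlib import Path


def prepare_index_path(index_path, reuse=False):
    """Return index_path as a Path, deleting a stale file unless reuse=True."""
    path = Path(index_path)
    if path.exists() and not reuse:
        # Remove the previous index so new spectra are not appended to it.
        path.unlink()
    # Make sure the target directory exists before anything writes to it.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    train_path = prepare_index_path("path/to/trainpeaks.hdf5")
    # Build the index here, e.g. with AnnotatedSpectrumIndex from
    # depthcharge.data as described in the thread (argument order and
    # names other than ms_data_files are an assumption):
    # index = AnnotatedSpectrumIndex(str(train_path), ms_data_files=...)
    print(train_path)
```

Deleting the file is deliberately the default here, so that re-using an existing index has to be requested explicitly with `reuse=True`.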