Closed: bittremieux closed this issue 4 months ago
Unfortunately, when I tried to replicate this in a TPU runtime environment, I got this error:
2024-06-27 00:27:15.702704: I tensorflow/core/tpu/tpu_api_dlsym_initializer.cc:95] Opening library: /usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/../../libtensorflow_cc.so.2
2024-06-27 00:27:15.702909: I tensorflow/core/tpu/tpu_api_dlsym_initializer.cc:119] Libtpu path is: libtpu.so
2024-06-27 00:27:15.759569: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Seed set to 454
INFO: Casanovo version 4.2.1.dev1+gc6a455b.d20240627
INFO: Sequencing peptides from:
INFO: sample_data/sample_preprocessed_spectra.mgf
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/message.cc:258] File is already registered: xla/service/cpu/backend_config.proto
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): File is already registered: xla/service/cpu/backend_config.proto
https://symbolize.stripped_domain/r/?trace=7f16bb6f59fc,7f16bb6a151f&map=
*** SIGABRT received by PID 1782 (TID 1782) on cpu 24 from PID 1782; stack trace: ***
PC: @ 0x7f16bb6f59fc (unknown) pthread_kill
@ 0x7f15c82214f9 928 (unknown)
@ 0x7f16bb6a1520 (unknown) (unknown)
https://symbolize.stripped_domain/r/?trace=7f16bb6f59fc,7f15c82214f8,7f16bb6a151f&map=5edeb7d86db111100e979a74159a3982:7f15b8600000-7f15c8440ba0
E0627 00:27:21.381225 1782 coredump_hook.cc:447] RAW: Remote crash data gathering hook invoked.
E0627 00:27:21.381246 1782 client.cc:272] RAW: Coroner client retries enabled (b/136286901), will retry for up to 30 sec.
E0627 00:27:21.381253 1782 coredump_hook.cc:542] RAW: Sending fingerprint to remote end.
E0627 00:27:21.381276 1782 coredump_hook.cc:551] RAW: Cannot send fingerprint to Coroner: [NOT_FOUND] stat failed on crash reporting socket /var/google/services/logmanagerd/remote_coredump.socket (Is the listener running?): No such file or directory
E0627 00:27:21.381284 1782 coredump_hook.cc:603] RAW: Dumping core locally.
E0627 00:27:23.920542 1782 process_state.cc:808] RAW: Raising signal 6 with default behavior
This also occurred after explicitly installing the PyTorch package release that supports TPUs. Here is the notebook I used to try to replicate the log entries: https://colab.research.google.com/drive/1zFZ248QPRT5ddXEOC2LBwUronJOWbMAE?usp=sharing
I couldn't get a TPU instance, but the warnings also appear when running on a CPU or GPU Colab instance, so it's probably related to Colab rather than to TPUs. You could briefly look into it, but I don't think it's worth spending a lot of time on this.
Another issue: from some experimenting, it looks like the TensorFlow warnings are logged before the Casanovo module is even loaded, so trying to filter them would be a lot more difficult than it's worth.
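That said, if we ever do revisit suppression, one low-effort option that sidesteps the import-order problem is setting TensorFlow's native log-level environment variable before the process starts. This is only a sketch and rests on two assumptions: that the INFO lines above come from TensorFlow's C++ logging (which TF_CPP_MIN_LOG_LEVEL controls) and that Casanovo's CLI entry point is casanovo.casanovo:main; the wrapper script name is made up for illustration:

```python
# suppress_tf_logs.py -- hypothetical wrapper, not part of Casanovo.
# TF_CPP_MIN_LOG_LEVEL controls TensorFlow's native (C++) logging:
# 0 = all messages, 1 = hide INFO, 2 = also hide WARNING, 3 = only FATAL.
# It only takes effect if it is set before TensorFlow is first imported,
# which is why it has to happen here rather than inside Casanovo itself.
import os

os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "1")

# Hand over to Casanovo's CLI only after the variable is in place, so any
# dependency that ends up importing TensorFlow inherits it.
from casanovo import casanovo  # entry-point module name is an assumption

if __name__ == "__main__":
    casanovo.main()
```

Note that this would only hide the INFO lines at the top of the log; it wouldn't do anything about the protobuf crash.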
As reported via email, running Casanovo on a TPU-enabled Colab instance prints some warnings (the TensorFlow messages shown in the log above). These seem harmless, so we could try to keep them from being printed, e.g. along the lines of the sketch above.