DeepChainBio / bio-transformers

bio-transformers is a wrapper on top of the ESM/ProtBert models, trained on millions of proteins and used to predict embeddings.
https://bio-transformers.readthedocs.io/en/latest/getting_started/install.html
Apache License 2.0

a worker died or was killed while executing a task by an unexpected system error #26

Open wushixian opened 3 years ago

wushixian commented 3 years ago

I am using 4 GPUs to compute MSA embeddings, but the process terminates every time. The error is raised by Ray, with the message "a worker died or was killed while executing a task by an unexpected system error". The GPU processes terminate one by one. I have tried several times and updated Ray to the latest version, but the problem persists. How can I fix this? Thanks!

wushixian commented 3 years ago

I tried again using only the CPU to compute the embeddings, and it still terminated. I checked the ESM documentation, which says there is a problem with the model esm_msa1_t12_100M_UR50S and recommends using esm_msa1b_t12_100M_UR50S instead. However, I can't find where to modify the code to use esm_msa1b_t12_100M_UR50S. Could somebody tell me? Thanks.

delfosseaurelien commented 3 years ago

Hello,

I will add the esm_msa1b_t12_100M_UR50S model in a few minutes.

delfosseaurelien commented 3 years ago

The esm_msa1b_t12_100M_UR50S model has been added.
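For readers who land here with the same question, selecting the new model might look roughly like the sketch below. It assumes the backend is registered under the same name as the ESM checkpoint, and that `BioTransformers(backend=..., num_gpus=...)` and `compute_embeddings(sequences=..., pool_mode=..., n_seqs=...)` behave as in the project's documented usage; neither is confirmed in this thread. The helper names `choose_msa_backend` and `compute_msa_embeddings`, and the `available` list passed by the caller, are hypothetical and exist only for illustration.

```python
# Backend names as they appear in this thread (ESM MSA Transformer checkpoints).
PREFERRED_MSA_BACKEND = "esm_msa1b_t12_100M_UR50S"
LEGACY_MSA_BACKEND = "esm_msa1_t12_100M_UR50S"


def choose_msa_backend(available):
    """Return the msa1b backend when it is available, else fall back to msa1."""
    for name in (PREFERRED_MSA_BACKEND, LEGACY_MSA_BACKEND):
        if name in available:
            return name
    raise ValueError("no MSA backend found in: %r" % (list(available),))


def compute_msa_embeddings(msa_folder, available, num_gpus=0):
    """Hypothetical helper: compute MSA embeddings with the chosen backend."""
    # Imported lazily so the selection logic above works without the library.
    from biotransformers import BioTransformers

    backend = choose_msa_backend(available)
    # Assumed call signatures, based on the project's documented usage;
    # check them against the installed version before relying on this.
    bio_trans = BioTransformers(backend=backend, num_gpus=num_gpus)
    return bio_trans.compute_embeddings(
        sequences=msa_folder, pool_mode=("cls", "mean"), n_seqs=128
    )
```

Calling `compute_msa_embeddings("data/msa", [PREFERRED_MSA_BACKEND])` with the default `num_gpus=0` first, and only then raising the GPU count, may help separate a model problem from a multi-GPU problem; the folder path here is a placeholder.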

delfosseaurelien commented 3 years ago

> I am using 4 GPUs to compute MSA embeddings, but the process terminates every time. The error is raised by Ray, with the message "a worker died or was killed while executing a task by an unexpected system error". The GPU processes terminate one by one. I have tried several times and updated Ray to the latest version, but the problem persists. How can I fix this? Thanks!

I will check this; it seems there is an issue with Ray.

wushixian commented 3 years ago

Thank you very much!