facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

RuntimeError: File cannot be opened #474

Closed: HyprValent closed this issue 1 year ago

HyprValent commented 1 year ago

I created a simple FASTA file formatted as follows:

{sequence id} {full sequence}

I want to compute embeddings for these sequences, so I followed the README and ran the following command:

python scripts/extract.py esm2_t33_650M_UR50D data/output.fasta data/output_esm --repr_layers 0 32 33 --include mean per_tok

But it led to a runtime error. The full traceback is below:

Read data\output.fasta with 5167 sequences
Processing 1 of 151 batches (77 sequences)
Traceback (most recent call last):
  File "C:\...\scripts\extract.py", line 137, in <module>
    main(args)
  File "C:\...\scripts\extract.py", line 128, in main
    torch.save(
  File "C:\...\Miniconda3\lib\site-packages\torch\serialization.py", line 422, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "C:\...\Miniconda3\lib\site-packages\torch\serialization.py", line 309, in _open_zipfile_writer
    return container(name_or_buffer)
  File "C:\...\Miniconda3\lib\site-packages\torch\serialization.py", line 287, in __init__
    super(_open_zipfile_writer_file, self).__init__(torch._C.PyTorchFileWriter(str(name)))
RuntimeError: File data\output_esm\M07480:75:000000000-KPMR9:1:1118:5103:4246.pt cannot be opened.

What happened? I'm not sure what went wrong.

Thank you!

HyprValent commented 1 year ago

Never mind! I'm dumb and found out that my sequence IDs, which were used as filenames, contained invalid characters.
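
For anyone who hits the same error: the ID in the traceback (M07480:75:000000000-KPMR9:1:1118:5103:4246) contains colons, which Windows does not allow in file names, so the output .pt file cannot be created. One possible workaround is to rewrite the FASTA headers before running extract.py. The sketch below is only an illustration, not part of the esm repository; `sanitize_id` and `sanitize_fasta` are hypothetical helper names, and it assumes the whole header line after `>` ends up in the output file name.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_id(seq_id: str, replacement: str = "_") -> str:
    """Replace characters that are invalid in Windows file names."""
    cleaned = _INVALID.sub(replacement, seq_id)
    # Windows also rejects names that end in a space or a dot.
    return cleaned.rstrip(" .") or replacement


def sanitize_fasta(src: str, dst: str) -> None:
    """Copy a FASTA file, sanitizing every header line (assumed to start with '>')."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                line = ">" + sanitize_id(line[1:].rstrip("\n")) + "\n"
            fout.write(line)


if __name__ == "__main__":
    # The ID from the traceback above, with colons replaced by underscores.
    print(sanitize_id("M07480:75:000000000-KPMR9:1:1118:5103:4246"))
```

Note that two different IDs could map to the same sanitized name (for example `a:b` and `a_b`), so it is worth checking for duplicates before relying on the output files.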