Closed konstin closed 3 years ago
Hi @konstin, sorry for the late reply, and thanks for flagging this. Yes, I've noticed this is an issue within fairseq as well with learned positional embeddings; depending on the pytorch version you'll either get those gnarly error messages or something that's easier to pinpoint. Let us look into putting a check there.
We've put a check in place now; the full answer is actually quite subtle:
Let us know if that makes sense!
Bug description
Having a sequence longer than 1022 residues causes an unspecific exception on both CPU and GPU. On the CPU, it raises `IndexError: index out of range in self`. On the Quadro RTX 8000 I tested, it causes a `CUDA error: device-side assert triggered` that makes all further attempts to embed sequences of any length fail with the same error.

I'm aware that esm was trained with sequences of fewer than 1024 amino acids; I'm opening this issue because the limit does not seem to be mentioned in the repo or the paper, and it's hard to figure out what is wrong from the error message in the exception. I'd also be interested in how you'd suggest handling user input with longer sequences: should this simply error with a message that long sequences are not supported, or do you suggest another way of handling them? (I've seen https://github.com/facebookresearch/esm/issues/21#issuecomment-763217386 listing some strategies.)
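For reference, one of the strategies discussed is embedding overlapping windows of the sequence and stitching the per-residue representations back together. A minimal sketch of the windowing step in plain Python (the 1022-residue window and 256-residue overlap are illustrative assumptions, not values prescribed by the repo):

```python
def windows(seq: str, size: int = 1022, overlap: int = 256):
    """Yield (start, chunk) pairs covering seq with chunks of at most `size` residues."""
    if len(seq) <= size:
        yield 0, seq
        return
    step = size - overlap
    start = 0
    while start < len(seq):
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break  # this window already reaches the end of the sequence
        start += step

# A 2500-residue sequence splits into three overlapping windows.
chunks = list(windows("A" * 2500, size=1022, overlap=256))
```

Each chunk could then be embedded separately, with per-residue representations averaged over the overlap regions.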
Reproduction steps
Pretty much the readme example, only with a longer sequence added:
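Roughly, a reproduction along these lines, assuming the README's `esm1b_t33_650M_UR50S` example; the model calls are shown as comments since they require downloading the weights:

```python
import random

# Build a sequence well past the 1022-residue limit.
random.seed(0)
long_seq = "".join(random.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(1100))
data = [("long_protein", long_seq)]

# import torch, esm
# model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
# batch_converter = alphabet.get_batch_converter()
# _, _, tokens = batch_converter(data)
# with torch.no_grad():
#     model(tokens)  # IndexError on CPU, device-side assert on GPU
```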
Expected behavior
Either a way to handle long sequences, or an error message that explains the length limit, ideally with a note in the readme. That error also really shouldn't poison the GPU in a way that forces me to restart the process before I can do any further computation, though I'm not sure whether you can do anything about that or whether it's an issue with torch and/or CUDA.
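A user-side guard would also work around this until the model checks its input. A sketch, where the limit and the error wording are my assumptions:

```python
MAX_RESIDUES = 1022  # ESM-1b handles 1024 tokens, minus BOS and EOS

def validate_length(seq: str, limit: int = MAX_RESIDUES) -> str:
    """Reject sequences the model cannot embed, with an actionable message."""
    if len(seq) > limit:
        raise ValueError(
            f"Sequence has {len(seq)} residues but the model supports at most "
            f"{limit}; truncate or window the input before embedding"
        )
    return seq
```

Calling this before the batch converter turns the opaque `IndexError` into a clear message.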
Logs
CPU:
GPU:
Trying sequences shorter than 1024 afterwards:
Additional context
Ubuntu 18.04, python 3.8, torch 1.7.1, cuda 10.2, driver version 455.23.05; installed with `pip install -U git+https://github.com/facebookresearch/esm` at commit 537ad6afa22bd493fc479ce3509ebee83b62e594