TobiasHeOl / AbLang

AbLang: A language model for antibodies
BSD 3-Clause "New" or "Revised" License

Maximum sequence length #3

Open ElArkk opened 2 years ago

ElArkk commented 2 years ago

Hi,

When using the likelihood functionality of AbLang, I'm noticing an upper limit on the sequence length it will process (somewhere below 150 residues). A breaking example:

import ablang

heavy_ablang = ablang.pretrained("heavy", device="cpu")
heavy_ablang.freeze()
likelihoods = heavy_ablang("HASTA" * 40, mode="likelihood")  # 200-residue input

On CPU I get the following error: IndexError: index out of range in self

On GPU I get cryptic CUDA errors which bork the GPU until kernel restart.

Is there a known upper limit for sequence length?

Thanks!

TobiasHeOl commented 2 years ago

Hi ElArkk,

The upper limit on sequence length in AbLang comes from the positional embedding layer, which supports a maximum length of 160. Because extra tokens are added to each sequence, the effective maximum length is 157. This should be long enough to span the variable region of any standard antibody.

Nonetheless, I will add an error message explaining this.
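
Until such a message exists, a caller-side guard might look like the following minimal sketch. The helper name `check_sequence_length` and the constant `MAX_EFFECTIVE_LENGTH` are hypothetical and not part of the AbLang package; the value 157 is simply the effective limit quoted above.

```python
# Assumption: 157 is the effective limit quoted in this thread (160 positional
# embeddings minus the extra tokens); it is not read from the AbLang package.
MAX_EFFECTIVE_LENGTH = 157


def check_sequence_length(seq: str, max_len: int = MAX_EFFECTIVE_LENGTH) -> str:
    """Return seq unchanged, or raise ValueError if it is longer than max_len."""
    if len(seq) > max_len:
        raise ValueError(
            f"Sequence has {len(seq)} residues; "
            f"the maximum supported length is {max_len}."
        )
    return seq


# The 200-residue example from this issue is rejected before reaching the model.
try:
    check_sequence_length("HASTA" * 40)
except ValueError as err:
    print(err)
```

Calling the guard on each sequence before passing it to the model turns the opaque IndexError (or CUDA failure) into a readable message on the caller's side.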

I hope this helps.

Best, Tobias

ElArkk commented 2 years ago

That's perfect, thank you! :)

JonasLi-19 commented 2 months ago

Sorry to cut in on this issue, but I am wondering why you chose 768 as the inp_dim length. Does this have some relation to ESM?