Closed nickbhat closed 2 years ago
Thanks for calling that out, will do the new pip build as soon as some recent updates get merged.
This should be resolved now; the current main branch has been released as a new version on pip.
! pip install fair-esm
import esm
model, alphabet = esm.pretrained.esm1_t6_43M_UR50S()
batch_converter = alphabet.get_batch_converter()
data = [
("protein1", "K A <mask> I S Q"),
("protein2", "KA<mask>ISQ"),
]
batch_labels, batch_strs, batch_tokens = batch_converter(data)
print(batch_tokens)
print(f"Mask idx: {alphabet.mask_idx}")
print(f"Unk idx: {alphabet.unk_idx}")
gives the desired result.
Please give it a try!
This seems related to #161
The code in the README example does not work as intended with the PyPI build. The <mask> string is converted to a series of unks, rather than a single mask token.

Reproduction steps
Install using pip, following the README (pip install fair-esm), then run the following example. The output I get is:

Expected behavior
I assume the intended output for both line 1 and line 2 is
tensor([0, 15, 5, 32, 12, 8, 16, 2])
However, the <mask> string is converted to a series of unks, as are the whitespaces.
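To illustrate the underlying issue, here is a minimal, hypothetical tokenizer sketch (not the actual fair-esm implementation). The key point is that a special token like <mask> must be matched as a single unit before the rest of the sequence is split into characters; otherwise each of "<", "m", "a", ... falls out of the vocabulary and becomes an unk. The token ids below are taken from the expected tensor in this issue (cls=0, K=15, A=5, mask=32, I=12, S=8, Q=16, eos=2); the unk id of 3 is an assumption for illustration.

```python
import re

# Illustrative vocabulary: ids match the expected tensor from the issue;
# the unk id (3) is assumed for this sketch.
VOCAB = {"<cls>": 0, "<eos>": 2, "<unk>": 3, "K": 15, "A": 5,
         "I": 12, "S": 8, "Q": 16, "<mask>": 32}

def tokenize(seq: str) -> list[int]:
    # Drop whitespace, then split on "<mask>" with a capturing group so the
    # special token survives as one piece instead of being split per character.
    parts = re.split(r"(<mask>)", seq.replace(" ", ""))
    tokens = []
    for part in parts:
        if part == "<mask>":
            tokens.append(part)          # keep the special token whole
        else:
            tokens.extend(part)          # remaining residues, one char each
    ids = [VOCAB.get(t, VOCAB["<unk>"]) for t in tokens]
    return [VOCAB["<cls>"]] + ids + [VOCAB["<eos>"]]

# Both the spaced and unspaced inputs from the example yield the same ids.
print(tokenize("K A <mask> I S Q"))  # → [0, 15, 5, 32, 12, 8, 16, 2]
print(tokenize("KA<mask>ISQ"))       # → [0, 15, 5, 32, 12, 8, 16, 2]
```

The earlier PyPI build behaved as if the special-token split were missing, so every character of "<mask>" (and each whitespace) was looked up individually and mapped to unk.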