OpenBioML / protein-lm-scaling

Tokenizer (issue #3) #6

Closed jamaliki closed 10 months ago

jamaliki commented 11 months ago

That's a good point about the commenting @Muedi, I'll do it the moment I have a chance!

jamaliki commented 11 months ago

Hey @Muedi, I added some comments to the Rust code. I think the Python is pretty simple, but if you have ideas about where I should add more explanation, I'm happy to do so. Please let me know if you think anything would be unclear to others.

pascalnotin commented 11 months ago

Should we subclass the tokenizer class in tokenizer.py with an "AptTokenizer" that includes the points we made today (e.g., collapsing X and )? What is the role of ? Do we need "." and "-"? Maybe if we use alignments?
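
To make the subclassing idea concrete, here is a rough sketch of what it could look like. Everything in it is an assumption for illustration: the `Tokenizer` base class below is a hypothetical stand-in (not the actual class in tokenizer.py), the `collapse_into_x` and `keep_gap_tokens` parameters are made up, and falling back to X for unseen characters is just one possible choice.

```python
from typing import Dict, Iterable, List


class Tokenizer:
    """Hypothetical stand-in for the base class in tokenizer.py."""

    def __init__(self, tokens: Iterable[str], unk_token: str = "X"):
        # Assign ids in the order the tokens are given.
        self.ids: Dict[str, int] = {tok: i for i, tok in enumerate(tokens)}
        self.unk_token = unk_token

    def encode(self, sequence: str) -> List[int]:
        # Characters outside the vocabulary fall back to the unknown token
        # (an assumption of this sketch).
        unk_id = self.ids[self.unk_token]
        return [self.ids.get(ch, unk_id) for ch in sequence]


class AptTokenizer(Tokenizer):
    """Sketch of a subclass that collapses chosen symbols into X."""

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def __init__(
        self,
        collapse_into_x: Iterable[str] = (),
        keep_gap_tokens: bool = False,
    ):
        tokens = list(self.AMINO_ACIDS) + ["X"]
        if keep_gap_tokens:
            # "." and "-" only get their own ids when working with alignments.
            tokens += [".", "-"]
        super().__init__(tokens, unk_token="X")
        self.collapse_into_x = set(collapse_into_x)

    def encode(self, sequence: str) -> List[int]:
        # Rewrite every symbol in the collapse set to X before encoding.
        collapsed = "".join(
            "X" if ch in self.collapse_into_x else ch for ch in sequence
        )
        return super().encode(collapsed)
```

With this sketch, `AptTokenizer(collapse_into_x={"B"}).encode("ABX")` maps both `B` and `X` to the same id, and `keep_gap_tokens=True` would be the switch to flip if we end up training on alignments.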

jamaliki commented 11 months ago

How's this?