1shanpanta / Nepali-LLM


Tokenizer Issue #7

Closed. 1shanpanta closed this issue 2 weeks ago.

1shanpanta commented 2 weeks ago

The tokenizer is associating tokens with negative values. See data/nepali_tokenizer.vocab for details.

Example:

"""
ा▁ -0
ो▁ -1
को▁ -2
्र -3
न् -4
"""

The tokenizer should give results like:

ा▁ 0
ो▁ 1
को▁ 2
्र 3
न् 4

I can't figure out what's wrong with the code. @SamirWagle, can you help?

For more background on tokenizers, please watch Karpathy's videos.

SamirWagle commented 2 weeks ago

Check it out in the transformer file. It seems correct.
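A minimal sketch for checking what the two columns in the vocab file mean. It assumes data/nepali_tokenizer.vocab was written by SentencePiece (the `▁` word-boundary marker and the `-0` entry suggest this, but the thread does not confirm it). In that format each line is `piece<TAB>score`. The token ID is the 0-based line number, and the second column is a score, not an ID. As far as I know, SentencePiece's BPE trainer writes that score as a negative number that decreases with each piece it adds, which would produce -0, -1, -2, ... The helpers `parse_vocab` and `load_vocab` and the sample lines are hypothetical.

```python
def parse_vocab(lines):
    """Return {piece: (token_id, score)} from SentencePiece-style .vocab lines.

    Assumes one "piece<TAB>score" entry per line; the token ID is the
    0-based line index, and the score column is NOT an ID.
    """
    vocab = {}
    for token_id, line in enumerate(lines):
        # rsplit on the last tab separates the piece from its score
        piece, score = line.rstrip("\r\n").rsplit("\t", 1)
        vocab[piece] = (token_id, float(score))
    return vocab


def load_vocab(path):
    # Hypothetical helper: read the .vocab file as UTF-8 (Devanagari pieces)
    with open(path, encoding="utf-8") as f:
        return parse_vocab(f)


# Hypothetical sample: SentencePiece reserves IDs 0-2 for <unk>, <s>, </s>
# by default; the Devanagari entries are the ones quoted in the issue.
sample = ["<unk>\t0", "<s>\t0", "</s>\t0", "ा▁\t-0", "ो▁\t-1", "को▁\t-2"]
vocab = parse_vocab(sample)
for piece in ("ा▁", "ो▁", "को▁"):
    token_id, score = vocab[piece]
    print(piece, "id =", token_id, "score =", score)
```

With the real file, `load_vocab("data/nepali_tokenizer.vocab")` builds the same kind of mapping. If the IDs count up from 0 while the scores count down, the negative column is expected and not a bug in the tokenizer.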