Closed gonmelo closed 1 year ago
Exercise 4.5 - Suggest checking whether the 'mps' or 'cuda' backend is available and, if so, running on the GPU instead of the CPU.
torch.backends.mps.is_available()
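A minimal sketch of what that check could look like (the exact variable names and fallback order here are an assumption, not code from the notebook):

```python
import torch

# Prefer a CUDA GPU, then Apple Silicon (MPS), and fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Running on: {device}")
```

Models and tensors would then be moved with `.to(device)` so the same notebook runs unchanged on any of the three backends.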
Let's change exercise 4.5 to model.prompt("The best part of traveling to Lisbon is", ...)
:)
There are some differences between the guide and the notebooks. Please check that the code is the same in both.
I have just a couple of comments about the tokenization section.
The answer to "why is tokenization important" doesn't mention that we need subword tokenization in order to represent any string with a finite vocabulary. It's basically impossible to build a reasonable language generation system without this. I could see the "official" answer being difficult to explain to students because it is not very specific.
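A toy illustration of the point (this is my own sketch, not the course's tokenizer): a word-level vocabulary fails on any unseen word, while a byte-level fallback covers every possible string with only 256 ids.

```python
# Hypothetical word-level vocabulary: fails on out-of-vocabulary words.
word_vocab = {"the": 0, "best": 1, "part": 2}

def word_tokenize(text):
    # Raises KeyError for any word not in the vocabulary.
    return [word_vocab[w] for w in text.split()]

def byte_tokenize(text):
    # 256 byte ids are enough to represent any string at all.
    return list(text.encode("utf-8"))

print(byte_tokenize("Lisboa"))          # always works, finite vocab
print(word_tokenize("the best part"))   # works only for known words
```

Real subword schemes (BPE, WordPiece) sit between these extremes, but the byte-level case is the cleanest way to show students why a finite vocabulary can still cover arbitrary input.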
Do we think it is necessary for the students to do stopword removal and stemming/lemmatization? These seem distracting, given that neither would be performed for an LM task.