falloutdurham / beginners-pytorch-deep-learning

Repository for scripts and notebooks from the book: Programming PyTorch for Deep Learning
MIT License

chapter 5 becoming obsolete :-( #70

Open g-i-o-r-g-i-o opened 2 years ago

g-i-o-r-g-i-o commented 2 years ago

New features in torchtext are making the Chapter 5 code obsolete (fields, dictionaries, etc.). For now you can still run it with modified imports (from torchtext.legacy import data). Are you going to update the code for Chapter 5, or would that be too much work? Will there be an updated version of the book? Thanks in advance.
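A minimal sketch of that import workaround, assuming (to the best of my knowledge) that torchtext 0.9 to 0.11 ship the old API under torchtext.legacy and that earlier releases expose it directly as torchtext.data. The helper name find_legacy_data_module is hypothetical and not part of torchtext or the book's code.

```python
import importlib


def find_legacy_data_module():
    """Return a torchtext module that still exposes Field, or None."""
    # Assumption: torchtext 0.9-0.11 keep the old API in torchtext.legacy.data,
    # while older releases expose it as torchtext.data.
    for name in ("torchtext.legacy.data", "torchtext.data"):
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # torchtext is missing, broken, or has no such submodule.
            continue
        # Newer torchtext.data exists but no longer provides Field.
        if hasattr(module, "Field"):
            return module
    return None


data = find_legacy_data_module()
if data is None:
    print("No Field-based torchtext API found; consider pinning an older release.")
else:
    print("Using Field-based API from", data.__name__)
```

If the helper returns None, the installed torchtext no longer contains the Field-based API, so the chapter's original code cannot run unchanged on it.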

MarcusFra commented 2 years ago

Hi! Yes, I am planning to upload a second notebook that is fully compatible with the new API, but some changes and tests still need to be made. I have a lot to do at the moment, but I'll try to upload it within the next few months.

You can find an experimental version here: https://github.com/MarcusFra/beginners-pytorch-deep-learning/blob/chapter5_new_api/chapter5/Chapter_5_new_api__temp.ipynb

Please note that it uses the same dataset but a different model (without an LSTM). There are also some open issues in the data preparation stage, such as padding. I am going to change those parts to bring the notebook in line with the example in the book.
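For readers unfamiliar with the padding step mentioned above, here is a dependency-free sketch of what it involves: mapping tokens to integer ids and padding every sequence in a batch to the same length. This is an illustration only, not the code from the experimental notebook; the helper names (build_vocab, pad_batch), the special tokens, and the toy sentences are all assumptions.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(tokenized_texts, min_freq=1):
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Reserve id 0 for padding and id 1 for unknown tokens.
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def pad_batch(tokenized_texts, vocab):
    """Convert tokens to ids and right-pad to the longest sequence."""
    max_len = max(len(text) for text in tokenized_texts)
    batch = []
    for text in tokenized_texts:
        ids = [vocab.get(tok, vocab[UNK]) for tok in text]
        batch.append(ids + [vocab[PAD]] * (max_len - len(ids)))
    return batch


# Toy, pre-tokenized examples (not taken from the book's dataset).
texts = [["great", "movie"], ["not", "great", "at", "all"]]
vocab = build_vocab(texts)
padded = pad_batch(texts, vocab)
print(padded)
```

The resulting nested list has a uniform row length, so it can be passed to torch.tensor(...) to form a batch for an embedding layer.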

g-i-o-r-g-i-o commented 2 years ago

Thank you very much for your answer and for your help. Learning torchtext isn't easy because it's growing bigger every day...

MarcusFra commented 2 years ago

You are welcome. I will leave a message here once the final version is pushed. If your focus is on processing text data, the Hugging Face libraries and spaCy (among others, such as NLTK) might also be of interest to you. They offer many state-of-the-art implementations and datasets that let you build NLP applications relatively quickly.

ccshao commented 1 year ago

The book is very helpful. Looking forward to the updated code for PyTorch 2.0 and the latest torchtext (0.15.0). Many thanks!

MarcusFra commented 1 month ago

Hi @ccshao,

I had another look at this (I had already looked in April, when the solutions did not appear to be stable yet). The approach for PyTorch 2.x still does not seem stable: it relies on datapipes from torchdata, which will be removed soon (see this issue), and DataLoader (and DataLoader v2) still seem to be subject to change or are currently being re-evaluated. On top of this, development of torchtext has stopped since release 0.18 in October 2023.

I will keep an eye on the changes to datapipes and dataloaders and hope to publish a final, stable torchtext solution for PyTorch 2.x by the end of this year.

For now, please refer to the legacy solution or to my experimental solution mentioned earlier in this issue. Another option would be to use the Hugging Face libraries instead of torchtext.