bentrevett / pytorch-seq2seq

Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
MIT License

Change preprocessing pipeline for torchtext 0.13. Field has been deprecated. #188

Closed: marvinli00 closed this issue 9 months ago

marvinli00 commented 2 years ago

Thank you for providing such a detailed tutorial. I updated the preprocessing pipeline for torchtext 0.13 by replacing Field and BucketIterator with get_tokenizer and DataLoader, following the official torchtext migration guide. The code has been tested locally. The training results differ slightly from the original ones because the padding token is now part of the vocabulary, which increases the number of parameters in the embedding layer. I have updated the first tutorial for now; if you think it would be helpful, I will update the rest.
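For reference, here is a minimal sketch of what that kind of migrated pipeline can look like. The toy sentence pairs, the basic_english tokenizer, and the variable names are illustrative stand-ins for the tutorial's Multi30k/spacy setup, not the exact code in this PR:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Hypothetical toy parallel corpus; the tutorial itself uses Multi30k (de -> en).
pairs = [
    ("zwei hunde spielen im schnee .", "two dogs are playing in the snow ."),
    ("ein mann faehrt fahrrad .", "a man rides a bicycle ."),
]

# basic_english ships with torchtext; spacy tokenizers can be swapped in via
# get_tokenizer("spacy", language="de_core_news_sm") if the spacy models are installed.
src_tokenizer = get_tokenizer("basic_english")
trg_tokenizer = get_tokenizer("basic_english")

specials = ["<unk>", "<pad>", "<sos>", "<eos>"]

def yield_tokens(sentences, tokenizer):
    for sentence in sentences:
        yield tokenizer(sentence)

# build_vocab_from_iterator replaces Field.build_vocab
src_vocab = build_vocab_from_iterator(
    yield_tokens((src for src, _ in pairs), src_tokenizer), specials=specials)
trg_vocab = build_vocab_from_iterator(
    yield_tokens((trg for _, trg in pairs), trg_tokenizer), specials=specials)
src_vocab.set_default_index(src_vocab["<unk>"])
trg_vocab.set_default_index(trg_vocab["<unk>"])

def encode(sentence, tokenizer, vocab):
    # <sos> tokens <eos>, mirroring Field(init_token=..., eos_token=...)
    return torch.tensor(
        [vocab["<sos>"]] + vocab(tokenizer(sentence)) + [vocab["<eos>"]],
        dtype=torch.long)

def collate_fn(batch):
    src_batch = [encode(src, src_tokenizer, src_vocab) for src, _ in batch]
    trg_batch = [encode(trg, trg_tokenizer, trg_vocab) for _, trg in batch]
    # pad_sequence does the padding Field/BucketIterator used to do;
    # output shape is [seq_len, batch_size], matching the tutorial models.
    return (pad_sequence(src_batch, padding_value=src_vocab["<pad>"]),
            pad_sequence(trg_batch, padding_value=trg_vocab["<pad>"]))

loader = DataLoader(pairs, batch_size=2, shuffle=True, collate_fn=collate_fn)
```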

Zeta611 commented 1 year ago

Thank you for doing this! I think this deserves more attention.

scliubit commented 1 year ago

Updating the preprocessing pipeline seems to cause a performance drop on tut6, and I have no idea what caused it =( . Any insights?

It turned out to be a problem with BucketIterator, which is deprecated in the newer torchtext releases.
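If the regression comes from losing BucketIterator's length-based batching (a plain shuffled DataLoader mixes short and long sentences, so batches contain much more padding), one rough way to approximate the old behaviour is a custom batch sampler. This is only a sketch; BucketBatchSampler and the referenced names (pairs, src_tokenizer, collate_fn) are illustrative and not part of the repo:

```python
import random
from torch.utils.data import DataLoader, Sampler

class BucketBatchSampler(Sampler):
    """Group examples of similar source length into batches, roughly like the
    deprecated BucketIterator, to reduce padding within each batch."""

    def __init__(self, lengths, batch_size, shuffle=True):
        self.lengths = lengths
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        # Sort indices by length so each batch holds similar-length sentences,
        # then shuffle the batch order so training still sees varied batches.
        indices = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batches = [indices[i:i + self.batch_size]
                   for i in range(0, len(indices), self.batch_size)]
        if self.shuffle:
            random.shuffle(batches)
        return iter(batches)

    def __len__(self):
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size

# Usage with the pipeline sketched above (names are hypothetical):
# lengths = [len(src_tokenizer(src)) for src, _ in pairs]
# loader = DataLoader(pairs,
#                     batch_sampler=BucketBatchSampler(lengths, batch_size=128),
#                     collate_fn=collate_fn)
```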

vaishn99 commented 1 year ago

@marvinli00 Thank you for sharing.