This PR pins pytorch_pretrained_bert to the latest release 0.6.2.

Changelog
### 0.6.2
```
General updates:
- Better serialization for all models and tokenizers (BERT, GPT, GPT-2 and Transformer-XL) with [best practices for saving/loading](https://github.com/huggingface/pytorch-pretrained-BERT#serialization-best-practices) in readme and examples.
- Relaxing network connection requirements (fallback on the last downloaded model in the cache when we can't reach AWS to check eTag)
Breaking changes:
- The `warmup_linear` method in `OpenAIAdam` and `BertAdam` is now replaced by flexible [schedule classes](https://github.com/huggingface/pytorch-pretrained-BERT#learning-rate-schedules) for linear, cosine and multi-cycle schedules (a usage sketch follows below).
Bug fixes and improvements to the library modules:
- add a flag in BertTokenizer to skip basic tokenization (john-hewitt)
- Allow tokenization of sequences > 512 (CatalinVoss)
- clean up and extend learning rate schedules in BertAdam and OpenAIAdam (lukovnikov)
- Update GPT/GPT-2 Loss computation (CatalinVoss, thomwolf)
- Make the TensorFlow conversion tool more robust (marpaia)
- fixed BertForMultipleChoice model init and forward pass (dhpollack)
- Fix gradient overflow in GPT-2 FP16 training (SudoSharma)
- catch exception if pathlib not installed (potatochip)
- Use Dropout Layer in OpenAIGPTMultipleChoiceHead (pglock)
New scripts and improvements to the examples scripts:
- Add BERT language model fine-tuning scripts (Rocketknight1)
- Added SST-2 task and remaining GLUE tasks to `run_classifier.py` (ananyahjha93, jplehmann)
- GPT-2 generation fixes (CatalinVoss, spolu, dhanajitb, 8enmann, SudoSharma, cynthia)
```
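To make the breaking change concrete, here is a minimal sketch of the new schedule-class pattern, assuming the `WarmupLinearSchedule` class and the `schedule` argument of `BertAdam` described in the linked readme section; treat the exact names and signatures as assumptions.

```python
# Minimal sketch of the 0.6.2 schedule-class API (class and argument names are
# assumptions based on the release notes and the readme section linked above).
from pytorch_pretrained_bert import BertForSequenceClassification
from pytorch_pretrained_bert.optimization import BertAdam, WarmupLinearSchedule

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

num_train_steps = 1000
# Pre-0.6.2: example scripts rescaled the learning rate manually with warmup_linear().
# 0.6.2+: hand a schedule object (or its registered name) to the optimizer instead.
schedule = WarmupLinearSchedule(warmup=0.1, t_total=num_train_steps)
optimizer = BertAdam(model.parameters(), lr=5e-5, schedule=schedule)
# optimizer.step() then applies the warmup/decay internally at each training step.
```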
### 0.6.1
```
Add `regex` to the requirements for the OpenAI GPT-2 tokenizer.
```
### 0.6.0
```
Add OpenAI's GPT-2 small pretrained model (a loading example follows below)
```
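A short, hedged sketch of loading the GPT-2 small checkpoint added in this release and greedily predicting one next token; the `(logits, past)` return unpacking follows the readme examples of this era and should be treated as an assumption.

```python
# Sketch: load the GPT-2 small checkpoint added in 0.6.0 and pick one next token.
import torch
from pytorch_pretrained_bert import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

input_ids = torch.tensor([tokenizer.encode("The release notes mention")])
with torch.no_grad():
    logits, past = model(input_ids)        # lm logits plus cached key/value states
next_token_id = int(torch.argmax(logits[0, -1]))
print(tokenizer.decode([next_token_id]))
```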
### 0.5.1
```
Mostly a bug fix update for loading the `TransfoXLModel` from s3:
* Fixes a bug in the loading of the pretrained `TransfoXLModel` from the s3 dump (which is a converted `TransfoXLLMHeadModel`) in which the weights were not loaded.
* Added a fallback of `OpenAIGPTTokenizer` on BERT's `BasicTokenizer` when SpaCy and ftfy are not installed. Using BERT's `BasicTokenizer` instead of SpaCy should be fine in most cases as long as the input is relatively clean (SpaCy+ftfy were included to exactly reproduce the paper's pre-processing steps on the Toronto Book Corpus). This fallback also lets us use the `never_split` option to avoid splitting special tokens like `[CLS]` and `[SEP]`, which is easier than adding the tokens back after tokenization (see the sketch below).
* Updated the README on the tokenizers options and methods which was lagging behind a bit.
```
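A minimal sketch of the fallback described above; the checkpoint name and example sentence are illustrative, and the point is simply that the same call works whether or not SpaCy/ftfy are installed.

```python
# Sketch of the 0.5.1 tokenizer fallback: if SpaCy/ftfy are missing,
# OpenAIGPTTokenizer falls back on BERT's BasicTokenizer internally,
# using never_split so special tokens are kept whole.
from pytorch_pretrained_bert import OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
tokens = tokenizer.tokenize("the book was lying on the table .")
ids = tokenizer.convert_tokens_to_ids(tokens)
```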
### 0.5.0
```
New pretrained models:
- **OpenAI GPT** pretrained on the *Toronto Book Corpus* ("Improving Language Understanding by Generative Pre-Training" by Alec Radford et al.). A usage sketch for both new models follows this block.
  - This is a slightly modified version of our previous PyTorch implementation to improve performance by splitting the word and position embeddings into separate embedding matrices.
  - Performance checked to be on par with the TF implementation on ROCStories: single-run evaluation accuracy of 86.4% vs. the authors reporting a median accuracy of 85.8% with the TensorFlow code (see details in the example section of the readme).
- **Transformer-XL** pretrained on *WikiText 103* ("Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai, Zhilin Yang et al.). This is a slightly modified version of Google/CMU's PyTorch implementation to match the performance of the TensorFlow version by:
  - untying relative positioning embeddings across layers,
  - changing the memory cells initialization to keep sinusoidal positions identical,
  - adding full logits outputs in the adaptive softmax so it can be used in a generative setting.
  - Performance checked to be on par with the TF implementation on WikiText 103: evaluation perplexity of 18.213 vs. the authors reporting a perplexity of 18.3 on this dataset with the TensorFlow code (see details in the example section of the readme).
New scripts:
- Updated the SQuAD fine-tuning script to also work on SQuAD V2.0 (by abeljim and Liangtaiwan)
- `run_lm_finetuning.py` lets you pretrain a `BERT` language model or fine-tune it with masked-language-modeling and next-sentence-prediction losses (by deepset-ai, tholor and nhatchan; Python 3.5 compatibility)
Backward compatibility:
- The library is now also compatible with Python 2
Improvements and bug fixes:
- add a `never_split` option and arguments to the tokenizers (WrRan)
- better handle errors when BERT is fed inputs that are too long (patrick-s-h-lewis)
- better layer normalization initialization, and bug fix in the example scripts where args.do_lower_case was always True (donglixp)
- fix learning rate schedule issue in example scripts (matej-svejda)
- readme fixes (danyaljj, nhatchan, davidefiocco, girishponkiya)
- importing unofficial TF models in BERT (nhatchan)
- only keep the active part of the loss for token classification (Iwontbecreative)
- fix argparse type error in example scripts (ksurya)
- docstring fixes (rodgzilla, wlhgtc)
- improving `run_classifier.py` loading of saved models (SinghJasdeep)
- In example scripts: allow do_eval to be used without do_train and to use the pretrained model in the output folder (jaderabbit, likejazz and JoeDumoulin)
- in `run_squad.py`: fix error when `bert_model` param is path or url (likejazz)
- add license to source distribution and use entry-points instead of scripts (sodre)
```
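A hedged usage sketch for the two new 0.5.0 models; the checkpoint names (`openai-gpt`, `transfo-xl-wt103`) and the `(predictions, mems)` return of `TransfoXLLMHeadModel` follow the readme examples of this era and should be treated as assumptions.

```python
# Sketch of loading and running the two models added in 0.5.0.
import torch
from pytorch_pretrained_bert import (OpenAIGPTTokenizer, OpenAIGPTLMHeadModel,
                                     TransfoXLTokenizer, TransfoXLLMHeadModel)

# OpenAI GPT (token and position embeddings now live in separate matrices)
gpt_tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
gpt_model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt').eval()

# Transformer-XL: the memories returned by one segment are fed to the next one
txl_tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
txl_model = TransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103').eval()

tokens = txl_tokenizer.tokenize("the city of new york")
input_ids = torch.tensor([txl_tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    predictions, mems = txl_model(input_ids)             # full logits, usable generatively
    predictions, mems = txl_model(input_ids, mems=mems)  # reuse memory across segments
```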
### 0.4.0
```
New:
- 3-4 times speed-ups in fp16 (versus fp32) thanks to NVIDIA's work on apex (by FDecaYed)
- new sequence-level multiple-choice classification model + example fine-tuning on SWAG (by rodgzilla)
- improved backward compatibility with Python 3.5 (by hzhwcmhf)
- bump up to PyTorch 1.0
- load fine-tuned model with `from_pretrained`
- add examples on how to save and load fine-tuned models (a minimal example follows below).
```
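The last two items describe re-loading a fine-tuned model through `from_pretrained`. Below is a minimal sketch of the pattern used in the example scripts of this era; the `state_dict` keyword and the file name are assumptions for illustration.

```python
# Sketch: save fine-tuned BERT weights and re-load them with from_pretrained (0.4.0).
import torch
from pytorch_pretrained_bert import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# ... fine-tune the model here ...
torch.save(model.state_dict(), 'finetuned_bert.bin')   # hypothetical file name

# Later: rebuild the architecture and load the fine-tuned weights back in.
state_dict = torch.load('finetuned_bert.bin')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', state_dict=state_dict, num_labels=2)
```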
### 0.3.0
```
This release comprises the following improvements and updates:
- added two new pre-trained models from Google: `bert-large-cased` and `bert-base-multilingual-cased`,
- added a model that can be fine-tuned for token-level classification: `BertForTokenClassification` (a sketch follows below),
- added tests for every model class, with and without labels,
- fixed tokenizer loading function `BertTokenizer.from_pretrained()` when loading from a directory containing a pretrained model,
- fixed typos in model docstrings and completed the docstrings,
- improved examples (added `do_lower_case` argument).
```
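A small sketch of the new token-level head; `num_labels` and the labels/logits behaviour follow the docstring pattern of the other BERT heads, so treat the details as an approximation rather than the exact API.

```python
# Sketch: token-level classification with the head added in 0.3.0.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)
model = BertForTokenClassification.from_pretrained('bert-large-cased', num_labels=5)

tokens = ["[CLS]"] + tokenizer.tokenize("John lives in Berlin") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

logits = model(input_ids)                # (batch, seq_len, num_labels) without labels
labels = torch.zeros_like(input_ids)     # dummy labels; passing them returns a loss instead
loss = model(input_ids, labels=labels)
```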
### 0.2.0
```
Improvement:
- Added a `cache_dir` option to the `from_pretrained()` function to select a specific path to download and cache the pre-trained model weights. Useful for distributed training (see readme) (fixes issue 44); a short example follows below.
Bug fixes in model training and tokenizer loading:
- Fixed error in CrossEntropyLoss reshaping (issue 55).
- Fixed unicode error in vocabulary loading (issue 52).
Bug fixes in examples:
- Fix weight decay in examples (previously bias and layer norm weights were also decayed due to an erroneous check in training loop).
- Fix the "fp16 grad norm is None" error in examples (issue 43).
Updated readme and docstrings
```
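A minimal sketch of the `cache_dir` option added here; the cache path below is hypothetical (e.g. one cache per worker in a distributed job, as the readme suggests).

```python
# Sketch of the cache_dir option added in 0.2.0: download and cache the pretrained
# weights under an explicit path instead of the default cache location.
from pytorch_pretrained_bert import BertModel

cache_dir = '/tmp/bert_cache_rank0'   # hypothetical per-worker cache path
model = BertModel.from_pretrained('bert-base-uncased', cache_dir=cache_dir)
```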
### 0.1.2
```
This is the first release of `pytorch_pretrained_bert`.
```
Links
- PyPI: https://pypi.org/project/pytorch-pretrained-bert
- Changelog: https://pyup.io/changelogs/pytorch-pretrained-bert/
- Repo: https://github.com/huggingface/pytorch-pretrained-BERT