Issue #, if available: #1

Description of changes:

In TF.Keras NLP notebooks:

- Migrate from GloVe 6B to the pre-trained cc.en.300.vec.gz embeddings from FastText
- Tweak the default instance type to ml.g4dn.xlarge
- Add mitigations to help prevent file corruption/overwrites when running local & SageMaker notebooks in parallel
- Update the default SageMaker training container from TF v1.14 to v1.15, in line with the currently configured SMStudio kernel

FastText offers multi-lingual pre-trained embeddings (vs. English-only for GloVe 6B) and a marginally faster download, although in the current implementation this is offset by increased downstream processing time.

PyTorch alternatives are not yet updated, pending further testing.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
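For reviewers unfamiliar with the FastText `.vec` format: it is a plain-text word2vec-style file (a header line `<n_words> <dim>`, then one word plus its vector per line), so the migration mostly changes how the embedding matrix is built. A minimal sketch of that parsing step is below; the function name, the padding-index-0 convention, and the vocabulary shape are illustrative assumptions, not the exact code in the notebooks.

```python
import gzip
import numpy as np

def load_vec_embeddings(path, vocab, dim=300):
    """Parse a FastText .vec or .vec.gz file into an embedding matrix.

    `vocab` maps word -> row index; row 0 is reserved for padding
    (a common Keras Tokenizer convention -- an assumption here).
    Rows for words missing from the file stay zero.
    """
    opener = gzip.open if path.endswith(".gz") else open
    matrix = np.zeros((len(vocab) + 1, dim), dtype=np.float32)
    with opener(path, "rt", encoding="utf-8", errors="ignore") as f:
        f.readline()  # header line: "<n_words> <dim>"
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype=np.float32)
    return matrix
```

The resulting matrix can seed a Keras `Embedding` layer via its weight initializer, same as with the old GloVe flow.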
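On the parallel-run mitigations: one standard way to keep a concurrent local and SageMaker run from corrupting a shared artifact is to write to a temporary file and atomically rename it into place. This is a generic sketch of that pattern, not necessarily the specific mitigation implemented in this PR; the helper name and JSON payload are illustrative.

```python
import json
import os
import tempfile

def atomic_write_json(path, payload):
    """Write JSON via temp file + atomic rename, so a concurrent
    reader never observes a half-written file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the destination directory so the final
    # os.replace stays on one filesystem (required for atomicity).
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.remove(tmp)
        raise
```

Giving each environment its own output prefix (e.g. per-run subdirectories) is a complementary guard against overwrites.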