-
Given https://github.com/fairinternal/metaseq-internal/pull/181 , it seems like `arch` is not necessarily present in `args` in the training workflow when loading from disk. This seems like yet another ca…
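A minimal sketch of the kind of defensive lookup this implies, assuming nothing beyond the standard library; `DEFAULT_ARCH` and `resolve_arch` are hypothetical names, not part of metaseq:

```python
from argparse import Namespace

# Hypothetical default; checkpoints written by older code may not carry `arch`.
DEFAULT_ARCH = "transformer_lm"

def resolve_arch(args: Namespace) -> str:
    """Return args.arch if the saved args carry it, else fall back to a default."""
    return getattr(args, "arch", DEFAULT_ARCH)

# An old checkpoint's args without `arch` no longer crash the training workflow:
old_args = Namespace(lr=0.001)          # no `arch` attribute saved
new_args = Namespace(arch="opt_125m")

print(resolve_arch(old_args))
print(resolve_arch(new_args))
```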
-
## 🚀 Feature Request
Add documentation for converting fine-tuned OPT models to HuggingFace
### Motivation
HuggingFace [added OPT to their suite which is a great win for the community!](https://…
-
There are currently two logbooks checked into the OPT project directory (https://github.com/facebookresearch/metaseq/tree/main/projects/OPT/chronicles), which were placed there for easy access on releas…
-
Hi @kokitsuyuzaki,
Thanks for making this nice R package.
Recently, I have been trying to understand the different steps of the method used in the metaSeq package, as I want to use this tool for an ana…
-
This is to look into whether we still need apex for speedups, or whether out-of-the-box PyTorch 2.0 "just works". Will require benchmarking at a few different scales to confirm.
-
This is to look into whether we can remove our Megatron dependency and rely entirely on our Fairscale dependency (the model parallelism implementations appear to be identical between the two).
-
Hi,
Regarding the models listed at https://github.com/pytorch/fairseq/tree/main/examples/moe_lm : the model card and NOTE file included with the models say "Models are intended for research use onl…
-
Right now, if we were to launch a job with `--restore-file` **and** experience a job restart shortly thereafter (before a new checkpoint is written), we fail to resume again from the `--restore-file` and …
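A sketch of the resolution order the fix presumably needs, in plain Python; the function name, `checkpoint_last.pt` convention, and priority order are assumptions for illustration, not metaseq's actual implementation:

```python
import os
from typing import Optional

def resolve_restore_path(save_dir: str, restore_file: Optional[str]) -> Optional[str]:
    """Pick the checkpoint to resume from:
    1. the newest checkpoint this run already wrote (checkpoint_last.pt in save_dir),
    2. otherwise the user-supplied --restore-file, if it exists,
    3. otherwise start from scratch (None).
    """
    last = os.path.join(save_dir, "checkpoint_last.pt")
    if os.path.exists(last):
        return last
    if restore_file and os.path.exists(restore_file):
        return restore_file
    return None
```

With this ordering, a restart before any new checkpoint is written still falls back to `--restore-file` instead of failing to resume.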
-
Most likely because we loop over a bunch of tensors (gradient / activation norms, etc.) and move them to the CPU for logging.
Weirdly, this happens outside of the WPS and UPS counters, so we were not noticing…
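A minimal sketch of the pattern and its cheaper alternative, assuming PyTorch; the function names are hypothetical. Calling `.item()` per tensor forces one device-to-host sync each time, whereas stacking the norms on device and transferring once syncs only at the end:

```python
import torch

def per_tensor_norms_slow(tensors):
    # One device->host sync per tensor: each .item() blocks until the GPU catches up.
    return [t.norm().item() for t in tensors]

def per_tensor_norms_batched(tensors):
    # Compute every norm on device, then do a single transfer at the end.
    norms = torch.stack([t.norm() for t in tensors])
    return norms.cpu().tolist()
```

Both return the same values; the batched version just amortizes the synchronization cost across all logged tensors.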
-
## 🚀 Feature Request
Convert OPT checkpoints to Megatron-LM or FasterTransformer
### Motivation
I am currently trying to use OPT in a production environment.
However, because the 175B model is …