-
## 🐛 Bug
Running model parallel with 2 GPUs on the FAIR cluster raises the following exception with the 1.3B_gptz model:
UPDATE: When we use model_parallel=2 and 8 GPUs, this works, but it should not …
-
-
**The bug**
In `deepspeed.module_inject.replace_module.py`, `replace_module()` is being called on meta tensors before the actual weights are loaded just a few lines below, resulting in `NotImplemente…
-
Hi @suchenzang and @stephenroller,
Thanks for your effort on this incredible project!
### 1. Motivation:
I would just like to share that, for **open-ended text generation**, OPT models can prod…
-
## 🚀 Feature Request
Loading models is a bit of a pain right now. It's done differently in multiple scripts (including our internal eval scripts). Not all ways are compatible with all checkpoint fo…
-
Currently, `eval_lm.py` requires data to be in the legacy format (`.bin` and `.idx` files plus a `dict.txt`). This is annoying because all of my data is in the JSONL format and pre-processing it into the legacy …
-
E.g. have the feature ids be `pybedtools.Interval` objects or something. Or some other kind of multiindex?
I'm thinking of this for methylation data and binding data.
@gpratt @mlovci any input?
-
Hi,
I tried reproducing the OPT results for various datasets using the [LM-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.
I observe that the OPT **Accuracy** score…
-
How is missing signal data in a bigwig file treated? Are these intervals treated as zeros or ignored? Is there an argument that controls this in the `array()` method?
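To make the distinction concrete, here is a small sketch of why the answer matters. It is plain Python, not the library's `array()` implementation; the signal values and both helper names are made up for illustration, with `NaN` standing in for positions that have no data in the file.

```python
import math

# Hypothetical per-base signal for one interval; NaN marks positions with no data.
signal = [2.0, math.nan, 4.0, math.nan]


def mean_missing_as_zero(values):
    """Count missing positions as 0.0, so they pull the mean down."""
    filled = [0.0 if math.isnan(v) else v for v in values]
    return sum(filled) / len(filled)


def mean_ignoring_missing(values):
    """Drop missing positions, so only observed positions contribute."""
    observed = [v for v in values if not math.isnan(v)]
    # With no observed positions at all there is nothing to average.
    return sum(observed) / len(observed) if observed else math.nan


as_zero = mean_missing_as_zero(signal)   # (2 + 0 + 4 + 0) / 4
ignored = mean_ignoring_missing(signal)  # (2 + 4) / 2
print(as_zero, ignored)
```

With half of the positions missing, the two treatments give means that differ by a factor of two, so which one `array()` uses changes downstream results.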
-
During the training of a 125M model I observe a relatively smooth valid ppl curve, with some minor jumps. For example, between steps 100K and 156K of the training, valid/redditflattened ppl shown on T…