-
Since Ada GPUs such as the 4090 restrict FP8 arithmetic to `fp32` accumulation, FP8 only achieves the same peak `TFLOPs` as `fp16xfp16` with `fp16` accumulation.
Furthermore, according to my test,…
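
To compare the two paths on equal footing, a measured matmul time can be converted into achieved TFLOPs. The helper below is a minimal sketch of that arithmetic only: `gemm_tflops` is a hypothetical name, the timing in the usage line is a placeholder rather than a measurement, and it assumes the usual count of 2·M·N·K floating-point operations per GEMM.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPs for one (m x k) @ (k x n) matmul that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Each of the m*n outputs needs k multiplies and k adds.
    flops = 2 * m * n * k
    return flops / seconds / 1e12


# Placeholder timing, used only to show the call; substitute a real measurement.
print(f"{gemm_tflops(1000, 1000, 1000, 0.002):.2f} TFLOPs")
```

Feeding the same shapes and the measured time of each kernel into this helper makes it easier to see whether the FP8 path actually exceeds the `fp16` one.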
-
`ma_policy/graph_construct.py` states that the file `mas/ppo/base-architectures.jsonnet` contains example architectures, but as far as I can tell that file does not exist in the repository.
-
### 🐛 Describe the bug
I created a minified repro to examine the cause of the runtime error (the compiler itself reports no error).
The card used to generate the repro is cuda:7.
Th…
-
UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device("cpu"),
Models: ['llavamed']
-
### Describe the bug
I am using manual streaming mode in Colab, and it shows the following error:
```
CalledProcessError Traceback (most recent call last)
[/usr/local/lib/python3.10/…
```
-
I attempted to run [this model](https://huggingface.co/PygmalionAI/pygmalion-6b) on an inf2.24xlarge instance. The model is based on the GPT-J architecture, but when I run it on Neuron, the res…
-
https://arxiv.org/abs/2106.09685
-
I used LLMCompressor to quantize our model with the FP8-dynamic recipe. The quantized model was successfully tested using `SparseAutoModelForCausalLM`.
![image](https://github.com/use…
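
For context, "dynamic" quantization is generally understood to mean that activation scales are computed at runtime rather than calibrated ahead of time. The sketch below illustrates only that scale computation, assuming one scale per token (row) and the FP8 E4M3 maximum of 448. The function names are hypothetical, this is not LLMCompressor's code, and it does not round values to actual FP8.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def dynamic_scale(row):
    """Runtime scale for one token: its max magnitude mapped onto the FP8 range."""
    amax = max(abs(v) for v in row)
    return max(amax, 1e-12) / FP8_E4M3_MAX  # guard against all-zero rows


def scale_rows(rows):
    """Scale each row into [-448, 448]; returns (scaled_rows, per_row_scales)."""
    scaled, scales = [], []
    for row in rows:
        s = dynamic_scale(row)
        scales.append(s)
        # Clip after scaling so rounding error cannot leave the FP8 range.
        scaled.append([min(max(v / s, -FP8_E4M3_MAX), FP8_E4M3_MAX) for v in row])
    return scaled, scales


scaled, scales = scale_rows([[1.0, -2.0], [0.5, 0.25]])
```

Multiplying each scaled row by its scale recovers the original values up to floating-point error; a real FP8 kernel would add rounding loss on top.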
-
### Question
Hi Haotian,
Your work is great, well done.
I have an issue: after using my pruned Vicuna LLM as the base model, phase 1 (pretraining) succeeded.
![8423f0bfebba…
-
Our baselines use a PPO implementation adapted from PureJaxRL, but it doesn't appear to follow all of the relevant implementation details from [Huang et al., 2022](https://iclr-blog-track.github.…