-
https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L42
Why do we need an additional linear transformation after the MHA and before the MLP when the dim…
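The projection the question points at (minGPT's `c_proj`) can be motivated with a minimal sketch, not minGPT's actual code: each attention head attends independently, so after concatenation the heads' outputs sit in disjoint slices of the embedding dimension; the final linear layer is what lets information from different heads mix before the residual add and the MLP.

```python
import torch
import torch.nn as nn

class MiniMHA(nn.Module):
    """Toy multi-head attention illustrating the role of the output projection."""

    def __init__(self, n_embd=32, n_head=4):
        super().__init__()
        self.n_head, self.hd = n_head, n_embd // n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)  # the projection in question

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim): heads are processed independently
        shape = (B, T, self.n_head, self.hd)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        att = (q @ k.transpose(-2, -1)) / self.hd ** 0.5
        y = att.softmax(dim=-1) @ v              # per-head outputs, still separate
        y = y.transpose(1, 2).reshape(B, T, C)   # concatenation alone mixes nothing
        return self.c_proj(y)                    # cross-head mixing happens here

x = torch.randn(2, 5, 32)
out = MiniMHA()(x)
```

Without `c_proj`, each slice of the residual stream would only ever see one head's output, even though the input and output dimensions happen to match.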
-
### System Info
```
root@fb9fa1e6d8d8:/mnt/nas2/users/sbchoi/transformers/examples/pytorch/object-detection# transformers-cli env
Copy-and-paste the text below in your GitHub issue and FILL OUT…
-
Apologies if this has been asked before but I couldn't find any example that demonstrates this in a simple manner.
I have a model built in Equinox. Now, I want to use the `AdamW` optimizer where:
…
-
### 🚀 The feature, motivation and pitch
Sharing a repro for @bdhirsh and @tugsbayasgalan on the gaps in torch.compile for FSDP2 fp8 all-gather.
For FSDP2 fp8 all-gather, it's critical to pre-compute ama…
-
### Describe the bug
If `set_output` is set to `"pandas"`, `TransformedTargetRegressor` warns unnecessarily.
### Steps/Code to Reproduce
```python
import numpy as np
import pandas as pd
from skl…
-
Hi, how can I cast a float/bfloat16 tensor to FP8? I want to do W8A8 (FP8) quantization, but I couldn't find an example of quantizing activations to the FP8 format.
-
Traceback (most recent call last):
File "train.py", line 135, in <module>
test_abs(args, device_id, cp, step)
File "E:\project\PreSumm\src\train_abstractive.py", line 215, in test_abs
model = …
-
Hi developers,
Thanks for developing this great tool for annotating single cells.
I wonder whether scGPT requires a GPU on CentOS 7.9. I don't have a GPU; is it possible to use a CPU to run this scG…
-
Here is what I am getting (see below):
FP8 is slower than FP16.
For FP16, multiples of 16 make things slower than multiples of 8.
Am I missing something?
Batch_size_multiple 16 // Seqlen multi…
-
Hi, TE is really great work.
How can I use FusedRMSNorm in TE?
https://github.com/NVIDIA/apex/blob/master/apex/normalization/fused_layer_norm.py#L329
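Recent Transformer Engine releases expose a fused RMSNorm as `transformer_engine.pytorch.RMSNorm(hidden_size, eps=...)`, usable as a drop-in `nn.Module` (module path hedged against your TE version; it also needs a CUDA build). Since TE cannot run on CPU, here is a plain-PyTorch reference of the computation that module fuses:

```python
import torch

class RMSNormRef(torch.nn.Module):
    """Unfused reference of RMSNorm: scale by root-mean-square, learned gain."""

    def __init__(self, hidden_size, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))

    def forward(self, x):
        # Normalize by the RMS over the last dim; unlike LayerNorm there is
        # no mean subtraction and no bias term.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

x = torch.randn(2, 4, 16)
y = RMSNormRef(16)(x)
```

Swapping this for the TE module (or the apex `FusedRMSNorm` linked above) should be behavior-preserving up to numerics; the fused kernels just do the reduction and scaling in one pass.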