-
```
TORCHDYNAMO_REPRO_AFTER=dynamo ./benchmarks/dynamo/torchbench.py --training --performance --no-skip --inductor --only resnet50_quantized_qat
...
torch._dynamo.exc.BackendCompilerFailed: backend…
```
-
I get an error that did not previously occur on my Intel Mac using TensorFlow 2.2.0 with the following loss function:
```
def get_grad_and_loss(self, x, y):
    with tf.GradientTape(persistent=True) …
```
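For context, a minimal sketch of how a persistent `tf.GradientTape` is typically used to return both a loss and its gradients; the model, loss formula, and variable names here are placeholders, not the original code:

```
import tensorflow as tf

def get_grad_and_loss(model, x, y):
    # persistent=True allows tape.gradient() to be called more than once
    with tf.GradientTape(persistent=True) as tape:
        y_pred = model(x, training=True)
        loss = tf.reduce_mean(tf.keras.losses.mse(y, y_pred))
    grads = tape.gradient(loss, model.trainable_variables)
    del tape  # release resources held by the persistent tape
    return grads, loss
```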
-
@digantamisra98 Hi,
So `hard_mish = min(2, max(0, x+2))*x/2`
https://github.com/digantamisra98/H-Mish
http://fooplot.com/#W3sidHlwZSI6MCwiZXEiOiJtaW4oMixtYXgoMCx4KzIpKSp4LzIiLCJjb2xvciI6IiMwMDAwM…
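A minimal sketch of that formula as a PyTorch function (my own transcription of the expression above, not the reference implementation from the linked repo):

```
import torch

def hard_mish(x: torch.Tensor) -> torch.Tensor:
    # hard_mish(x) = x/2 * min(2, max(0, x + 2))
    return 0.5 * x * torch.clamp(x + 2, min=0, max=2)
```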
-
```
nn.Hardtanh(inplace=True),
BinarizeConv2d(int(192*self.ratioInfl), int(384*self.ratioInfl), kernel_size=3, padding=1),
```
This is sample code from alexnet binary.py; what I don't understa…
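For readers unfamiliar with the quoted layers: `BinarizeConv2d` is the repository's own module, but a typical binarized convolution sign-binarizes its weights with a straight-through estimator before running a normal convolution. A rough sketch of that idea, not the repository's actual implementation:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class Binarize(torch.autograd.Function):
    """Sign-binarize in the forward pass; pass gradients straight through."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator

class SimpleBinarizeConv2d(nn.Conv2d):
    def forward(self, x):
        binary_weight = Binarize.apply(self.weight)
        return F.conv2d(x, binary_weight, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```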
-
### Describe the bug
I'm using train_dreambooth_flux.py to fine-tune Flux. I get OOM on 4x A100 80GB with DeepSpeed stage 2, gradient checkpointing, bf16 mixed precision, 1024px × 1024px input, adafac…
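For reference, a ZeRO stage 2 + bf16 setup of the kind described is usually expressed as a DeepSpeed config along these lines; the values below are illustrative placeholders, not the reporter's actual configuration:

```
# Illustrative DeepSpeed config dict (ZeRO stage 2 + bf16); it can be passed
# to deepspeed directly or referenced from an accelerate config.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```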
-
### Issue Description
Hi,
I'm trying to implement a Deep Explainer for a ResNet50 imported from Torchvision and run on CIFAR-100. The basic implementation is not working because of an in-pla…
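For context, a `shap.DeepExplainer` setup with a torchvision ResNet50 typically looks like the sketch below; this is a minimal sketch with random stand-in data, not the reporter's code, and the in-place workaround is an assumption rather than an official fix:

```
import shap
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(num_classes=100).eval()

# DeepExplainer hooks intermediate outputs, and in-place ops such as
# ReLU(inplace=True) inside torchvision's ResNet are a common cause of
# failures; swapping them for out-of-place ReLUs is a frequently
# suggested workaround.
for module in model.modules():
    if isinstance(module, nn.ReLU):
        module.inplace = False

background = torch.randn(16, 3, 32, 32)   # stand-in for a CIFAR-100 background batch
test_images = torch.randn(4, 3, 32, 32)   # stand-in for images to explain

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(test_images)
```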
-
### 🐛 Describe the bug
When doing CPU offload for activations in FSDP, I expect that memory to be cleaned up after each backward pass, since the activations are no longer used. However, I'm seeing …
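This is not the FSDP code path from the report, but as a point of reference, plain PyTorch exposes a generic activation-to-CPU offload mechanism via `torch.autograd.graph.save_on_cpu`; a minimal sketch:

```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
x = torch.randn(8, 1024)

# Tensors saved for backward inside this context are moved to CPU and copied
# back to their original device when the backward pass needs them.
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    loss = model(x).sum()
loss.backward()
```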
-
# Batch_input and elapsed time per iteration slow down during model training
![微信图片编辑_20240629150957](https://github.com/EleutherAI/gpt-neox/assets/140717408/dae875c7-c01f-47e0-8767-aa8fe53cd476)
…
-
Activation Function
Neurologically, an activation function corresponds to the process of a neuron firing.
When a neuron sends a signal to the next neuron, it passes the signal along only if the incoming signal is above a certain threshold; if the signal does not reach the threshold, nothing is sent. In other words, it decides whether the signal gets passed on.
![image](https://user-images.githubusercontent.com/445…
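A minimal sketch of that idea as a step (threshold) activation in NumPy; the threshold value here is illustrative:

```
import numpy as np

def step_activation(x, threshold=0.0):
    # Fire (output 1) only when the input signal reaches the threshold;
    # otherwise send nothing (output 0).
    return np.where(x >= threshold, 1.0, 0.0)

print(step_activation(np.array([-1.5, 0.0, 2.3])))  # [0. 1. 1.]
```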
-
### Please ask your question
Code:
```
import numpy as np
import paddle

x = paddle.rand(shape=[2, 5], dtype=np.float32)
x.stop_gradient = False
u = paddle.exp(x)
du_dx = paddle.grad(u, x, create_graph=True)[0]
d2u_dx2 = paddl…
```
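For reference, second-order derivatives with `paddle.grad` are usually obtained by differentiating the first-order result again; a minimal sketch of that pattern, not the asker's full code:

```
import paddle

x = paddle.rand(shape=[2, 5], dtype='float32')
x.stop_gradient = False
u = paddle.exp(x)

# First derivative du/dx; create_graph=True keeps the graph for a second pass.
du_dx = paddle.grad(u, x, create_graph=True)[0]
# Second derivative d2u/dx2, obtained by differentiating du/dx w.r.t. x again.
d2u_dx2 = paddle.grad(du_dx, x, create_graph=True)[0]

print(d2u_dx2.shape)  # [2, 5]
```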