-
```
TORCHDYNAMO_REPRO_AFTER=dynamo ./benchmarks/dynamo/torchbench.py --training --performance --no-skip --inductor --only resnet50_quantized_qat
...
torch._dynamo.exc.BackendCompilerFailed: backend…
```
-
I get an error that did not previously occur on my Intel Mac using TensorFlow 2.2.0 with the following loss function:
```
def get_grad_and_loss(self, x, y):
    with tf.GradientTape(persistent=True) …
```
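For context, a minimal sketch of how a persistent `tf.GradientTape` is typically used to return both a loss and its gradients; the model, loss formula, and variable names here are placeholders, not the original code:

```
import tensorflow as tf

def get_grad_and_loss(model, x, y):
    # persistent=True allows tape.gradient() to be called more than once
    with tf.GradientTape(persistent=True) as tape:
        y_pred = model(x, training=True)
        loss = tf.reduce_mean(tf.keras.losses.mse(y, y_pred))
    grads = tape.gradient(loss, model.trainable_variables)
    del tape  # release resources held by the persistent tape
    return grads, loss
```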
-
@digantamisra98 Hi,
So `hard_mish = min(2, max(0, x+2))*x/2`
https://github.com/digantamisra98/H-Mish
http://fooplot.com/#W3sidHlwZSI6MCwiZXEiOiJtaW4oMixtYXgoMCx4KzIpKSp4LzIiLCJjb2xvciI6IiMwMDAwM…
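A minimal sketch of that formula as a PyTorch function (my own transcription of the expression above, not the reference implementation from the linked repo):

```
import torch

def hard_mish(x: torch.Tensor) -> torch.Tensor:
    # hard_mish(x) = x/2 * min(2, max(0, x + 2))
    return 0.5 * x * torch.clamp(x + 2, min=0, max=2)
```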
-
```
nn.Hardtanh(inplace=True),
BinarizeConv2d(int(192*self.ratioInfl), int(384*self.ratioInfl), kernel_size=3, padding=1),
```
This is sample code from alexnet binary.py; what I don't understa…
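For readers unfamiliar with the quoted layers: `BinarizeConv2d` is the repository's own module, but a typical binarized convolution sign-binarizes its weights with a straight-through estimator before running a normal convolution. A rough sketch of that idea, not the repository's actual implementation:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class Binarize(torch.autograd.Function):
    """Sign-binarize in the forward pass; pass gradients straight through."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator

class SimpleBinarizeConv2d(nn.Conv2d):
    def forward(self, x):
        binary_weight = Binarize.apply(self.weight)
        return F.conv2d(x, binary_weight, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```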
-
### Describe the bug
I'm using train_dreambooth_flux.py to fine-tune Flux. I get OOM on 4x A100 80GB with DeepSpeed stage 2, gradient checkpointing, bf16 mixed precision, 1024px × 1024px input, adafac…
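For reference, a ZeRO stage 2 + bf16 setup of the kind described is usually expressed as a DeepSpeed config along these lines; the values below are illustrative placeholders, not the reporter's actual configuration:

```
# Illustrative DeepSpeed config dict (ZeRO stage 2 + bf16); it can be passed
# to deepspeed directly or referenced from an accelerate config.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```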
-
### Issue Description
Hi,
I'm trying to implement a Deep Explainer for a ResNet50 imported from Torchvision and run on CIFAR-100. The basic implementation is not working because of an in-pla…
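For context, a `shap.DeepExplainer` setup with a torchvision ResNet50 typically looks like the sketch below; this is a minimal sketch with random stand-in data, not the reporter's code, and the in-place workaround is an assumption rather than an official fix:

```
import shap
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(num_classes=100).eval()

# DeepExplainer hooks intermediate outputs, and in-place ops such as
# ReLU(inplace=True) inside torchvision's ResNet are a common cause of
# failures; swapping them for out-of-place ReLUs is a frequently
# suggested workaround.
for module in model.modules():
    if isinstance(module, nn.ReLU):
        module.inplace = False

background = torch.randn(16, 3, 32, 32)   # stand-in for a CIFAR-100 background batch
test_images = torch.randn(4, 3, 32, 32)   # stand-in for images to explain

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(test_images)
```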
-
### 🐛 Describe the bug
When doing CPU offload for activations in FSDP, I expect that memory to be cleaned up after each backward pass, since the activations are no longer used. However, I'm seeing …
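This is not the FSDP code path from the report, but as a point of reference, plain PyTorch exposes a generic activation-to-CPU offload mechanism via `torch.autograd.graph.save_on_cpu`; a minimal sketch:

```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
x = torch.randn(8, 1024)

# Tensors saved for backward inside this context are moved to CPU and copied
# back to their original device when the backward pass needs them.
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    loss = model(x).sum()
loss.backward()
```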
-
# Batch_input and elapsed time per iteration slow down during model training
![微信图片编辑_20240629150957](https://github.com/EleutherAI/gpt-neox/assets/140717408/dae875c7-c01f-47e0-8767-aa8fe53cd476)
…
-
Activation Function
Neurologically, an activation function corresponds to the process of a neuron firing.
When a neuron sends a signal to the next neuron, it passes the signal along only if the incoming signal is above a certain threshold; if the signal does not reach the threshold, nothing is sent. In other words, it decides whether the signal gets passed on.
![image](https://user-images.githubusercontent.com/445…
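A minimal sketch of that idea as a step (threshold) activation in NumPy; the threshold value here is illustrative:

```
import numpy as np

def step_activation(x, threshold=0.0):
    # Fire (output 1) only when the input signal reaches the threshold;
    # otherwise send nothing (output 0).
    return np.where(x >= threshold, 1.0, 0.0)

print(step_activation(np.array([-1.5, 0.0, 2.3])))  # [0. 1. 1.]
```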
-
### Please ask your question
Code:
```
import numpy as np
import paddle

x = paddle.rand(shape=[2, 5], dtype=np.float32)
x.stop_gradient = False
u = paddle.exp(x)
du_dx = paddle.grad(u, x, create_graph=True)[0]
d2u_dx2 = paddl…
```
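For reference, second-order derivatives with `paddle.grad` are usually obtained by differentiating the first-order result again; a minimal sketch of that pattern, not the asker's full code:

```
import paddle

x = paddle.rand(shape=[2, 5], dtype='float32')
x.stop_gradient = False
u = paddle.exp(x)

# First derivative du/dx; create_graph=True keeps the graph for a second pass.
du_dx = paddle.grad(u, x, create_graph=True)[0]
# Second derivative d2u/dx2, obtained by differentiating du/dx w.r.t. x again.
d2u_dx2 = paddle.grad(du_dx, x, create_graph=True)[0]

print(d2u_dx2.shape)  # [2, 5]
```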