gradient-activation Search Results

1000+ results
for gradient-activation


pytorch/torchtune #1779

How to use finetuned lora adapter in a huggingface-like pipe…

Hi, thanks for this amazing project. I was trying to finetune the LoRA model for Llama3.2 Vision, which works fine and saved an adapter_0.pt. Then I wanted to use this adapter checkpoint for inference i…

ryf1123 updated 1 week ago
1
tensorflow/tensorflow #51859

'Failed to apply delegate' error occurred when training with…

### 1. System information #### Converter - OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04.2 LTS - TensorFlow installation (pip package or built from source): pip package (p…

hyeonsu94 updated 6 days ago
2
boostcampaitech3/level2-semantic-segmentation-level2-cv-18 #10

Train loss becomes NaN

When training knet+upernet on data that was offline-augmented with mosaic, the loss came out as NaN and training crashed. From what I found, the gradient changes abruptly after passing through CNN+activation, and this is said to make the loss value spike. The fix is, for every activation function…

jsh0551 updated 2 years ago
1
maxpumperla/deep_learning_and_the_game_of_go #90

cross-entropy loss with negative reward/advantage resulting …

Hi again. I finally found some time to continue with your book. This time I ran into a problem in chapters 10 and 12, where you have the policy and the actor-critic agents (same problem for both). Aft…

nutpen85 updated 3 years ago
2
OptimalScale/LMFlow #726

[BUG] LISA: same loss regardless of lisa_activated_layers

**Describe the bug** I think there might be something wrong with the current LISA implementation. There is no difference in training loss, no matter how many layers are active. Not using LMFlow bu…

geronimi73 updated 6 months ago
17
itayhubara/BinaryNet.pytorch #9

Does the Binarize() function use STE?

Does the Binarize() function use STE? I haven't seen the STE algorithm in this whole project.

leejiajun updated 4 years ago
29
shap/shap #3593

BUG: Error when using DeepExplainer on LSTM Model

### Issue Description I'm training a deep neural network predicting if two records refer to the same entity using LSTM layers. I can't get SHAP to work on the LSTM model, but it does provide values o…

jgolliher updated 6 months ago
1
chr5tphr/zennit #192

Proper way of handling classifier heads in Transformers

Hi, I am attempting to implement Conservative LRP rules as part of #184 since I need it for a university project. I am running into certain issues with the classifier heads and was hoping you could…

kjczarne updated 1 year ago
2
google/neural-tangents #191

Erf function goes beyond [-1,1]

The NN with erf function output activation can occasionally output way beyond the boundary [-1,1]: ``` from jax import random from neural_tangents import stax import neural_tangents as nt impo…

bangxiangyong updated 10 months ago
2
sterrettJD/gpLM-reading-group #3

some curriculum suggestions

Hey John! Here's the curriculum that I've worked on in the past. It's a bit less focused on language models as a sole topic, and more on modern ML from a broad perspective. - Essential Concepts of …

zmaas updated 1 month ago
3
