gradient-activation Search Results

1000+ results for gradient-activation

Best match

cpury/keras_gradient_noise #2

Compatibility with recent Keras/Tensorflow (tf 2.0)

It does not work with recent versions of Keras/TensorFlow; apparently one has to add tensorflow before every keras.xxx, such as tensorflow.keras.optimizers!

fqassemi updated 4 years ago
9
microsoft/microxcaling #30

How to set gradient and activation to different formats???

I still cannot understand which option (w_elem_format_bp, a_elem_format_bp, a_elem_format_bp_ex, a_elem_format_bp_os) represents the gradient. In fact, in the BP process, I wish to set the gradient as…

rensushan updated 1 month ago
1
saprmarks/feature-circuits #10

Potential bugs and confusion with `attribution.jvp`

I have been working through the paper trying to understand things and examining the code for computing edge weights and I believe I have discovered some unexpected behavior, as well as some other conf…

JacksonKaunismaa updated 2 months ago
1
huggingface/trl #2022

Negative Entropy in TRL PPOv2Trainer TLDR Example

### System Info - `transformers` version: 4.44.0 - Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31 - Python version: 3.11.9 - Huggingface_hub version: 0.23.4 - Safetensors version: 0.4.…

RylanSchaeffer updated 1 month ago
3
AnswerDotAI/fsdp_qlora #43

Question about adding / training Mixtral

I followed your 'adding a new model' guide to add Mixtral. It appears transformers mixtral does not have a MixtralMLP as suggested by the guide. The other items can be imported OK. As a workaround …

chrismrutherford updated 6 months ago
1
tiny-dnn/tiny-dnn #410

cross entropy gives nan

Dear Developers, first of all thanks for sharing with the community your amazing work. I have recently started to use this library in one of my projects and I have noted some numerical instabilit…

bellonemauro updated 5 years ago
7
huggingface/transformers #11104

Model config is logged twice on startup

Currently, the model config is logged twice during startup: 1. via `AutoConfig.from_pretrained` 2. via `AutoTokenizer.from_pretrained` -> `AutoConfig.from_pretrained` Should there be a state va…

stas00 updated 3 years ago
1
soumith/ganhacks #75

Activation functions for training and testing phases

Do we also have to scale the labels to [-1, 1] and calculate the loss while using tanh activation function in the training phase? If my task is to generate images (labels in [0, 800]), how can I ge…

SITUSITU updated 1 year ago
1
microsoft/DeepSpeed #2930

[BUG] DeepSpeed Cuda OOM on SwinUNETR from MONAI

**Describe the bug** I'm trying to run training of SwinUNETR model on a multi-GPU node (4xV00 - 16GB VRAM) with effective batch size per GPU of 1 and sample size 96x96x96. However, even after many tw…

majercakdavid updated 1 year ago
8
pytorch/opacus #644

Making a custom transformer architecture work with opacus

I am trying to make an architecture work with opacus. It consists of two encoders that use self-attention and produce context embeddings x_t and y_t. "Knowledge Retriever" is using masked attention.…

nhianK updated 5 months ago
3
