-
It does not work with recent versions of Keras/TensorFlow. Apparently one has to add tensorflow before every keras.xxx import, such as tensorflow.keras.optimizers.
-
I still cannot understand which option (w_elem_format_bp, a_elem_format_bp, a_elem_format_bp_ex, a_elem_format_bp_os) represents the gradient.
In fact, in the BP process, I wish to set the gradient as…
-
I have been working through the paper and examining the code for computing edge weights, and I believe I have discovered some unexpected behavior, as well as some other conf…
-
### System Info
- `transformers` version: 4.44.0
- Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
- Python version: 3.11.9
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
I followed your 'adding a new model' guide to add Mixtral. It appears that the transformers Mixtral implementation does not have a MixtralMLP, as the guide suggests. The other items can be imported OK. As a workaround …
-
Dear Developers,
First of all, thanks for sharing your amazing work with the community.
I have recently started to use this library in one of my projects and I have noticed some numerical instabilit…
-
Currently, the model config is logged twice during startup:
1. via `AutoConfig.from_pretrained`
2. via `AutoTokenizer.from_pretrained` -> `AutoConfig.from_pretrained`
Should there be a state va…
-
Do we also have to scale the labels to [-1, 1] before calculating the loss when using a tanh activation function in the training phase?
If my task is to generate images (labels in [0, 800]), how can I ge…
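
A minimal sketch of one common approach, assuming the labels lie in the known range [0, 800] mentioned above: linearly map the labels into [-1, 1] so they match the tanh output range, compute the loss in that scaled space, and invert the mapping at inference time. The helper names `to_tanh_range` and `from_tanh_range` are hypothetical and not part of any library.

```python
# Hypothetical helpers; the range [0, 800] is taken from the question above.
LABEL_MIN, LABEL_MAX = 0.0, 800.0


def to_tanh_range(y, lo=LABEL_MIN, hi=LABEL_MAX):
    """Linearly map a label from [lo, hi] to [-1, 1]."""
    return 2.0 * (y - lo) / (hi - lo) - 1.0


def from_tanh_range(t, lo=LABEL_MIN, hi=LABEL_MAX):
    """Linearly map a tanh output in [-1, 1] back to [lo, hi]."""
    return (t + 1.0) / 2.0 * (hi - lo) + lo


# Training: compare the tanh output with to_tanh_range(label).
# Inference: recover the original scale with from_tanh_range(output).
scaled = to_tanh_range(400.0)       # midpoint of the label range
restored = from_tanh_range(scaled)  # back on the original scale
```

Whether this is the right choice depends on the model in question; a linear output layer with unscaled labels is another option.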
-
**Describe the bug**
I'm trying to train a SwinUNETR model on a multi-GPU node (4xV100, 16GB VRAM each) with an effective batch size of 1 per GPU and a sample size of 96x96x96. However, even after many tw…
-
I am trying to make an architecture work with opacus. It consists of two encoders that use self-attention and produce context embeddings x_t and y_t. The “Knowledge Retriever” uses masked attention.…