-
### System Info
Transformers version 4.41.2
Platform: Ubuntu 22.04.4 LTS
Python: 3.10.14
### Who can help?
@younesbelkada @ArthurZucker
### Information
- [ ] The official example s…
-
Running a Raspberry Pi 4B with DietPi Debian Bookworm
This is complicated and may sound confusing; see the summary for clarification.
On a newly installed operating system, I installed speechre…
-
### System Info
GPT2
torch==2.3.1
DDP
using transformers Trainer 4.41.2
### Who can help?
@muellerzr @SunMarc (Trainer code)
@ArthurZucker and @younesbelkada (text models)
### …
-
Dropout really is the bane of equinox, it seems. This is a loose follow-up of #681 - effectively, I'm trying to fix a problem that cropped up a while ago when using `optax.MultiSteps` for gradient accumulatio…
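For context, a minimal NumPy sketch of the accumulation pattern that `optax.MultiSteps` implements (emit a real update only every k-th micro-batch, a no-op update in between). The function name `make_accumulator` is illustrative, not optax's API:

```python
import numpy as np

def make_accumulator(k):
    """Accumulate k micro-batch gradients, then emit their mean.

    Mirrors the behaviour of optax.MultiSteps at a high level; this is
    a hand-rolled sketch, not the library implementation."""
    state = {"count": 0, "acc": None}

    def step(grad):
        state["acc"] = grad if state["acc"] is None else state["acc"] + grad
        state["count"] += 1
        if state["count"] == k:
            update = state["acc"] / k
            state["acc"], state["count"] = None, 0
            return update  # real update on the k-th call
        return np.zeros_like(grad)  # no-op update between syncs

    return step
```

Any stochastic layer such as dropout still draws a fresh mask on every micro-batch, which is one reason the interaction with gradient accumulation gets subtle.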
-
This HDMI Switcher I am using is having some severe video dropouts when I attempt to use it with this:
https://www.amazon.com/gp/product/B07MJ783KG/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
…
-
Hi,
Great code and thanks for sharing :). How do you run the model with dropout? It doesn't seem to actually be implemented in the training process.
-Peter
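For reference, the usual pattern is that dropout is applied only while training and becomes the identity at inference. A hypothetical NumPy sketch of that convention (not this repository's code):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: active only when training=True; identity otherwise.

    Illustrative sketch of the standard technique, not the repo's implementation."""
    if not training or p == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng()
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    # Rescale survivors by 1/(1-p) so expected activations are unchanged.
    return x * mask / (1.0 - p)
```

If no such branch exists anywhere in the training loop, dropout is effectively never applied.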
-
Hi,
In the "dropout from scratch" chapter, there is no significant difference between adding and not adding dropout. See metrics below:
with drop_prob=0.5:
Epoch 0. Loss: 0.7281168998689048, T…
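One possible reason the loss barely moves: inverted dropout rescales the surviving activations by 1/(1-p), so the expected forward-pass statistics are unchanged; the effect shows up as regularization, not as an immediate loss shift. A quick NumPy check of that property (assumptions: standard inverted dropout as in the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
p = 0.5

mask = rng.random(x.shape) >= p
dropped = x * mask / (1 - p)  # inverted dropout, rescaled survivors

# Expected activation is preserved by the rescaling, so mean activations
# (and hence early-epoch losses) look very similar with or without dropout.
print("mean without dropout:", x.mean())
print("mean with dropout:   ", dropped.mean())
```

Whether dropout helps here should instead be judged by the train/test gap over many epochs, not by a single epoch's loss.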
-
The PyTorch base implementation of [`scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html#torch.nn.functional.scaled_dot_produ…
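For orientation, a minimal NumPy sketch of scaled dot-product attention with (inverted) attention dropout applied to the softmax weights. The function name and signature here are illustrative, not PyTorch's actual implementation:

```python
import numpy as np

def sdpa(q, k, v, dropout_p=0.0, training=True, rng=None):
    """Scaled dot-product attention sketch with optional attention dropout.

    Hand-rolled illustration; PyTorch's real kernel differs in detail."""
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    if training and dropout_p > 0.0:
        rng = rng if rng is not None else np.random.default_rng()
        keep = rng.random(weights.shape) >= dropout_p
        weights = weights * keep / (1.0 - dropout_p)  # rescale survivors
    return weights @ v
```

Note that dropout is applied to the attention weights after the softmax, so with `dropout_p > 0` the outputs are stochastic during training.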
-
The shared `EncoderLayer` used by a few models (although I've only looked at `TorchTHP`) has a `use_residual` flag that defaults to `False`, and I don't think it is set to True on installation of `Tor…
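To make the flag's effect concrete, a hypothetical sketch (the function and argument names below are illustrative, not the actual `EncoderLayer` code): with the residual enabled, the sublayer output is added to its input instead of replacing it.

```python
import numpy as np

def encoder_layer(x, sublayer, use_residual=False):
    """Sketch of a residual toggle: y = x + f(x) when use_residual,
    else y = f(x). Illustrative only, not the shared EncoderLayer."""
    y = sublayer(x)
    return x + y if use_residual else y
```

If the flag is never flipped to `True`, every model built on the shared layer silently runs without residual connections.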
-
As the title says, I haven't found any regularization method such as dropout in your current implementation and experiments. Is there a specific reason for not including one? Perhaps the learned activation …