-
### Anything you want to discuss about vllm.
Within vllm/attention/ops/triton_flash_attention.py, we don't need the dropout, philox_, etc. machinery.
We should consider cleaning it up for code simplicity.
#…
-
Hi,
Great code and thanks for sharing :). How do you run the model with dropout? It doesn't seem to actually be implemented in the training process.
-Peter
-
Hi,
In the "dropout from scratch" chapter, there is no significant difference between adding and not adding dropout. See the metrics below:
with drop_prob=0.5:
Epoch 0. Loss: 0.7281168998689048, T…
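For context, here is a minimal NumPy sketch of the inverted-dropout scheme that a "dropout from scratch" chapter typically implements (my sketch, not the book's code):

```python
import numpy as np

def dropout(x, drop_prob, training=True, rng=None):
    """Inverted dropout: zero each element with probability drop_prob and
    scale the survivors by 1/(1 - drop_prob) so the expected activation is
    unchanged; at inference time the input passes through untouched."""
    if not training or drop_prob == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(x.shape) >= drop_prob  # keep with probability 1 - drop_prob
    return x * mask / (1.0 - drop_prob)
```

Whether drop_prob=0.5 helps at all depends heavily on model capacity and dataset size, which could explain near-identical metrics with and without it.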
-
I tried training from scratch as explained in the README.
```
Training / Fine-tuning
pip install deepspeed==0.7.0
pip install pytorch-lightning==1.9.5
# torch 1.13.1+cu117
NOTE: add weight de…
-
I'm not sure Issues is the best place to post this, but I just wanted to see if anyone else had been trying this idea:
There was [a paper that came out recently](https://arxiv.org/abs/2410.05258…
-
I've revised settings.py as below, but when running marker or marker_single, it still runs in CPU mode.
---- revision ----
line 10:
class Settings(BaseSettings):
# General
TORCH_DEV…
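As an aside, pydantic-style BaseSettings classes usually resolve environment variables ahead of in-file defaults, so the device can often be switched without editing the file at all. A pure-Python stand-in for that resolution order (not marker's actual code):

```python
import os

# Hypothetical stand-in for a BaseSettings-style field: the environment
# variable wins over the in-file default, so exporting TORCH_DEVICE=cuda
# before launching may be enough -- no edit to settings.py needed.
class Settings:
    def __init__(self):
        self.TORCH_DEVICE = os.environ.get("TORCH_DEVICE", "cpu")

os.environ["TORCH_DEVICE"] = "cuda"
print(Settings().TORCH_DEVICE)  # the env var overrides the "cpu" default
```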
-
You define a dropout_rate hyperparameter but don't seem to use it anywhere. What's that about?
-
The paper mentions
> "Dropout [9] was applied after each IndRNN layer with a dropping probability of 0.25 and 0.1 for CS and CV settings, respectively."
and
> "Dropout [9] with a droppin…
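In code, "dropout after each IndRNN layer" just interleaves a dropout step with the stacked layers. A NumPy stand-in (the layer internals here are placeholders, not a real IndRNN):

```python
import numpy as np

def dropout(x, p, rng):
    # Inverted dropout: drop with probability p, rescale survivors.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def forward(x, rng, num_layers=2, p=0.25):
    """Apply a placeholder layer, then dropout, for each stacked layer --
    mirroring "dropout after each IndRNN layer" from the quote (p=0.25
    matching the paper's CS setting)."""
    for _ in range(num_layers):
        x = np.tanh(x)          # stand-in for an IndRNN layer
        x = dropout(x, p, rng)  # dropout applied after the layer
    return x
```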
-
It's accepted that sometimes our solutions will lose their data connection for a few seconds.
In most Grafana dashboards, we add either
` |> aggregateWindow(every: limited_window, fn: mean, createEmpty:…
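Conceptually, `aggregateWindow(every: w, fn: mean, createEmpty: true)` buckets points into fixed windows, averages each bucket, and emits an empty row for windows with no data (the connection-loss gaps). A Python stand-in for that behavior (not Flux itself; `limited_window` above is the dashboard's own variable):

```python
def aggregate_window(points, every):
    """points: sorted list of (timestamp, value).
    Returns one (window_start, mean-or-None) pair per window, including
    empty windows (None), like createEmpty: true."""
    if not points:
        return []
    buckets = {}
    for t, v in points:
        buckets.setdefault(t - t % every, []).append(v)
    out = []
    w = points[0][0] - points[0][0] % every
    while w <= points[-1][0]:
        vals = buckets.get(w)
        out.append((w, sum(vals) / len(vals) if vals else None))
        w += every
    return out
```

With `createEmpty: true` the gap shows up as an explicit null row, which dashboards can then render as a break in the line rather than silently interpolating across it.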
-
[Paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)
[PyTorch implementation](https://github.com/pytorch/pytorch/blob/master/torch/nn/_functions/dropout.py)
Assigned to @americast a…