-
Give credit when you revamp code, dude.
-
Memory Sharing normalization avoids saving the input activation for the backward pass, relying instead on the next layer to save the normalization output. This makes sense if the next layer is linear (as in LLaMA), bu…
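To make the idea concrete, here is a minimal PyTorch sketch of an RMSNorm whose backward pass uses only the saved output (plus the per-row inverse RMS), never the input activation; the class and variable names are hypothetical, the weight is assumed nonzero so the normalized input can be recovered from the output, and this is a sketch of the technique, not any particular library's implementation:
```
import torch

class OutputSavedRMSNorm(torch.autograd.Function):
    """RMSNorm that saves its output (plus the per-row inverse RMS) for
    backward, instead of the input activation. Hypothetical sketch of the
    memory-sharing idea, assuming a nonzero elementwise weight."""

    @staticmethod
    def forward(ctx, x, weight, eps):
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
        y = x * inv_rms * weight
        # Only output-side tensors are saved. In the setting described above,
        # the following linear layer already stores y as its saved input, so
        # the norm itself adds almost no extra activation memory.
        ctx.save_for_backward(y, weight, inv_rms)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        y, weight, inv_rms = ctx.saved_tensors
        x_hat = y / weight      # normalized input, recovered from the output
        g = grad_y * weight     # gradient w.r.t. x_hat
        # Standard RMSNorm input gradient, expressed via x_hat only:
        # dL/dx = inv_rms * (g - x_hat * mean(g * x_hat))
        grad_x = inv_rms * (g - x_hat * (g * x_hat).mean(dim=-1, keepdim=True))
        grad_w = (grad_y * x_hat).reshape(-1, x_hat.shape[-1]).sum(dim=0)
        return grad_x, grad_w, None

# Check against autograd's numerics in double precision.
x = torch.randn(4, 8, dtype=torch.double, requires_grad=True)
w = torch.rand(8, dtype=torch.double, requires_grad=True) + 0.5  # keep weights nonzero
torch.autograd.gradcheck(lambda a, b: OutputSavedRMSNorm.apply(a, b, 1e-6), (x, w))
```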
-
Hi,
In the code example provided for SimCSE, should we set the **dropout**, or has it been set implicitly:
```
model_name = 'distilroberta-base'
word_embedding_model = models.Transformer…
```
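For context, the dropout rates normally come from the pretrained model's config. A hedged sketch of overriding them explicitly via `model_args` (which `models.Transformer` forwards to the Hugging Face config) might look like this; the 0.1 values are illustrative assumptions, not values prescribed by the SimCSE example:
```
from sentence_transformers import models

model_name = 'distilroberta-base'
# If model_args is omitted, the dropout values come from the pretrained
# config (typically 0.1), so overriding them is optional.
word_embedding_model = models.Transformer(
    model_name,
    model_args={
        'hidden_dropout_prob': 0.1,            # dropout on hidden states
        'attention_probs_dropout_prob': 0.1,   # dropout on attention weights
    },
)
```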
-
Should be pretty straightforward; a warmup for #10
* [x] expand the cudnn bindings in `rcudnn`
* [x] use the `rcudnn` bindings in `coaster-nn`
* [x] create an appropriate interface in `coaster`
* [x]…
-
Looking at the source of the dropout layer, it looks like it just passes its input through in both the forward and backward directions, and its seed and rate are unused.
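For reference, a minimal NumPy sketch of what a working dropout layer would do with its rate and seed (inverted dropout, the usual formulation; the function names here are hypothetical, not this library's API):
```
import numpy as np

def dropout_forward(x, rate, rng, training=True):
    # Inverted dropout: zero each unit with probability `rate` and rescale
    # the survivors by 1/(1-rate) so the expected activation is unchanged.
    if not training or rate == 0.0:
        return x, None
    mask = (rng.random(x.shape) >= rate) / (1.0 - rate)
    return x * mask, mask

def dropout_backward(grad_out, mask):
    # Backward applies the same mask: gradients flow only through survivors.
    return grad_out if mask is None else grad_out * mask

rng = np.random.default_rng(seed=0)   # this is where the layer's seed belongs
y, mask = dropout_forward(np.ones((2, 4)), rate=0.5, rng=rng)
```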
-
Hi, @Haiyang-W! You have done very interesting work. However, I encountered a problem when calculating the FLOPs of the GiT model. When I run `python tools/analysis_tools/get_flops.py`, it outputs 0 F…
-
Hello.
Here is my pre-trained model summary. I get errors when using DeepExplainer. I am using 503 test backgrounds to compute SHAP values.
Can anyone give a hand? Thanks.
Error: Layer "sequen…
-
Thinking about today's lesson and the explanation of the dropout technique:
There are some pros, such as keeping overfitting under control through shared nodes, and some cons, such as slower convergence.
I thought…
-
Hello everyone! I really like this library, and the Coding Train videos really opened my mind to all the possibilities of ANNs.
I was reading about ways to avoid overfitting, and Dropout comes up a …
-
totally separate report - [http://www.state.nj.us/education/data/drp/](http://www.state.nj.us/education/data/drp/)