-
## Recipe for Deep Learning
![image](https://user-images.githubusercontent.com/29543183/49347010-ed53ec80-f6d4-11e8-9c55-c48481671813.png)
Do not always blame bad performance on overfitting. What the figure below shows is not…
-
I am using tfp *-Flipout layers to construct a Bayesian neural network (BNN) and training it with keras.fit. I define the BNN structure in much the same way as a CNN, but the keras.fit() …
-
Through comparative experiments, we found that what really reduces GPU memory is `torch.set_default_dtype(torch.float16)` and DeepSpeed. We ran the experiments with LLaMA-7B, using
{
"zero_o…
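
As a rough illustration of why the default dtype matters for memory, the sketch below estimates the weight-only footprint of a model at two precisions. The parameter count of 7 billion is an assumed round figure for a 7B model, the helper name `weight_memory_gib` is hypothetical, and the estimate ignores activations, gradients, and optimizer state.

```python
# Bytes per element for two common floating-point dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


# Assumption: roughly 7e9 parameters, a round figure for a 7B model.
NUM_PARAMS = 7_000_000_000

for dtype in BYTES_PER_ELEMENT:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Halving the bytes per element halves the weight footprint, which is consistent with the observation that switching the default dtype to float16 has a large effect.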
-
Thought it would be neat to have gyro controls in Quake 3 :0)
Draft implementation on branch below ready to play with, but needs additional work.
Opening this issue for discussion around:
- […
-
Some transforms, notably the FSDP and TensorParallel ones, change shapes but currently do not update them completely (they do for the linear that follows, but not for the activation, etc.).
We might con…
-
Hello,
I have a question regarding the loss function in [dc_layer.c](https://github.com/Mazin-Hnewa/MS-DAYOLO/blob/main/src/dc_layer.c). Why do you use `l.delta[i]=(l.d_truth[i]-l.output[i])/size;` i…
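
For context on the expression in question, the sketch below checks numerically that `(truth - output) / size` equals the negative gradient of a halved mean squared error, `sum((truth - output)^2) / (2 * size)`. Whether this is the loss the layer is meant to implement is an assumption. The function names and sample values are hypothetical and not taken from `dc_layer.c`, and `size` is assumed here to be the number of outputs.

```python
def half_mse(output, truth):
    """Halved mean squared error: sum((t - o)^2) / (2 * size)."""
    size = len(output)
    return sum((t - o) ** 2 for o, t in zip(output, truth)) / (2 * size)


def delta(output, truth):
    """Per-element (truth - output) / size, mirroring the expression in the question."""
    size = len(output)
    return [(t - o) / size for o, t in zip(output, truth)]


def numeric_grad(output, truth, eps=1e-6):
    """Central-difference gradient of half_mse with respect to each output."""
    grads = []
    for i in range(len(output)):
        up = list(output)
        up[i] += eps
        down = list(output)
        down[i] -= eps
        grads.append((half_mse(up, truth) - half_mse(down, truth)) / (2 * eps))
    return grads


# Hypothetical sample values for illustration only.
output = [0.2, 0.7, 0.9]
truth = [0.0, 1.0, 1.0]

d = delta(output, truth)
g = numeric_grad(output, truth)
for di, gi in zip(d, g):
    print(f"delta={di:+.6f}  -grad={-gi:+.6f}")
```

Under these assumptions the two columns agree, so the expression would be a descent direction for that loss rather than its gradient.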
-
@NEGU93 Thanks for this great piece of work!
I am having an issue with angle-based loss functions. See here a small example:
https://colab.research.google.com/drive/10y2eBxHMq5HCbHqOsrfKzvrJ7AV_Re…
-
Hello, when I change the activation function inside the formula.cpp file (for example, when I write the formula for the tanh function), it consistently returns an accuracy value of 9.80%. Do you know …
-
Using `flash_attn_varlen_qkvpacked_func` together with `CheckpointImpl.NO_REENTRANT` raises the runtime error below:
```python
Traceback (most recent call last):
> File "/opt/tiger/antelope/train.py", line …
-
I'm using the pre-trained weights and set self.ntrain=1 to train only the last layer to detect humans. I also modified the number of classes and filters of the last two layers in the cfg file. I st…