-
What is the difference between your svd-logits and the aux_logits of the Inception model? For example, with Admix, using aux_logits raises the average black-box success rate by about 5% compared with not using it.
-
For some reason, `_ConvolutionVariational` uses a [boolean flag](https://github.com/tensorflow/probability/blob/master/tensorflow_probability/python/layers/conv_variational.py#L236) to avoid calling `…
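The flagged call in the original is truncated, so the following only sketches the general pattern being described: a boolean guard that makes a one-time setup step idempotent, so repeated calls do not redo it. All names here are illustrative, not from the TFP source.

```python
# Sketch of a boolean "already done" flag guarding a one-time setup step,
# the general pattern _ConvolutionVariational is described as using.
class Layer:
    def __init__(self):
        self._built = False
        self.build_count = 0    # only for demonstration

    def _build(self):
        self.build_count += 1   # expensive one-time setup would go here

    def __call__(self, x):
        if not self._built:     # the boolean flag skips repeated builds
            self._build()
            self._built = True
        return x

layer = Layer()
layer(1)
layer(2)                        # second call does not rebuild
```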
-
I found that in papers, the formula for MLP attention is usually described as below:
![image](https://user-images.githubusercontent.com/16586180/39976766-fd23c30e-5767-11e8-9a16-9d0238512c82.png)
…
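The screenshot is not reproduced here, but MLP (additive, Bahdanau-style) attention is usually written as score(s, h_i) = vᵀ tanh(W_s s + W_h h_i), followed by a softmax over source positions. A minimal numpy sketch under that assumption (all shapes and names are illustrative):

```python
import numpy as np

# MLP / additive attention sketch: score(s, h_i) = v^T tanh(W_s s + W_h h_i)
rng = np.random.default_rng(0)

d_h, d_s, d_a, T = 4, 4, 8, 5           # encoder dim, decoder dim, attention dim, source length
W_h = rng.standard_normal((d_a, d_h))   # projects encoder states
W_s = rng.standard_normal((d_a, d_s))   # projects the decoder query
v = rng.standard_normal(d_a)            # scoring vector

H = rng.standard_normal((T, d_h))       # encoder states h_1..h_T
s = rng.standard_normal(d_s)            # current decoder state

scores = np.tanh(H @ W_h.T + W_s @ s) @ v    # (T,) one score per source position
weights = np.exp(scores - scores.max())
weights /= weights.sum()                     # softmax over source positions
context = weights @ H                        # (d_h,) attention-weighted sum
```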
-
This is mostly just a checklist of the more important models we might want to support with fancy math. In theory, these are all supported automatically with broom (though we might want to have a gener…
-
I have run into a problem using the GLM function with the Binomial family. I used the code below to create an instance of the GLM:
logit_instance = sm.GLM(default_array, predictors_matrix, family=sm.fa…
-
When all inputs to entmax are -inf, it fails with
```
RuntimeError                              Traceback (most recent call last)
in
      1 from entmax import entmax15
      2 logits = torch…
```
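The same degeneracy can be seen without entmax at all: for any softmax-like mapping, a row of all -inf logits has a zero normalizer, so there is no valid distribution to return. A plain numpy illustration (entmax itself is not used here):

```python
import numpy as np

# Why all -inf logits are degenerate: after the usual max-shift,
# -inf - (-inf) is already NaN, and the normalizer is 0/NaN.
logits = np.full(4, -np.inf)
with np.errstate(invalid="ignore"):
    shifted = logits - logits.max()              # -inf - (-inf) -> nan
    probs = np.exp(shifted) / np.exp(shifted).sum()
# every entry of probs is NaN; there is no well-defined output distribution
```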
-
When I used your algorithm and parameters to train on both the WTH dataset and my own dataset, I found that the loss was very low in the first epoch, but increased sharply in the second epoch, and sub…
-
Mistral has a new finetuner repository where you can assign *weights* to specific messages, and those will be taken into account when the loss is calculated. I wanted to implement something similar fo…
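One plausible reading of "weights on specific messages" is that every token inherits the weight of the message containing it, and the cross-entropy becomes a weighted mean over tokens. A numpy sketch under that assumption (all names and shapes are illustrative, not Mistral's actual implementation):

```python
import numpy as np

# Per-message loss weighting sketch: each token's NLL is scaled by the
# weight of the message it belongs to; weight 0 masks a message entirely.
rng = np.random.default_rng(0)

V, T = 10, 6                               # vocab size, sequence length
logits = rng.standard_normal((T, V))
targets = rng.integers(0, V, size=T)
msg_ids = np.array([0, 0, 1, 1, 1, 2])     # which message each token is in
msg_weights = np.array([0.0, 1.0, 2.0])    # e.g. mask out message 0 entirely

log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
nll = -log_probs[np.arange(T), targets]    # per-token negative log-likelihood
w = msg_weights[msg_ids]                   # per-token weight from its message
loss = (w * nll).sum() / w.sum()           # weighted mean; w=0 tokens drop out
```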
-
After completing a batch inference, I discovered a bug in the attention weight computation. The attention mask was being added to the attention weights with an unsqueeze operation that was using the …
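The exact unsqueeze bug above is truncated, so the following only sketches the broadcast that has to hold: a padding mask of shape (B, T_k) must be expanded to (B, 1, 1, T_k) so it broadcasts against attention weights of shape (B, H, T_q, T_k); expanding on the wrong axis masks the wrong dimension or raises a shape error. All shapes here are illustrative.

```python
import numpy as np

# Additive attention-mask broadcast sketch: (B, T_k) -> (B, 1, 1, T_k)
B, H, T_q, T_k = 2, 4, 3, 5
weights = np.zeros((B, H, T_q, T_k))        # raw attention scores
pad = np.array([[0, 0, 0, 1, 1],
                [0, 0, 1, 1, 1]])           # 1 marks a padding position
mask = np.where(pad[:, None, None, :] == 1, -1e9, 0.0)   # (B, 1, 1, T_k)
masked = weights + mask                     # broadcasts to (B, H, T_q, T_k)
```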
-
In `categorical_crossentropy`, I suspect that this normalization line is not useful and that it leads to two unexpected behaviors:
https://github.com/keras-team/keras/blob/b80dd12da9c0bc3f569eca3455e77762cf2ee8e…
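Assuming the line in question is the usual "divide each row by its sum" step before taking -sum(target * log(output)), a small numpy sketch shows the kind of silent behavior change at stake: when predictions do not already sum to 1, the normalized and raw losses disagree.

```python
import numpy as np

# Sketch of categorical cross-entropy with and without row normalization.
def cce(target, output, normalize):
    if normalize:
        output = output / output.sum(axis=-1, keepdims=True)  # the suspect line
    output = np.clip(output, 1e-7, 1 - 1e-7)
    return -(target * np.log(output)).sum(axis=-1)

target = np.array([[0.0, 1.0, 0.0]])
preds = np.array([[0.2, 0.5, 0.1]])   # rows sum to 0.8, not 1

raw = cce(target, preds, normalize=False)    # -log(0.5)
norm = cce(target, preds, normalize=True)    # -log(0.5 / 0.8) = -log(0.625)
```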
https://github.com/keras-team/keras/blob/b80dd12da9c0bc3f569eca3455e77762cf2ee8e…