-
I'm curious, is there good evidence in the paper that TreeGen is better than regular transformers?
In other papers and in my own experiments, I've noticed that if I increase the dataset size, then the extra …
-
From [Algorithmic Simplicity](https://www.youtube.com/@algorithmicsimplicity):
- [x] [Why Does Diffusion Work Better than Auto-Regression? - YouTube](https://www.youtube.com/watch?v=zc5NTeJbk-k)
-…
-
Hi, I have read with a lot of interest your [report on DAS](https://github.com/wedeling/EasySurrogate/blob/master/tutorials/deep_active_subspaces/report_DAS.pdf). I can see that it ends with some impor…
-
# Vision Transformer Adapter for Dense Predictions
Info.
- ICLR 2023 spotlight
- https://github.com/czczup/ViT-Adapter
- https://arxiv.org/abs/2205.08534
### Summary
- plain ViT
- whi…
-
1. Why Model-Based?
- Model-based methods can be more data-efficient, although model-free methods might have better asymptotic performance
- Models allow easily injecting inductive biases
2. What about other ge…
-
* [Link](https://arxiv.org/pdf/2006.11287.pdf)
* Title: Discovering Symbolic Models from Deep Learning with Inductive Biases
* Keywords (optional):
* Authors (optional):
* Reason (optional)…
-
Hello, I think adding a keyword extractor with [KeyBERT](https://github.com/MaartenGr/KeyBERT) would be quite useful. The keywords extracted could be used for paraphrasing or summarizing with `logit_b…
-
Get an idea of the different flavours of scaling-law works that are out there. Any work that tries to estimate the optimal scale of model and dataset size, with regard to a certain metric (PPL, or ot…
-
### Model description
Plain-DETR is an object detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints.…
-
### Summary
While unpenalized GLMs like `PoissonRegressor(alpha=0, fit_intercept=False)` yield a minimum norm solution on wide data, `PoissonRegressor(alpha=0, fit_intercept=True)` does not.
Detec…
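
To make the minimum-norm claim for the no-intercept case concrete, here is a minimal pure-Python sketch. It is not scikit-learn's solver: it assumes plain gradient descent started from zero on the Poisson loss with a log link, and the data (`x`, `y`), the helper `fit_poisson_no_intercept`, and the step size are hypothetical choices for illustration. On this toy wide problem (one sample, two features), the iterates stay proportional to `x`, so the fit lands on the minimum-norm solution of `x.w = log(y)`.

```python
import math

# Hypothetical toy problem: one sample, two features (wide data, n < p).
x = [1.0, 2.0]
y = 3.0


def fit_poisson_no_intercept(x, y, lr=0.05, n_iter=500):
    """Gradient descent on the Poisson loss exp(x.w) - y * (x.w), starting at w = 0.

    Illustrative stand-in for an unpenalized fit without intercept;
    this is an assumption, not the solver scikit-learn uses.
    """
    w = [0.0] * len(x)
    for _ in range(n_iter):
        eta = sum(xi * wi for xi, wi in zip(x, w))  # linear predictor
        residual = math.exp(eta) - y                # d(loss)/d(eta) for the log link
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


w_gd = fit_poisson_no_intercept(x, y)

# Minimum-norm solution of x.w = log(y): w = x * log(y) / ||x||^2
sq_norm = sum(xi * xi for xi in x)
w_min_norm = [xi * math.log(y) / sq_norm for xi in x]

# Another coefficient vector that fits the sample exactly, but with a larger norm.
w_other = [math.log(y), 0.0]

print("gradient descent:", w_gd)
print("minimum norm:    ", w_min_norm)
print("norms:", norm(w_gd), norm(w_min_norm), norm(w_other))
```

Both `w_gd` and `w_other` reproduce `y` exactly, yet only the first has the smallest Euclidean norm. The sketch covers only the `fit_intercept=False` case; it does not model how an intercept is handled.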