-
* Goal: Run the [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) model on the TT Wormhole device.
* Changes: Add this directory `models/demos/wormhole/qwen2_7b`.
## Approach
We will leverage the ex…
-
Hello, thank you for your great work! The M2-BERT paper mentions that "Monarch Mixer is part of a new class of architectures called state-space models (SSMs), which include S4, Mamba, and BiGS".
Is Monar…
-
Our current architecture in sample-factory is just an MLP encoder; I suspect a permutation-invariant or GNN-based architecture would be better.
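
To make the permutation-invariant option concrete, here is a minimal Deep Sets-style sketch in plain Python: each entity is embedded by a shared layer, the embeddings are mean-pooled, and a second layer maps the pooled vector to the final encoding. The names `make_linear`, `linear_relu`, `encode_set`, `phi`, and `rho`, as well as all dimensions, are hypothetical and chosen only for illustration; this is not sample-factory code, and a real encoder would presumably be written as a PyTorch module.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a small dense layer with random init."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear_relu(layer, x):
    """Apply y = relu(W x + b) to a single feature vector."""
    weights, bias = layer
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def encode_set(phi, rho, entities):
    """Hypothetical set encoder: rho(mean over i of phi(x_i))."""
    embedded = [linear_relu(phi, e) for e in entities]
    # Mean pooling over the entity axis discards ordering information.
    # math.fsum returns a correctly rounded sum, so the result does not
    # depend on the order in which the entities are summed.
    pooled = [math.fsum(col) / len(embedded) for col in zip(*embedded)]
    return linear_relu(rho, pooled)


rng = random.Random(0)
phi = make_linear(in_dim=3, out_dim=8, rng=rng)  # shared per-entity layer
rho = make_linear(in_dim=8, out_dim=4, rng=rng)  # post-pooling layer

entities = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
shuffled = entities[:]
rng.shuffle(shuffled)

out_a = encode_set(phi, rho, entities)
out_b = encode_set(phi, rho, shuffled)
```

Because the pooling step is symmetric in its inputs, `out_a` and `out_b` agree regardless of how the entities are ordered, which a plain MLP over concatenated entity features does not guarantee.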
-
I think it would be very useful to add more discriminators. From the tests I have done with conditional GANs, it seems that having several discriminators with different receptive field sizes incr…
-
If you open a GitHub issue, here is our policy:
It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead).
The form below…
-
Hello,
Could I have the config files for the other datasets and for the cnn3 and mlp architectures? I would like to generate the pretrained weights for all datasets.
I meant only the models on all tasks training. i…
-
## Description
The KAN (Kolmogorov-Arnold Network) model from the pykan library currently only supports two-dimensional input tensors (batch_size x hid_dim). A `RuntimeError` is raised when att…
-
Here I post some efficiency test numbers for a Monarch-based MLP vs. a vanilla `nn.Linear`-based MLP. I found that Monarch is best suited for MLPs in Transformer architectures, which generally have la…
-
I was reviewing the dynamic edge conv example. I'm not sure whether the aim of this example is to reproduce the results of the paper. However, I found what I think is a discrepancy between the implem…