-
I am working on a time-series classification task. Does Mamba work for my task?
-
### Feature request
The LLaMA 3 implementation should generate default `position_ids` that take the `attention_mask` into account.
@ArthurZucker @younesbelkada
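
A minimal sketch of what the request describes, assuming a 0/1 `attention_mask` where 1 marks a real token. The function name `position_ids_from_mask` and the placeholder value used for padded slots are hypothetical and chosen for illustration; this is not the library's implementation.

```python
def position_ids_from_mask(attention_mask, pad_position=1):
    """Derive position ids so that real tokens count from 0, skipping padding.

    attention_mask: list of rows, each a list of 0/1 ints (1 = real token).
    pad_position: hypothetical placeholder id written into padded slots.
    """
    position_ids = []
    for row in attention_mask:
        running = 0  # cumulative count of real tokens seen so far in this row
        ids = []
        for m in row:
            running += m
            # Real tokens get (cumulative count - 1); padded slots get the placeholder.
            ids.append(running - 1 if m else pad_position)
        position_ids.append(ids)
    return position_ids


batch_mask = [
    [0, 0, 1, 1, 1],  # left-padded sequence
    [1, 1, 1, 1, 1],  # full-length sequence
]
print(position_ids_from_mask(batch_mask))
```

With left padding, the first real token of each row then receives position 0 regardless of how many pad tokens precede it.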
### Motivation
Is there a s…
-
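## In a nutshell
A study that applies the Transformer to reinforcement learning. A sequence of states, actions, and rewards is fed in, and the model predicts the next action. In offline reinforcement learning, which learns from previously recorded trajectories, it achieves better accuracy than existing methods (it has not yet been validated for online reinforcement learning).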
![image](https://user-images.githubusercontent.com/544269/12…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.8.3.dev0
- Platform: Linux-5.4.54-1.0.0.std7c.el7.2.x86_64-x86_64-with-glibc…
-
### Feature request
I believe `labels` in the training of causal LMs means the value to predict at time `n`, i.e., the next token. In other words, I'd assume that if `labels` is given, it should be alrea…
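
The two label conventions at issue can be sketched as follows. This is an illustration only: `preshift_labels`, `training_pairs`, and the `IGNORE_INDEX` sentinel are hypothetical names introduced here, and the sketch assumes plain lists of token ids rather than any particular library's tensors.

```python
IGNORE_INDEX = -100  # assumed sentinel meaning "no target at this position"


def preshift_labels(input_ids):
    """Pre-shifted convention: labels[t] is the token that follows input_ids[t]."""
    return input_ids[1:] + [IGNORE_INDEX]


def training_pairs(input_ids, labels, labels_are_shifted):
    """Return (context_token, target_token) pairs under either convention."""
    if not labels_are_shifted:
        # Unshifted convention: labels mirror input_ids, so shift them here.
        labels = labels[1:] + [IGNORE_INDEX]
    return [(x, y) for x, y in zip(input_ids, labels) if y != IGNORE_INDEX]


tokens = [11, 12, 13, 14]
unshifted = training_pairs(tokens, tokens, labels_are_shifted=False)
preshifted = training_pairs(tokens, preshift_labels(tokens), labels_are_shifted=True)
print(unshifted)
print(preshifted)
```

Both calls yield the same pairs; the conventions differ only in whether the caller or the loss computation performs the shift.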
-
Hi. I have a phoneme-based Zipformer model.
Before this [PR](https://github.com/k2-fsa/sherpa-onnx/pull/828), I was able to apply hotwords encoding for phoneme sequences, e.g. `ɪ z/dʒ ʌ s t/b ɛ s t…
-
```
(textgen) [root@pve-m7330 sparsegpt]# python llama.py ../text-generation-webui/models/TinyLlama-1.1B-Chat-v1.0/ wikitext2 --nsamples 10
Token indices sequence length is longer than the specified…
-
Generative modeling -- triplet works
https://github.com/mmcdermott/EventStreamGPT/blob/main/EventStream/transformer/generation/generation_utils.py#L73
Simplify this code^
With triplet code - you …
-
Apparently the eigenspectrum of the sinc matrix (with delay width \tau_w), a regularized version of which is used in the linear filter, optimizes the "centralization problem". The eigenvectors of…
-
When users log into the modeling app, they click on their OAuth provider (e.g. Google) and then sign in, and then they see this.
Users have to ignore the big code, and instead press the little …