-
Thank you for your previous work! I'm trying to apply this idea to one of my projects, but I have run into some problems.
May I ask whether the value of attention rank loss is very high during your training? M…
-
## Problem
In a Mixture of Experts (MoE) LLM, the gating network outputs a categorical distribution of $n$ values (chosen from $n_{max}$), which is then used to create a convex combination of the $n$…
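A minimal sketch of the gating step described above may help make it concrete. It assumes the $n$ active experts are the ones with the largest gating logits, that the weights come from a softmax renormalized over those $n$, and that the convex combination is taken over expert outputs. The names `top_n_gate`, `moe_output`, and the toy experts are hypothetical and not taken from any particular implementation.

```python
import math

def top_n_gate(logits, n):
    """Return {expert_index: weight} for the n largest logits; weights sum to 1."""
    # Assumption: the n active experts are those with the highest gating logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:n]
    # Softmax restricted to the selected experts (shifted by the max for stability).
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_output(x, experts, logits, n):
    """Convex combination of the outputs of the n selected experts."""
    weights = top_n_gate(logits, n)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy example: n_max = 4 experts acting on a scalar input, n = 2 active experts.
experts = [lambda x: x, lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
gate_logits = [0.1, 2.0, 2.0, -1.0]
weights = top_n_gate(gate_logits, n=2)
y = moe_output(3.0, experts, gate_logits, n=2)
```

Because the weights are non-negative and sum to 1, `y` always lies between the smallest and largest output of the selected experts.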
-
### Problem Description
> I'll first describe my problems in this section and provide more details in the next sections.
Recently I've been training a **char-level** language model for my own expe…
-
There are plenty of amazing solutions for using large language models (LLMs) to help with searching. To keep this request short, I'll point out four kinds of them that I want in a modern sea…
-
Hi, after adopting the solution from #12, the following problems still occur:
# Save eval, global step 9000
loaded infer model parameters from ../test_out_put/translate.ckpt-9000, time…
-
The current file example uses TorchRun. It would be great if it used an approach more like Falcon, etc., using transformers and `AutoTokenizer`. When I try, I get a plethora of errors. :-(
Somethin…
-
Hi
Have you considered adding Capacitorjs support to your stack?
-
See also:
- https://github.com/linkml/linkml/issues/2033
-
## Trained language models
Training up-to-date language models: training the transformer-based language models available on Hugging Face for Turkish, and sharing them through both TDD and Hugging Face.…
-
I couldn't solve this by myself. Do you have any idea?
This error happens when I send a question.
HOST=0.0.0.0
PORT=3000
![image](https://user-images.githubusercontent.com/60151732/229323017-…