-
The goal is to create a stable patched Ambermoon version in German and English. We are focusing on the English patch for now.
-
These lines of code set norm and gate to be trained in float32
https://github.com/deepseek-ai/DeepSeek-MoE/blob/66edeee5a4f75cbd76e0316229ad101805a90e01/finetune/finetune.py#L238-L247
But with deep…
drxmy updated
7 months ago
-
### The Feature
Store an abbreviated form of the sk- key in the DB, as OpenAI does (this could be done in the token and spend logs tables)
### Motivation, pitch
easier for app owner viewing their spend l…
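
A minimal sketch of what "abbreviated" could mean here, assuming the stored value keeps only the `sk-` prefix and the last four characters. The function name `abbreviate_key` and its parameters are hypothetical and are not taken from the project's code or schema:

```python
def abbreviate_key(key: str, prefix_len: int = 3, suffix_len: int = 4) -> str:
    """Return a display-safe abbreviation of an API key, e.g. 'sk-...abcd'.

    Hypothetical helper for illustration only; the prefix/suffix lengths
    are assumptions, not the project's actual format.
    """
    if len(key) <= prefix_len + suffix_len:
        # Too short to abbreviate without revealing the whole key: mask it all.
        return "*" * len(key)
    # Keep the leading prefix and the trailing characters, drop the middle.
    return f"{key[:prefix_len]}...{key[-suffix_len:]}"


if __name__ == "__main__":
    print(abbreviate_key("sk-1234567890abcd"))
```

The abbreviated string would be stored alongside the hashed key, so spend logs can show which key was used without exposing it.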
-
Using version b2589
Attempting convert-hf-to-gguf.py on
https://huggingface.co/stabilityai/stablelm-2-12b-chat
Results in error:
Can not map tensor 'model.layers.0.self_attn.k_layernorm.norms.0.we…
-
**Describe the bug**
When installing watsonx.governance, I expect an OpenPages instance to be created. The installer appears to create one but then deletes it during the install.
**To Reproduc…
-
I set up the plugin strictly following the instructions (using the Supabase CLI and downloading the latest release from GitHub).
The Table Editor shows the two tables `document` (7 records) and `document_s…
-
Hi,
I am trying to run a simulation using the multiple_models template on two datasets from the synergy dataset. However, when the template runs a simulation using the sbert model there is an error…
-
### Checklist
- [X] I've read the [contribution guidelines](https://github.com/autowarefoundation/autoware/blob/main/CONTRIBUTING.md).
- [X] I've searched other issues and no duplicate issues were…
-
Over the past several months, the .NET team has evaluated ways to evolve the .NET tooling ecosystem and incorporate more capabilities into VS Code. Currently, the C# experience in VS Code is powered b…
-
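For reference, see this Google paper: https://arxiv.org/pdf/2305.15663.pdf
It looks like they only changed the FC layer of a conformer layer and added an MoE module.
@Mddct, what are your thoughts?
The detailed training strategy still needs to be worked out (do any parameters need to be frozen?). The paper left me a bit confused, so guidance from anyone experienced would be much appreciated (respect).
Adding a Zhihu article: https://zhuanlan.zhihu.com/p/671873…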