-
I'm trying to run gptlint on only one rule (use-correct-english). But for the life of me, I can't get it to work.
Here's my config (`gptlint.config.mjs`):
```js
export default [
  {
    "files":…
-
### Description of the bug:
The following code:
```typescript
var test = new KubeNamespace(this, "test", {
  metadata: {
    name: "test"
  }
})
test.addJsonP…
-
I am getting the error "Failed to parse output. Returning None" on the faithfulness metric for some inputs. The behavior is inconsistent: it's haphazard, sometimes working and sometimes failing for the same…
-
The Yuan 2.0-M32 large-model R&D team analyzed the current mainstream quantization schemes in depth, comprehensively evaluated model compression results against precision loss, and ultimately adopted the GPTQ quantization method, with AutoGPTQ as the quantization framework.
Model: Yuan2-M32-HF-I…
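For reference, a minimal sketch of what a 4-bit GPTQ pass with AutoGPTQ typically looks like; the checkpoint path, calibration text, and bit/group-size settings below are placeholders for illustration, not the team's actual recipe:
```python
# Minimal GPTQ quantization sketch with AutoGPTQ (assumed settings, not the
# actual Yuan2-M32 configuration): 4-bit weights, group size 128.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "path/to/Yuan2-M32-HF"          # hypothetical local checkpoint
out_path = "path/to/Yuan2-M32-HF-GPTQ-int4"  # hypothetical output directory

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# A real run would use a proper calibration set; one sentence is only a placeholder.
enc = tokenizer("GPTQ needs representative text for calibration.", return_tensors="pt")
examples = [{"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config, trust_remote_code=True)

model.quantize(examples)        # runs the GPTQ weight-quantization pass
model.save_quantized(out_path)  # writes the int4 checkpoint
tokenizer.save_pretrained(out_path)
```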
-
- [ ] async
- [x] less wasteful LLM calls
I'm cooking on the Database stuff right now, and it's clear that there are a few things we can do to make the daily run much more efficient.
The searches…
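As a rough illustration of the async + fewer-calls direction (not code from this repo; the OpenAI client, model name, and helper names are placeholders):
```python
# Sketch of concurrency-limited, deduplicated LLM calls for a daily batch run.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
_sem = asyncio.Semaphore(8)  # cap concurrent requests instead of running serially

async def complete(prompt: str) -> str:
    async with _sem:
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
    return resp.choices[0].message.content

async def run_daily(prompts: list[str]) -> dict[str, str]:
    unique = list(dict.fromkeys(prompts))  # drop duplicate prompts up front
    answers = await asyncio.gather(*(complete(p) for p in unique))
    return dict(zip(unique, answers))
```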
-
Hi @frdel, thanks for your amazing work on this project.
I have noticed that AutoMemory can be quite inefficient, primarily because it embeds the entire message history to find relevant entries. F…
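A minimal sketch of the caching idea, assuming per-message embeddings can be reused across lookups; this is not AutoMemory's actual code, and the model and helper names are placeholders:
```python
# Embed each message once, reuse the cached vectors on later lookups.
import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
_cache: dict[str, np.ndarray] = {}                # message text -> embedding

def _embed(text: str) -> np.ndarray:
    if text not in _cache:                        # only new messages hit the model
        _cache[text] = _model.encode(text, normalize_embeddings=True)
    return _cache[text]

def relevant_entries(history: list[str], query: str, k: int = 5) -> list[str]:
    q = _embed(query)
    # Cosine similarity reduces to a dot product on normalized vectors.
    scored = sorted(history, key=lambda m: float(np.dot(_embed(m), q)), reverse=True)
    return scored[:k]
```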
-
- **I'm submitting a ...**
[x] feature request
- **Summary**
I would love to use ax-llm in [observable](https://observablehq.com/), but sadly the module won't import. :(
https://observabl…
-
Hello,
we have noticed some unexpected behavior when fine-tuning a Llama 3 model on 1 GPU and when fine-tuning the same model on the same dataset with 2 GPUs in parallel mode. See the attached te…
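One common source of 1-GPU vs 2-GPU divergence (only a guess until the attached results are reviewed) is the effective batch size under data parallelism, which changes how many optimizer steps a run takes. A quick arithmetic sketch with made-up numbers:
```python
# Effective batch size under DDP-style data parallelism (illustrative numbers,
# not the values from the attached test; adjust to the real config).
per_device_batch = 4
grad_accum = 8
dataset_size = 10_000  # assumed number of training examples

for num_gpus in (1, 2):
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = dataset_size // effective_batch
    print(f"{num_gpus} GPU(s): effective batch {effective_batch}, "
          f"{steps_per_epoch} optimizer steps per epoch")
# 1 GPU: effective batch 32 -> 312 steps; 2 GPUs: effective batch 64 -> 156 steps.
# Matching runs usually means halving per-device batch or grad_accum on 2 GPUs,
# or rescaling the learning rate accordingly.
```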
-
ChatGLM2-6B, multi-batch (multiple instances), using bigdl-llm[xpu] 20231016 on Arc 770 with Xeon.
For 32 in / 32 out, instance = 1, rest latency is 20.5 ms/token.
For 32 in / 32 out, instance = 2, rest latency is 224.5 ms/token.…
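For context, the single-instance flow measured here roughly follows the public bigdl-llm XPU API; this is a sketch, not the benchmark script, and the model path, INT4 setting, and prompt are assumptions:
```python
# Rough single-instance setup for ChatGLM2-6B on Arc with bigdl-llm[xpu].
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (needed for the 'xpu' device)
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "THUDM/chatglm2-6b"  # placeholder; the test may use a local path
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model = model.to("xpu")

inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)  # matches the 32-out case above
print(tokenizer.decode(out[0], skip_special_tokens=True))
```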
-
The current evaluation metrics supported by `llm-eval` are robust. However, upon reviewing the documentation, I found that the repo currently has no way to evaluate model toxicity. Assessing LLM…
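A toxicity metric could be scored with an off-the-shelf classifier; below is a minimal sketch using the `detoxify` package, which is just one possible backend and not something `llm-eval` currently ships (the threshold is illustrative):
```python
# Sketch of a toxicity score over model outputs using Detoxify.
from detoxify import Detoxify

_scorer = Detoxify("original")  # 'unbiased' and 'multilingual' variants also exist

def toxicity_scores(outputs: list[str]) -> list[float]:
    # Detoxify returns per-category probabilities; 'toxicity' is the headline score.
    return [float(s) for s in _scorer.predict(outputs)["toxicity"]]

def toxicity_rate(outputs: list[str], threshold: float = 0.5) -> float:
    # Fraction of outputs whose toxicity probability exceeds the threshold.
    scores = toxicity_scores(outputs)
    return sum(s >= threshold for s in scores) / max(len(scores), 1)
```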