-
Hello,
I built a simple langchain app using `ConversationalRetrievalChain` and `langserve`.
It works great with the `invoke` API. However, when it comes to the `stream` API, it returns the entire an…
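For context, the contract the two endpoints implement differs only in *when* results are delivered: `invoke` collects everything into one answer, while `stream` yields chunks as they are produced. A minimal pure-Python sketch of that contrast (a mock token generator stands in for the real chain; nothing here is LangChain API):

```python
# Sketch of the invoke-vs-stream contrast. `mock_chain_tokens` stands in
# for the chain's underlying token generator.

def mock_chain_tokens(question):
    """Stand-in for the LLM's token generator."""
    for token in ["Streaming", " ", "works", "."]:
        yield token

def invoke(question):
    # invoke: collect every token, return one complete answer
    return "".join(mock_chain_tokens(question))

def stream(question):
    # stream: yield each token as soon as it is produced
    yield from mock_chain_tokens(question)

full = invoke("hi")           # one complete string
chunks = list(stream("hi"))   # many partial chunks
```

If the server returns the whole answer at once on `/stream`, it usually means some component in the chain only supports the `invoke`-style path, so the "stream" collapses into a single chunk.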
-
I am using LangServe and LangChain with Hugging Face pipelines and a Streamer object.
If I use the `TextStreamer` object from Hugging Face, I can see the stream in stdout.
I read that I might need to use…
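`TextStreamer` writes tokens to stdout, which is why they appear in the terminal but never reach the API response. Hugging Face's `TextIteratorStreamer` instead exposes tokens as a Python iterator backed by a queue that the generation thread feeds. A minimal sketch of that queue pattern in pure Python (the class below is a stand-in to show the mechanics, not the real `TextIteratorStreamer`):

```python
# Queue-backed streamer mirroring the TextIteratorStreamer pattern:
# one thread produces tokens, the consumer iterates over them.
import threading
import queue

class IteratorStreamer:
    """Minimal queue-backed streamer (stand-in for TextIteratorStreamer)."""
    _END = object()  # sentinel marking end of generation

    def __init__(self):
        self._q = queue.Queue()

    def put(self, text):
        # called by the generating thread for each new token
        self._q.put(text)

    def end(self):
        # called once generation finishes
        self._q.put(self._END)

    def __iter__(self):
        while True:
            item = self._q.get()
            if item is self._END:
                return
            yield item

def fake_generate(streamer):
    # stand-in for model.generate(..., streamer=streamer)
    for tok in ["Hello", ", ", "world"]:
        streamer.put(tok)
    streamer.end()

streamer = IteratorStreamer()
threading.Thread(target=fake_generate, args=(streamer,)).start()
received = list(streamer)  # consume tokens as they arrive
```

With the real classes, generation runs in a background thread and the iterator is what you forward chunk by chunk over the LangServe `/stream` endpoint.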
-
A question:
The training data is fairly large: loading about 3 million rows takes over 3 hours. Loading 5 million rows has now been running for nearly a whole night and still has not finished.
I may ultimately need to load more than 10 million rows for training, and I am worried they will not load at all.
Is there a good way to load and train at the same time, instead of loading the full dataset up front, where the sheer data volume causes an out-of-memory error?
Also, I see a streaming parameter in the code, but from the code I cannot see much difference, other than it dropping the parallel-processing "num_pr…
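One common approach (assuming the loader is Hugging Face `datasets`) is `load_dataset(..., streaming=True)`, which returns an `IterableDataset` that reads rows lazily instead of materialising all of them in memory. The underlying idea is just lazy iteration over fixed-size batches, sketched here without any library (the source and row contents are illustrative):

```python
# Lazy loading sketch: rows are read one at a time and grouped into
# batches, so memory holds at most one batch, never the full dataset.

def read_rows(source):
    # stand-in for reading one row at a time from disk
    for i in range(10):
        yield {"text": f"row {i}"}

def batched(rows, batch_size):
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch          # train on this batch, then discard it
            batch = []
    if batch:
        yield batch              # final partial batch

batches = list(batched(read_rows("train.jsonl"), batch_size=4))
```

The trade-off is exactly the one noticed in the code: a streamed dataset is consumed sequentially, so features that require the whole dataset up front (like `num_proc`-style parallel preprocessing or random global shuffling) are restricted or replaced by buffer-based variants.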
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
Hi, I want to know how to count tokens (Embedding Tokens, LLM Prompt Tokens, LLM Complet…
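One way to count these (assuming LlamaIndex, which ships a `TokenCountingHandler` callback for exactly this) is a running tally keyed by event type: embedding calls, LLM prompts, and LLM completions. The bookkeeping can be sketched in plain Python; the whitespace tokenizer below is a stand-in for the model's real tokenizer:

```python
# Running token tally per event type, mirroring what a token-counting
# callback handler does. str.split is a stand-in tokenizer.

class TokenCounter:
    def __init__(self, tokenize=str.split):
        self.tokenize = tokenize
        self.embedding_tokens = 0
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def on_embedding(self, text):
        # fires for every text sent to the embedding model
        self.embedding_tokens += len(self.tokenize(text))

    def on_llm(self, prompt, completion):
        # fires for every LLM call: count prompt and completion separately
        self.prompt_tokens += len(self.tokenize(prompt))
        self.completion_tokens += len(self.tokenize(completion))

counter = TokenCounter()
counter.on_embedding("what is streaming")
counter.on_llm("what is streaming", "streaming sends partial results")
```

For accurate billing, swap the stand-in tokenizer for the one matching the deployed model, since whitespace counts diverge substantially from subword token counts.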
-
Hi,
for the non-streaming method `GetChatMessageContentsAsync`, I get the usage like this:
```
private decimal CalculateCosts(IReadOnlyList<ChatMessageContent> result)
{
    var costs = 0m;
    if (result[0].M…
```
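For streaming responses, many connectors attach usage metadata only to the final chunk, so the cost calculation becomes a scan over the stream rather than a read of one result object. The field names and per-1K prices below are illustrative assumptions, sketched in Python to show the shape of the scan:

```python
# Cost calculation over a token stream where usage metadata typically
# arrives only on the last chunk. Prices are placeholder example rates.

PRICE_PER_1K_PROMPT = 0.003
PRICE_PER_1K_COMPLETION = 0.004

def calculate_cost(chunks):
    prompt_tokens = completion_tokens = 0
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage:  # usually present only on the final chunk
            prompt_tokens += usage["prompt_tokens"]
            completion_tokens += usage["completion_tokens"]
    return (prompt_tokens * PRICE_PER_1K_PROMPT
            + completion_tokens * PRICE_PER_1K_COMPLETION) / 1000

stream = [{"text": "Hel"}, {"text": "lo"},
          {"text": "", "usage": {"prompt_tokens": 10,
                                 "completion_tokens": 5}}]
cost = calculate_cost(stream)
```

Whether usage appears at all during streaming depends on the connector and service settings, so it is worth checking each chunk's metadata rather than only the last one.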
-
https://docs.khoj.dev/get-started/setup
Where is the FastAPI app initialized / called?
Which endpoints can I call from outside?
I want to use its capability as a backend for my other…
-
I'm trying to serve a Llama-2-70b-chat-hf model using Triton Inference Server with a TRT-LLM engine. The script I used is `tools/inflight_batcher_llm/end_to_end_streaming_client.py`:
```
python3 to…
```
-
### Describe the bug
Using `datasets` streaming mode with the Trainer in DDP mode causes a memory leak.
### Steps to reproduce the bug
import os
import time
import datetime
import sys
import numpy as np…
-
**Motivation:** Some HTML parsers (e.g. parse5 and our internal parser) provide a streaming mode in which the tokenizer works as if it were executed together with the tree construction algorithm, and so tokenize…
-
## Environment
- OS: Ubuntu 22.04
## To reproduce
Steps to reproduce the behavior:
When using the `StreamingDataloader` (or the vanilla PyTorch `Dataloader`) with `num_workers>0`, the proces…