-
https://blog.cloudflare.com/javascript-native-rpc
If you instrument your worker, and it tries to call a method on a bound service using the new RPC style, you get a `TypeError: Illegal invocation` …
-
Starting vLLM with the FlashInfer backend reports an error when `--quantization gptq` and `--kv-cache-dtype fp8_e5m2` are enabled.
Start command:
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 78…
-
KV takes time to synchronize across nodes, so it suits read-heavy, write-light workloads.
https://developers.cloudflare.com/workers/learning/how-kv-works/
>KV achieves this performance by being eventually-consistent. Changes are usually immediately visible in…
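
To make the read-heavy, write-light point concrete, here is a minimal toy model of eventual consistency. It is only a sketch and not the Workers KV API: the class `EventuallyConsistentKV`, the node names `fra` and `sjc`, and the 60-tick propagation delay are all hypothetical, chosen for illustration.

```python
class EventuallyConsistentKV:
    """Toy model: a write is visible at once on the node that took it,
    and reaches every other node only after a fixed delay."""

    def __init__(self, nodes, propagation_delay):
        self.nodes = {name: {} for name in nodes}  # per-node local copy
        self.delay = propagation_delay             # hypothetical, in ticks
        self.pending = []                          # (visible_at, node, key, value)
        self.now = 0

    def put(self, node, key, value):
        # The writing node sees its own write immediately.
        self.nodes[node][key] = value
        # Every other node gets the write queued for later delivery.
        for other in self.nodes:
            if other != node:
                self.pending.append((self.now + self.delay, other, key, value))

    def get(self, node, key):
        # Reads are always served from the local copy, which may be stale.
        return self.nodes[node].get(key)

    def advance(self, ticks):
        # Move the clock forward and deliver every write that is now due.
        self.now += ticks
        still_pending = []
        for visible_at, node, key, value in self.pending:
            if visible_at <= self.now:
                self.nodes[node][key] = value
            else:
                still_pending.append((visible_at, node, key, value))
        self.pending = still_pending


kv = EventuallyConsistentKV(["fra", "sjc"], propagation_delay=60)
kv.put("fra", "greeting", "hello")
local_read = kv.get("fra", "greeting")   # read on the node that was written to
stale_read = kv.get("sjc", "greeting")   # read elsewhere, before propagation
kv.advance(60)
fresh_read = kv.get("sjc", "greeting")   # read elsewhere, after propagation
```

In this model a key that is written rarely and read often almost always returns the current value, while a key that is written frequently can be read stale from any node that has not yet received the latest write.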
-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- main branch TensorRT-LLM (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…
-
I'm not sure whether this is already supported in workers-rs, but hopefully there could be an equivalent of JavaScript's `blockConcurrencyWhile` binding, something like `async fn new(...) -> Self`.
I use a p…
-
### System Info
GPU Name: 8 * H20
TensorRT-LLM : 0.12.0.dev2024080600
NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.4
### Who can help?
_No response_
### Inform…
-
### Motivation.
I am one of the authors of the paper Stay On Topic with Classifier-Free Guidance ( https://openreview.net/forum?id=RiM3cl9MdK&noteId=s1BXLL1YZD ) who has been nominated as ICML'24 Spo…
-
$ python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8001 --model Qwen1.5-14B-Chat-AWQ --tensor-parallel-size 2 --quantization awq --trust-remote-code --dtype half
INFO 02-26 1…
-
- https://www.w3.org/TR/service-workers/#cache-objects
- https://html.spec.whatwg.org/multipage/webstorage.html
I think these APIs should be included to support local KV features.
-
Hi everyone, and first of all thanks for Wildebeest. I already appreciate it very much.
I did run into a problem during my deploy though. All actions above this one - "download Terraform state" - are successf…