-
We're exploring the various optimizations available in the [Diffusers library](https://huggingface.co/docs/diffusers/main/en/optimization/opt_overview) to reduce VRAM usage and improve inference speed. @titan-no…
-
### Community Note
* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the…
-
Hi, thanks for your great work.
I am following the instructions to install and run the test scripts.
I tried two systems: one with 4xA100 40G, the other with 4xA100 80G.
I use the following…
-
## Overall
GTC 2025 will be held March 17–20, 2025, in person in San Jose. The NVIDIA team wants us to share our work there.
At that time, we hope to integrate Jade into Jan, which is powered by Corte…
-
### Specification
When we import files, the runtime must import all the related files before it can begin program execution. As such, if large files referencing other files are being imported at th…
-
## Description
ignore_eos_token is a commonly used additional parameter that helps standardize LLM benchmarks by forcing requests to generate a consistent output sequence length.
-Will this change the c…
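A minimal sketch of the semantics being requested, using a toy decode loop rather than any real engine's API (`generate`, `EOS_TOKEN`, and the token ids are all illustrative): with the flag off, generation stops at the first EOS and output length varies per request; with it on, every request runs to `max_new_tokens`.

```python
EOS_TOKEN = 2  # illustrative EOS token id


def generate(steps, max_new_tokens, ignore_eos=False):
    """Simulate a decode loop; `steps` yields one token id per step."""
    out = []
    for tok in steps[:max_new_tokens]:
        if tok == EOS_TOKEN and not ignore_eos:
            break  # default behavior: stop at EOS, length varies
        out.append(tok)  # with ignore_eos, EOS is kept and decoding continues
    return out


steps = [5, 9, 2, 7, 3, 8]  # EOS appears at step 2

short = generate(steps, max_new_tokens=6)                   # [5, 9]
fixed = generate(steps, max_new_tokens=6, ignore_eos=True)  # all 6 tokens
```

This is why the flag is useful for benchmarking: it pins the output sequence length so throughput numbers are comparable across requests.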
-
Hello,
I'm considering using MeTTa for a conversational AI application and have some questions about its performance with large datasets.
In the OpenCog Atomspace Metagraphs paper, it's mentione…
-
Currently, it only supports OpenAI.
-
Congratulations on your new results in https://www.zama.ai/post/making-fhe-faster-for-ml-beating-our-previous-paper-benchmarks-with-concrete-ml! We wonder if more details about the underlying improve…
-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…