-
Hi there, I am using an 8Gen3 (Xiaomi 14 Pro, 68 GB/s bandwidth) and following the "Android Cross Compilation Guidance, Option 1: Use Prebuilt Kernels" guide to test llama-2-7b-4bit token generation performance.
it…
-
### Describe the bug
Hello diffusers team!
I have been facing an annoying issue since I upgraded diffusers to version 0.27.X.
The first call (and only the first) to `pipeline(...)` now takes a lot of time …
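To show that only the first call is slow, a minimal timing sketch along these lines could be used. `FakePipeline` and `time_calls` are hypothetical names made up for illustration, not part of diffusers; the stand-in only simulates a one-time setup cost with `time.sleep`.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], n_calls: int = 3) -> List[float]:
    """Return the wall-clock duration (seconds) of each of n_calls calls to fn."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


class FakePipeline:
    """Hypothetical stand-in for a pipeline that pays a one-time cost on first call."""

    def __init__(self) -> None:
        self._warm = False

    def __call__(self) -> str:
        if not self._warm:
            time.sleep(0.2)  # simulated one-time setup cost (assumption)
            self._warm = True
        time.sleep(0.01)  # simulated steady-state work (assumption)
        return "image"


if __name__ == "__main__":
    for i, seconds in enumerate(time_calls(FakePipeline()), start=1):
        print(f"call {i}: {seconds:.3f} s")
```

With a real pipeline, `fn` would be a zero-argument lambda wrapping the actual call, so the same prompt and settings are used for every measurement.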
-
We should get docs up in general, but @lukasheinrich pointed out that we should probably be tracking pylhe citations as well.
This is what I have just from `https://www.google.com/search?q=pylhe+si…
-
The easiest place to see the excess memory usage is with stable_diffusion, e.g.
```
python shark/examples/shark_inference/stable_diffusion/main.py --precision=fp32
```
Where we see ~24 GB of mem…
-
Hello all. Just thought I'd post a question about Flash Attention 2 here:
[https://github.com/Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)
Apparently it's making big…
-
Following the [2017 TensorFlow Dev Summit](https://events.withgoogle.com/tensorflow-dev-summit/#content), here is an outline of Edward going forward at least for Spring 2017. Of course, comments are …
-
I am using TorchServe to potentially serve a model from MMOCR (https://github.com/open-mmlab/mmocr), and I have several questions:
1. I tried to do inference on hundreds of images together using batc…
-
A very useful tool for understanding model performance beyond the loss value: actually showing what the predictions are.
It'd be very useful to be able to "see" the output of the model during eva…
-
The paper "[BOHB: Robust and Efficient Hyperparameter Optimization at Scale][1]" includes an interesting parallelization technique for Bayesian sampling in a Hyperband implementation. In Section 4.2 t…
-
paddlepaddle 2.1.1 (CPU version), paddleclas
The model and data used are the ones provided in: https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/zh_CN/tutorials/quick_start_recognition.md
Time to recognize a single image on:
1. Windows 10 (PC):
Inference: 2525…