-
The error message is as follows:
run.pl: job failed, log is in /mnt/d/Work/FunCodec/egs/LibriTTS/text2speech_laura/dump/libritts/test-other/codecs//logdir/inference.1.log
cat: '/mnt/d/Work/FunCodec/egs/LibriTTS/text2speec…
-
Hello!
I’ve been working with TensorFlow Lite models in tract and have come across a model that cannot be loaded. I've compared the operators used in the model with those mentioned in the [README](…
-
We noticed that `lm_eval --model vllm` did not work when `data_parallel_size > 1` and got `Error: No available node types can fulfill resource request` from Ray. After some research, I believe when `tenso…
-
Any plan to support the latest Qwen2-VL model evaluation?
-
I am heavily using the `tf.contrib.data` datasets API for image-based tasks. With the observations for images (LSUN/celebA etc.) being no more than a downloader for these datasets, would it be worthwhile to r…
-
I saw that in your config files you use batch sizes of 100, 200, 128, and 256. Does this affect how I use this model to do inference? Do I have to pad my image data, e.g., 1 image to 100 images, in ord…
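As a general point (a minimal numpy sketch, not the repository's actual model): the training batch size is a data-pipeline setting, not part of the model's weights, so inference can usually run with any batch size, padding not required. The names below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights of a toy dense layer; note their shapes never mention a batch size.
W = rng.standard_normal((784, 10))
b = np.zeros(10)

def forward(x):
    # x: (batch, 784) for ANY batch size -- the same weights are shared
    # across all rows of the batch.
    return x @ W + b

# "Trained" with batch 128, but a single image works without padding:
print(forward(rng.standard_normal((128, 784))).shape)  # (128, 10)
print(forward(rng.standard_normal((1, 784))).shape)    # (1, 10)
```

The exception is models whose graphs were exported with a hard-coded batch dimension, in which case the input must match that fixed shape.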
-
(mitbevfusion) gss@Gss:~/Lidar_AI_Solution/CUDA-BEVFusion$ python qat/export-scn.py
Tracing model inference...
> Do inference...
--> SparseConvolutionQunat0[subm] -> Input 0, Output 1
Tracebac…
-
**Description**
Triton does not clear or release GPU memory when there is a pause in inference. In the attached diagrams the same model is being used. It is served via ONNX.
![image (1)](https:…
-
This is more of a question for my understanding. I understand that at training time each sequence is of fixed length (and not padded), so the attention mask can be constructed using a triangular matrix…
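For reference, a minimal numpy sketch of the triangular (causal) mask described above, plus the usual way it is combined with a padding mask once sequences in a batch have different lengths. The function names here are illustrative, not from any particular codebase.

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular boolean matrix: position i may attend to j <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def causal_padding_mask(lengths, seq_len):
    # For variable-length sequences, AND the triangular mask with a
    # per-sequence padding mask; result shape: (batch, seq_len, seq_len).
    tri = causal_mask(seq_len)[None, :, :]
    valid = np.arange(seq_len)[None, :] < np.asarray(lengths)[:, None]
    return tri & valid[:, None, :]

print(causal_mask(3))
# [[ True False False]
#  [ True  True False]
#  [ True  True  True]]
```

With fixed-length, unpadded training sequences only the triangular part is needed; the padding term matters once batched inference mixes lengths.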
-
### Description
The inference API supports client side batching by leveraging the `input` array field. External services implement different limits for batched requests. Cohere limits the text to [96…
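One way a client could cope with such provider-side limits is to chunk the `input` array before sending. A hedged sketch (the 96-element limit mirrors the Cohere example above; `chunked` and `batch_size` are illustrative names, not an existing API):

```python
from typing import Iterable, List

def chunked(items: List[str], batch_size: int) -> Iterable[List[str]]:
    # Yield consecutive slices of at most `batch_size` elements,
    # preserving order so results can be re-concatenated.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"doc-{i}" for i in range(200)]
batches = list(chunked(texts, 96))
print([len(b) for b in batches])  # [96, 96, 8]
```

Each batch would then be sent as one request and the per-batch results concatenated in order.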