-
Since vLLM 0.2.5, we can no longer run Llama-2 70B 4-bit AWQ on 4×A10G and have to fall back to an older vLLM release. We see similar problems even when trying to run two 7B models on an 80GB A100.
For small models, like 7b with 4k to…
-
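Hello, I used your open-source code to run quantization.py and it successfully generated 5 tflite versions. I picked the full-integer one, hrnet_quant_int_only.tflite, and compiled it on Ubuntu with `edgetpu_compiler optimized/hrnet_quant_int_only.tflite` in order to run the model on the TPU. However, I got the following error: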
-
### Issue type
Support
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
TF 2.11
### Custom code
No
### OS platform and distribution
Linu…
-
This is the (I think correct) behavior in python2:
``` r
Sys.setenv(RETICULATE_PYTHON= Sys.which("python2"))
library(tensorflow)
x [,1] [,2] [,3] [,4]
#> [1,] "1" "2" "3" "4"
#> [2,]…
-
Hi, ELMo team. To deploy TF models in C++, I typically freeze the graph (via graph_util.convert_variables_to_constants) into a constant GraphDef, which gives me a single .pb GraphDef file for applying m…
-
I have a pytorch model that returns a `dict[str, torch.Tensor]` type with 5 keys and tensors and I use nobuco to convert it to tflite.
Conversion works fine, however, the output of the tflite model i…
-
**System information**
Google Colab Notebook with TF 2.3
**Describe the current behavior**
Loading saved model with input `tf.keras.layers.Input(shape=[None], dtype=tf.int64, ragged=True)` is 5-1…
-
GPU-GeForce GTX 1650
driver nvidia-450
CUDA-10.1
CUDNN-7.6
TensorFlow-2.2.0
python-3.8.3
Ubuntu-18.04
-------------------------------------------------------------------------------------------…
-
### Background Description
`llama_tensor_get_type()` in [src/llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/src/llama.cpp) is nearly 300 lines of conditions and has a bit of inconsiste…
-
**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Linux Debian 11
- TensorFlow installed from (source or binary):
Compiled from source
- Tensor…