-
When I run build_win.bat, I finally get a DeepSpeed wheel (.whl) file, so that part seems to be resolved. However, when I ran the program, the following issue occurred:
File "D:\anaconda3\envs\llm\li…
-
Implement offline inference with FastGen, using `offline_inference_example.py` from https://github.com/llm-jp/llm-jp-eval/pull/115 as a reference.
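For reference, a minimal sketch of what the FastGen (DeepSpeed-MII) offline pipeline API looks like; the model name and generation parameters below are placeholders, and the actual implementation should follow `offline_inference_example.py` from the PR above.

```python
import mii

# Build an in-process FastGen pipeline (no persistent server needed).
# The model identifier is a placeholder; swap in the target model.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Run batched offline generation over a list of prompts.
responses = pipe(
    ["DeepSpeed is", "Offline inference means"],
    max_new_tokens=128,
)

for r in responses:
    print(r.generated_text)
```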
-
> closed, confirmed that it was fixed in 0.11.0.dev2024060400
Hi @hijkzzz, I am hitting the same problem with fp8 quantization of MoE models (both 8x22B and 8x7B) on H20, even after upgrading to 0…
-
### Issue Description
While building from source, running the command `time cmake .. -DPY_VERSION=3.10 -DWITH_GPU=ON -DWITH_TESTING=ON` hits the following problem:
```bash
-- commit: f8a40a7d3e
-- branch: develop
/home/sun/anaconda3/envs/paddle-d…
```
-
It seems that the fp6_llm repo only includes the kernel `weight_matrix_dequant_fp_eXmY_cpu`, which dequantizes fp6 data to fp16, but it lacks a kernel to quantize fp16 data to fp6. Could you …
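For illustration, here is a rough NumPy sketch of what the missing direction could look like: round-to-nearest fake quantization of fp16 values onto the fp6 e3m2 grid. The bias-3, subnormal-inclusive, no-inf/NaN layout (max magnitude 28) is an assumption based on the usual e3m2 definition; this simulates only the rounding, not the bit packing the repo's kernels use.

```python
import numpy as np

def fp6_e3m2_grid():
    # Enumerate all non-negative values representable in fp6 e3m2
    # (1 sign, 3 exponent, 2 mantissa bits). Assumed: exponent bias 3,
    # subnormals included, no inf/NaN, so the largest magnitude is 28.
    vals = set()
    for e in range(8):
        for m in range(4):
            if e == 0:  # subnormal: 0.m * 2^(1 - bias)
                vals.add((m / 4.0) * 2.0 ** (1 - 3))
            else:       # normal: 1.m * 2^(e - bias)
                vals.add((1.0 + m / 4.0) * 2.0 ** (e - 3))
    return np.array(sorted(vals), dtype=np.float32)

def quantize_fp16_to_fp6(x):
    """Round each fp16 value to the nearest representable fp6 e3m2 value."""
    grid = fp6_e3m2_grid()
    x = np.asarray(x, dtype=np.float32)
    sign = np.sign(x)
    mag = np.clip(np.abs(x), 0.0, grid[-1])  # saturate at the fp6 max
    idx = np.argmin(np.abs(mag[..., None] - grid), axis=-1)
    return sign * grid[idx]

w = np.random.randn(4, 4).astype(np.float16)
print(quantize_fp16_to_fp6(w))
```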
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24Gb…
-
The errors are as follows:
(.venv) (base) pengxiong@PENGMacPro PDF-Extract-Kit % python pdf_extract.py --pdf demo/demo1.pdf
[2024-07-19 20:17:51,713] [ ERROR] check_version.py:39 - Error fetching …
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …
```
-
```
(TinyChatEngine) zhef@zhef:~/TinyChatEngine/llm$ make chat -j
CUDA is available!
src/Generate.cc src/LLaMATokenizer.cc src/OPTGenerate.cc src/OPTTokenizer.cc src/utils.cc src/nn_modules/Fp32OPT…
```
-
### Your current environment
```text
Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==3.0…
```