cylee0909 opened 9 months ago
I get the same error.
You are doing the conversion on a Mac with Apple Silicon, which I don't think is supported. You need to convert on a CUDA device; only inference with ggml is supported on Apple Silicon.
This is obviously not ideal; it would be great if this were adapted for M1+ architectures.
It raises an ImportError when executing:

```
python3 qwen_cpp/convert.py xxx
```

```
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: kernels, flash_attn. Run `pip install kernels flash_attn`
```
And M1 does not support flash_attn.
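One possible workaround, sketched below and not tested against this repo: register empty placeholder modules for `flash_attn` and `kernels` in `sys.modules` before the modeling file is imported. This assumes the conversion script only reads weights and never actually calls the flash-attention kernels, so satisfying the import check may be enough. The module names are taken from the error message above; everything else here is an assumption.

```python
import sys
import types

# Placeholder workaround (untested assumption): the modeling file's import
# check only needs these modules to be importable, not functional. Register
# empty stand-ins so `import flash_attn` / `import kernels` succeed on a
# machine (e.g. an M1 Mac) where the real packages cannot be installed.
for name in ("flash_attn", "kernels"):
    if name not in sys.modules:
        sys.modules[name] = types.ModuleType(name)

# Any later `import flash_attn` now resolves to the placeholder instead of
# raising ImportError.
import flash_attn

print(isinstance(flash_attn, types.ModuleType))
```

If this were saved as a small wrapper that runs the stubbing first and then invokes the conversion script, it might let the conversion proceed on Apple Silicon; whether the converted weights come out correct would still need verification.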