ly19970621 opened this issue 7 months ago
(1) I have fixed the file `examples/utils.py` as follows:

```python
device_map = {
    'vit': 0,
    'vision_proj': 0,
    'model.tok_embeddings': 0,
    'plora_glb_GN': num_gpus - 1,
    'plora_sub_GN': num_gpus - 1,
    'model.norm': num_gpus - 1,
    'output': num_gpus - 1,
}
```
This works for splitting the computation across different GPUs.

(2) Inference with InternLM-XComposer2-4KHD-7B costs too much GPU RAM, up to almost 80 GB. I have one A800 and it can only just serve as the inference server; that is scary!
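For anyone hitting the same missing-key `ValueError` (see the traceback at the bottom of this thread), here is a minimal sketch of what a patched `auto_configure_device_map` in `examples/utils.py` could look like. The non-layer keys are exactly the dict above; the layer loop and the count of 32 decoder layers under `model.layers.{i}` are assumptions about the 7B checkpoint, so adjust them to the actual module names.

```python
def auto_configure_device_map(num_gpus: int) -> dict:
    # Non-layer modules pinned to the first / last GPU.
    # The two plora_* entries are the fix described above.
    device_map = {
        'vit': 0,
        'vision_proj': 0,
        'model.tok_embeddings': 0,
        'plora_glb_GN': num_gpus - 1,
        'plora_sub_GN': num_gpus - 1,
        'model.norm': num_gpus - 1,
        'output': num_gpus - 1,
    }
    # Assumption: 32 decoder layers named model.layers.{i};
    # spread them evenly across the visible GPUs.
    num_layers = 32
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    for i in range(num_layers):
        device_map[f'model.layers.{i}'] = min(i // per_gpu, num_gpus - 1)
    return device_map
```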
```
Package Version
------------------------- ------------
accelerate 0.29.2
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.4
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.1
aliyun-python-sdk-kms 2.16.2
altair 5.3.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.1
certifi 2022.12.7
cffi 1.16.0
charset-normalizer 2.1.1
click 8.1.7
cmake 3.25.0
contourpy 1.2.1
crcmod 1.7
cryptography 42.0.5
cycler 0.12.1
datasets 2.18.0
deepspeed 0.14.1
dill 0.3.8
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.110.1
ffmpy 0.3.2
filelock 3.9.0
flash_attn 2.5.7
fonttools 4.51.0
frozenlist 1.4.1
fsspec 2024.2.0
gast 0.5.4
gekko 1.1.1
gradio 4.13.0
gradio_client 0.8.0
h11 0.14.0
hjson 3.1.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.22.2
idna 3.4
importlib_metadata 7.1.0
importlib_resources 6.4.0
Jinja2 3.1.2
jmespath 0.10.0
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lit 15.0.7
markdown-it-py 3.0.0
markdown2 2.4.10
MarkupSafe 2.1.3
matplotlib 3.8.4
mdurl 0.1.2
modelscope 1.13.3
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.2.1
ninja 1.11.1.1
numpy 1.24.1
orjson 3.10.1
oss2 2.18.4
packaging 24.0
pandas 2.2.2
peft 0.10.0
pillow 10.2.0
pip 24.0
platformdirs 4.2.0
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pycparser 2.22
pycryptodome 3.20.0
pydantic 2.7.0
pydantic_core 2.18.1
pydub 0.25.1
Pygments 2.17.2
pynvml 11.5.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
referencing 0.34.0
regex 2023.12.25
requests 2.28.1
rich 13.7.1
rouge 1.0.1
rpds-py 0.18.0
safetensors 0.4.3
scipy 1.13.0
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 69.5.1
shellingham 1.5.4
simplejson 3.19.2
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
starlette 0.37.2
sympy 1.12
timm 0.4.12
tokenizers 0.13.3
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.0.1+cu117
torchaudio 2.0.2+cu117
torchvision 0.15.2+cu117
tqdm 4.66.2
transformers 4.33.2
triton 2.0.0
typer 0.12.3
typing_extensions 4.8.0
tzdata 2024.1
urllib3 1.26.13
uvicorn 0.29.0
websockets 11.0.3
wheel 0.43.0
XlsxWriter 3.1.2
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.18.1
```
- code script:

```python
import sys
sys.path.insert(0, '.')
sys.path.insert(0, '..')

import argparse

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer

from examples.utils import auto_configure_device_map

torch.set_grad_enabled(False)

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", default=1, type=int)
parser.add_argument("--dtype", default='fp16', type=str)
args = parser.parse_args()

model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

if args.dtype == 'fp16':
    model.half().cuda()
elif args.dtype == 'fp32':
    model.cuda()

if args.num_gpus > 1:
    from accelerate import dispatch_model
    device_map = auto_configure_device_map(args.num_gpus)
    model = dispatch_model(model, device_map=device_map)

###############
# first round
###############
# NOTE: the query string and the image path were truncated in the original
# post; the first-round chat call below is reconstructed from the second round.
query = '
with torch.cuda.amp.autocast():
    response, his = model.chat(tokenizer, query=query, image=image, hd_num=55,
                               history=[], do_sample=False, num_beams=3)

print("*" * 10)
print("-------> first round")
print(response)
print("*" * 10)

###############
# second round
###############
query1 = 'what is the detailed explanation of the third part.'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query1, image=image, hd_num=55,
                             history=his, do_sample=False, num_beams=3)

print("*" * 10)
print("-------> second round")
print(response)
print("*" * 10)
```
So is there anything wrong with the way I am using a 7B model like this?
Can you provide a script for a quantized InternLM-XComposer2-4KHD-7B (int4)? @myownskyW7
OOM error on a single 48 GB card.
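Until an official int4 script is published, a minimal sketch of 4-bit loading via `transformers` + bitsandbytes might look like the following. Whether the remote-code model class of internlm-xcomposer2-4khd-7b tolerates `load_in_4bit` is an assumption; only the snapshot name is taken from the script above.

```python
import torch
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b')

# Assumption: bitsandbytes is installed and the custom model supports
# on-the-fly 4-bit quantization of its linear layers.
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    trust_remote_code=True,
    load_in_4bit=True,          # NF4 quantization at load time
    torch_dtype=torch.float16,
    device_map='auto',          # let accelerate place the quantized weights
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
```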
Machine environment: 4 × RTX 4090
Run command: `CUDA_VISIBLE_DEVICES=0,1 python examples/example_chat.py --num_gpus 2`
The following error appears:

```
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.44s/it]
Some weights of InternLMXComposer2ForCausalLM were not initialized from the model checkpoint at /home/ai_group/model/internlm-xcomposer2/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b and are newly initialized: ['vit.vision_tower.vision_model.post_layernorm.bias', 'vit.vision_tower.vision_model.post_layernorm.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/ai_group/liuy026/multi_modality/InternLM-XComposer/examples/example_chat.py", line 26, in <module>
    model = dispatch_model(model, device_map=device_map)
  File "/home/ai_group/anaconda3/envs/liuy026-py310/lib/python3.10/site-packages/accelerate/big_modeling.py", line 351, in dispatch_model
    check_device_map(model, device_map)
  File "/home/ai_group/anaconda3/envs/liuy026-py310/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1393, in check_device_map
    raise ValueError(
ValueError: The device_map provided does not give any device for the following parameters: plora_glb_GN, plora_sub_GN
```
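This `ValueError` is exactly what the patch at the top of the thread addresses: the stock `auto_configure_device_map` in `examples/utils.py` assigns no device to `plora_glb_GN` and `plora_sub_GN`, so accelerate's `check_device_map` rejects the map. A minimal sketch of the workaround, reusing the keys from the dict above:

```python
# Sketch: put the two missing PLoRA parameters on the last GPU, matching
# the device_map fix posted at the top of this thread.
device_map = auto_configure_device_map(args.num_gpus)
device_map.setdefault('plora_glb_GN', args.num_gpus - 1)
device_map.setdefault('plora_sub_GN', args.num_gpus - 1)
model = dispatch_model(model, device_map=device_map)
```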