How to run gradio_demo_composition.py with the 4-bit model

Running this file on a single V100 runs out of GPU memory, so I want to use the 4-bit model for this task. I changed line 170 of gradio_demo_composition.py to

self.model = InternLMXComposer2QForCausalLM.from_quantized(code_path, device_map='auto', trust_remote_code=True).eval()

but running it produces the following errors:

Traceback (most recent call last):
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/blocks.py", line 1191, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/utils.py", line 519, in async_iteration
    return await iterator.__anext__()
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/utils.py", line 666, in gen_wrapper
    yield from f(*args, **kwargs)
  File "/data/intern/lihaowenbj/code/chatglm/InternLM/demo/InternLM-XComposer/examples/gradio_demo_composition_4bit.py", line 509, in generate_article
    input_embeds = self.model.model.tok_embeddings(input_ids.cuda())
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'InternLMXComposer2ForCausalLM' object has no attribute 'tok_embeddings'

[UNUSED_TOKEN_146]user Given the article "" Based on the above article, choose 6 lines suitable for inserting images[UNUSED_TOKEN_145]
[UNUSED_TOKEN_146]assistant The lines suitable for inserting images are

Traceback (most recent call last):
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/data/intern/lihaowenbj/anaconda3/envs/internlmx/lib/python3.10/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/data/intern/lihaowenbj/code/chatglm/InternLM/demo/InternLM-XComposer/examples/gradio_demo_composition_4bit.py", line 580, in insert_images
    inject_text, locs = self.generate_loc(idx_text_sections, upimages, img_num)
  File "/data/intern/lihaowenbj/code/chatglm/InternLM/demo/InternLM-XComposer/examples/gradio_demo_composition_4bit.py", line 280, in generate_loc
    output_text = self.generate(instruction, True, 1, 200, 1.005)
  File "/data/intern/lihaowenbj/code/chatglm/InternLM/demo/InternLM-XComposer/examples/gradio_demo_composition_4bit.py", line 219, in generate
    generate = self.model.generate(input_ids.cuda(),
TypeError: BaseGPTQForCausalLM.generate() takes 1 positional argument but 2 were given
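The first traceback points to one extra level of wrapping. Judging from the error message, after `from_quantized` the attribute `self.model` is a `BaseGPTQForCausalLM`-style wrapper, so `self.model.model` is the `InternLMXComposer2ForCausalLM` itself rather than the inner model that owns `tok_embeddings`. The sketch below uses hypothetical stand-in classes (no torch or auto_gptq import) to illustrate that attribute path, plus a hypothetical helper `find_tok_embeddings` that follows the `.model` chain until it finds `tok_embeddings`.

```python
class InnerModel:
    """Hypothetical stand-in for the backbone that owns the embedding table."""
    def __init__(self):
        self.tok_embeddings = "embedding-layer"


class CausalLM:
    """Hypothetical stand-in for InternLMXComposer2ForCausalLM: backbone lives in .model."""
    def __init__(self):
        self.model = InnerModel()


class GPTQWrapper:
    """Hypothetical stand-in for a GPTQ wrapper: the HF model lives in .model."""
    def __init__(self):
        self.model = CausalLM()


def find_tok_embeddings(root, max_depth=4):
    """Follow the .model chain from root and return the first tok_embeddings found."""
    node = root
    for _ in range(max_depth):
        if hasattr(node, "tok_embeddings"):
            return node.tok_embeddings
        if not hasattr(node, "model"):
            break
        node = node.model
    raise AttributeError(f"no tok_embeddings within {max_depth} levels of .model")


# Unquantized demo: self.model is the CausalLM, so self.model.model.tok_embeddings works.
plain = CausalLM()
print(plain.model.tok_embeddings)  # embedding-layer

# 4-bit demo: self.model is the wrapper, so self.model.model has no tok_embeddings.
quantized = GPTQWrapper()
print(hasattr(quantized.model, "tok_embeddings"))  # False
print(quantized.model.model.tok_embeddings)        # embedding-layer
print(find_tok_embeddings(quantized))              # embedding-layer
```

If the real objects nest the same way, the call at line 509 would become `self.model.model.model.tok_embeddings(...)` when the 4-bit wrapper is loaded. This path is inferred from the error message and has not been checked against the actual class definitions.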

How should I modify the code?
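The second traceback says `BaseGPTQForCausalLM.generate()` accepts only `self` positionally. This suggests that the wrapper takes its inputs as keyword arguments only, presumably forwarding them to the wrapped model's `generate`. The hypothetical stand-in `KeywordOnlyGenerate` below does not import auto_gptq. It reproduces the same kind of `TypeError` and shows that passing the tensor as `input_ids=` avoids it.

```python
class KeywordOnlyGenerate:
    """Hypothetical stand-in for a wrapper whose generate() accepts keyword arguments only."""

    def generate(self, **kwargs):
        # A real wrapper would forward these to the underlying model's generate().
        return dict(kwargs)


model = KeywordOnlyGenerate()

# Positional input, like `self.model.generate(input_ids.cuda(), ...)` in the demo.
try:
    model.generate([1, 2, 3], max_new_tokens=200)
except TypeError as err:
    print("TypeError:", err)

# Keyword input is accepted.
print(model.generate(input_ids=[1, 2, 3], max_new_tokens=200))
# {'input_ids': [1, 2, 3], 'max_new_tokens': 200}
```

For the 4-bit demo, this would mean writing the call at line 219 as `self.model.generate(input_ids=input_ids.cuda(), ...)`. The remaining arguments of that call are cut off in the traceback, so they are left unchanged here. Any other positional arguments would likewise need to be passed by name.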

InternLM / InternLM-XComposer #238