Sumanai opened 1 year ago
GALACTICA support would be nice as well. Can FlexGen be generalized to all OPTForCausalLM models?
Unfortunately, my attempt to add GALACTICA the same way failed. The problem seems to be missing handling for parameters like attention_dropout, but this is purely a guess. After the model loads, an error appears in the logs at the first generation (repeated parts removed):
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1141: block: [223,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1141: block: [223,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
File "c:\users\username\flexgen\flexgen\flex_opt.py", line 873, in generate
  self.generation_loop_overlap_single_batch()
File "c:\users\username\flexgen\flexgen\flex_opt.py", line 1013, in generation_loop_overlap_single_batch
  self.sync()
File "c:\users\username\flexgen\flexgen\flex_opt.py", line 782, in sync
  torch.cuda.synchronize()
File "C:\Users\username\AppData\Roaming\Python\Python310\site-packages\torch\cuda\__init__.py", line 566, in synchronize
  return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
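As background (my observation, not from this thread): this device-side assert comes from an index_select kernel and usually means a token or position id is out of range for the embedding table it indexes, e.g. a vocabulary-size mismatch between the tokenizer and a hardcoded model config. A minimal, framework-free sketch of a CPU pre-flight check that can surface this readably (the helper name is hypothetical):

```python
def find_out_of_range_ids(token_ids, num_embeddings):
    """Return token ids that would trip `srcIndex < srcSelectDimSize`.

    Any id >= num_embeddings (the embedding table's row count) triggers
    the CUDA index_select assertion, so validating on CPU first yields a
    readable error instead of an opaque device-side assert.
    """
    return sorted({t for t in token_ids if t < 0 or t >= num_embeddings})


# GALACTICA's tokenizer has 50000 tokens; an id valid for OPT's larger
# vocabulary would crash when fed to a 50000-row embedding table.
bad = find_out_of_range_ids([5, 49999, 50271], num_embeddings=50000)
print(bad)  # [50271]
```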
If we can solve this problem, we can remove some of the hardcode and let you load any model based on OPTForCausalLM.
GALACTICA support would be cool! I think FlexGen can be generalized to OPTForCausalLM very easily. The error reported by @Sumanai looks weird to me. It needs more investigation.
Is this just partial support? https://github.com/FMInference/FlexGen/pull/83
I have tried loading galactica-30b and I got this error:
opt_config.py", line 118, in get_opt_config
raise ValueError(f"Invalid model name: {name}")
ValueError: Invalid model name: galactica-30b
Not sure if that commit has already made it into flexgen==0.1.7, or whether it is enough to load GALACTICA.
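For reference, the ValueError above comes from a name-dispatch function. A hedged sketch of how such a dispatch might gain a galactica-30b entry (the function and field names are illustrative, not FlexGen's actual OptConfig; the numbers should be checked against each model's Hugging Face config.json):

```python
def get_model_config(name):
    """Illustrative name -> config dispatch in the style of get_opt_config.

    The galactica-30b values below (48 layers, 7168 hidden, 56 heads,
    50000-token vocab) match my reading of the public HF config, but
    re-verify them before use.
    """
    name = name.lower()
    if name == "opt-30b":
        return dict(num_layers=48, hidden_size=7168, num_heads=56,
                    vocab_size=50272, max_seq_len=2048)
    elif name == "galactica-30b":
        return dict(num_layers=48, hidden_size=7168, num_heads=56,
                    vocab_size=50000, max_seq_len=2048)
    raise ValueError(f"Invalid model name: {name}")
```

Note the vocabulary sizes differ (50272 vs 50000), which is exactly the kind of mismatch that can produce the indexing asserts reported above.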
I got a similar error to @Sumanai when using Erebus-13b on a 3080 when the text gets long:
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Tried changing policy parameters but nothing seems to work.
I managed to make FlexGen work for the Galactica-1.3b model by changing opt_config.py, flex_opt.py, and tokenizer_config.json. @oobabooga's Webui can successfully load the model and generate text with it, and VRAM use decreased as expected. However, all the generated text becomes gibberish (it's not due to the parameter preset). Maybe someone would be interested in taking a closer look? I can upload the files I modified. I am not really a programming or ML expert...
@fgdfgfthgr-fox can you create a fork of https://github.com/FMInference/FlexGen with your changes?
@oobabooga https://github.com/fgdfgfthgr-fox/FlexGen---galactica-support Is this what you want?
@Sumanai How did you get Erebus working?
You can see my dirty edits in my repository. https://github.com/Sumanai/FlexGen/tree/erebus I hope this code will help explorers in adding Galactica support.
Hello! I propose adding support for the Erebus family of models, which are finetunes of the original OPT. I looked at the code: the support is not too difficult to add, and I was able to run a couple of models without major code modification. I can provide a PR if needed. Here is a link to one of the models; the rest are under the same account: https://huggingface.co/KoboldAI/OPT-2.7B-Erebus
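Since Erebus checkpoints are OPT finetunes with an unchanged architecture, one low-effort way to support them is aliasing: map the finetune name to its base OPT config and only swap the weight path. A hypothetical sketch (the alias table and function name are mine, not FlexGen's):

```python
# Hypothetical aliases: OPT finetunes that reuse the base architecture.
FINETUNE_ALIASES = {
    "opt-2.7b-erebus": "opt-2.7b",
    "opt-13b-erebus": "opt-13b",
}


def resolve_base_model(name):
    """Map a finetuned checkpoint name to its base config name.

    The architecture (and therefore the model config) is the base OPT's;
    only the downloaded weights differ between base and finetune.
    """
    return FINETUNE_ALIASES.get(name.lower(), name)


print(resolve_base_model("OPT-2.7B-Erebus"))  # opt-2.7b
```

With this shape of indirection, the existing OPT config dispatch never needs to know the finetune names, only where to fetch their weights.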