Open xiaoshingshing2 opened 3 months ago
This is the log of the training process and the adapter_config with `--dora_simple False`:
trainer_state.json
adapter_config.json
Did you install all the packages following requirements.txt?
Hi, I did not install bitsandbytes, and my PyTorch version is 2.1.0. The transformers package was installed with `pip install transformers==4.36.0`. The other packages match requirements.txt. Will that hurt the performance?
The packages I use are listed below:
```
Package                     Version
accelerate                  0.25.0
aiofiles                    23.2.1
aiohttp                     3.9.5
aiosignal                   1.3.1
altair                      5.3.0
annotated-types             0.7.0
anyio                       4.4.0
appdirs                     1.4.4
asttokens                   2.4.1
async-timeout               4.0.3
attrs                       23.2.0
black                       23.12.0
certifi                     2024.6.2
charset-normalizer          3.3.2
click                       8.1.7
cmake                       3.29.6
contourpy                   1.2.1
cycler                      0.12.1
datasets                    2.15.0
decorator                   5.1.1
dill                        0.3.7
dnspython                   2.6.1
email_validator             2.2.0
exceptiongroup              1.2.1
executing                   2.0.1
fastapi                     0.111.0
fastapi-cli                 0.0.4
ffmpy                       0.3.2
filelock                    3.15.4
fire                        0.5.0
fonttools                   4.53.0
frozenlist                  1.4.1
fsspec                      2023.10.0
gradio                      4.9.0
gradio_client               0.7.2
h11                         0.14.0
httpcore                    1.0.5
httptools                   0.6.1
httpx                       0.27.0
huggingface-hub             0.23.4
idna                        3.7
importlib_resources         6.4.0
ipython                     8.25.0
jedi                        0.19.1
Jinja2                      3.1.4
jsonschema                  4.22.0
jsonschema-specifications   2023.12.1
kiwisolver                  1.4.5
lit                         18.1.8
markdown-it-py              3.0.0
MarkupSafe                  2.1.5
matplotlib                  3.9.0
matplotlib-inline           0.1.7
mdurl                       0.1.2
mpmath                      1.3.0
multidict                   6.0.5
multiprocess                0.70.15
mypy-extensions             1.0.0
networkx                    3.3
numpy                       1.26.4
orjson                      3.10.5
packaging                   24.1
pandas                      2.2.2
parso                       0.8.4
pathspec                    0.12.1
pexpect                     4.9.0
pillow                      10.3.0
pip                         24.0
platformdirs                4.2.2
prompt_toolkit              3.0.47
protobuf                    5.27.2
psutil                      6.0.0
ptyprocess                  0.7.0
pure-eval                   0.2.2
pyarrow                     16.1.0
pyarrow-hotfix              0.6
pydantic                    2.7.4
pydantic_core               2.18.4
pydub                       0.25.1
Pygments                    2.18.0
pyparsing                   3.1.2
python-dateutil             2.9.0.post0
python-dotenv               1.0.1
python-multipart            0.0.9
pytorch-triton-rocm         2.1.0
pytz                        2024.1
PyYAML                      6.0.1
referencing                 0.35.1
regex                       2024.5.15
requests                    2.32.3
rich                        13.7.1
rpds-py                     0.18.1
safetensors                 0.4.3
scipy                       1.11.4
semantic-version            2.10.0
sentencepiece               0.1.99
setuptools                  69.5.1
shellingham                 1.5.4
six                         1.16.0
sniffio                     1.3.1
stack-data                  0.6.3
starlette                   0.37.2
sympy                       1.12.1
termcolor                   2.4.0
tokenize-rt                 5.2.0
tokenizers                  0.15.2
tomli                       2.0.1
tomlkit                     0.12.0
toolz                       0.12.1
torch                       2.1.0+rocm5.6
tqdm                        4.66.4
traitlets                   5.14.3
transformers                4.36.0
typer                       0.12.3
typing_extensions           4.12.2
tzdata                      2024.1
ujson                       5.10.0
urllib3                     2.2.2
uvicorn                     0.30.1
uvloop                      0.19.0
watchfiles                  0.22.0
wcwidth                     0.2.13
websockets                  11.0.3
wheel                       0.43.0
xxhash                      3.4.1
yarl                        1.9.4
```
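Since the divergences mentioned in this thread are bitsandbytes and the torch/transformers pins, one way to sanity-check an environment is to diff the installed versions against the pins. A minimal sketch — the `required` dict below uses placeholder pins for illustration, not the repo's actual requirements.txt:

```python
# Sketch: diff a reported environment against pinned requirements.
# The `required` pins are illustrative assumptions, not copied from
# the repository's requirements.txt -- substitute the real file.
installed = {
    "torch": "2.1.0+rocm5.6",
    "transformers": "4.36.0",
    "accelerate": "0.25.0",
    "bitsandbytes": None,  # not installed in this environment
}

required = {
    "torch": "2.1.0",
    "transformers": "4.36.0",
    "bitsandbytes": "0.41.0",  # hypothetical pin
}

def diff_env(installed, required):
    """Return {package: (required, installed)} for every mismatch."""
    mismatches = {}
    for pkg, want in required.items():
        have = installed.get(pkg)
        # a local build suffix like "+rocm5.6" still counts as a match
        if have is None or not have.startswith(want):
            mismatches[pkg] = (want, have)
    return mismatches

print(diff_env(installed, required))  # {'bitsandbytes': ('0.41.0', None)}
```

With the versions reported above, only the missing bitsandbytes shows up as a mismatch.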
I am now trying to use exactly the same packages as in requirements.txt, and I will update my results when the finetuning and testing finish.
I use exactly the packages in requirements.txt, and the results are:
with dora simple:
BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Average |
---|---|---|---|---|---|---|---|---|
69.1 | 82.8 | 78.8 | 86.2 | 81.0 | 82.1 | 66.1 | 79.2 | 78.2 |
without dora simple:
BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Average |
---|---|---|---|---|---|---|---|---|
68.7 | 83.3 | 79.4 | 85.5 | 81.3 | 80.8 | 66.0 | 78.8 | 78.0 |
which leaves a 0.4-point gap in average accuracy relative to the results reported in the README.
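As a sanity check on the two rows above, the quoted averages can be recomputed from the per-task scores:

```python
# Recompute the average accuracy quoted in the two tables above.
tasks = ["BoolQ", "PIQA", "SIQA", "HellaSwag", "WinoGrande", "ARC-e", "ARC-c", "OBQA"]
with_simple = [69.1, 82.8, 78.8, 86.2, 81.0, 82.1, 66.1, 79.2]
without_simple = [68.7, 83.3, 79.4, 85.5, 81.3, 80.8, 66.0, 78.8]

def avg(scores):
    """Mean accuracy rounded to one decimal, as in the tables."""
    return round(sum(scores) / len(scores), 1)

print(avg(with_simple), avg(without_simple))  # 78.2 78.0
```

Both averages match the tables, so the 0.4-point figure is measured against the README's reported average rather than between the two runs (which differ by 0.2).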
New updates:
I use exactly the packages in requirements.txt, and the results for r=8 and r=16 still show a large gap from the results reported in the README, while the results for r=4 and r=64 are better and the result for r=32 is roughly equal.
Average acc:
r | original | reproduce |
---|---|---|
4 | 61.9 | 65.2 |
8 | 77.9 | 72.5 |
16 | 77.5 | 62.7 |
32 | 78.4 | 78.2 |
64 | 76.8 | 77.9 |
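The pattern is easier to see as reproduced-minus-original differences, computed from the table above:

```python
# Per-rank gap (reproduced minus original average accuracy),
# using the numbers from the table above.
original  = {4: 61.9, 8: 77.9, 16: 77.5, 32: 78.4, 64: 76.8}
reproduce = {4: 65.2, 8: 72.5, 16: 62.7, 32: 78.2, 64: 77.9}

gaps = {r: round(reproduce[r] - original[r], 1) for r in original}
print(gaps)  # {4: 3.3, 8: -5.4, 16: -14.8, 32: -0.2, 64: 1.1}
```

The drops are concentrated at r=8 and especially r=16, while the other ranks are within about a point (or better) of the README.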
Is this a normal result?
@xiaoshingshing2 I have encountered a similar issue. Did you manage to resolve it? Could you provide your package versions?
In the latest update, I used exactly the packages in requirements.txt with the same versions, and I still have the problems.
First of all, evaluating with the official checkpoint is fine: the result on BoolQ is 69.63, while the official result is 69.7.
However, when I try to reproduce the results, I encounter two problems.
The first concerns Llama-7B DoRA r=32 without dora_simple. I changed three settings in `llama_7B_Dora.sh`: micro_batch_size from 16 to 4, learning_rate from 2e-4 to 1e-4, and I added `--dora_simple False` to avoid using dora_simple. I ran `sh llama_7B_Dora.sh 32 64 ./finetuned_result/dora_r32 0`, and the results are worse than the official ones.
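For reference, the edits described above amount to roughly the following; the variable names inside the script are assumptions based on this thread, not a verbatim excerpt of `llama_7B_Dora.sh`:

```shell
# Hypothetical excerpt of llama_7B_Dora.sh after the edits described above;
# the surrounding script structure is assumed, not copied from the repo.
micro_batch_size=4                 # was 16
learning_rate=1e-4                 # was 2e-4
extra_flags="--dora_simple False"  # disable the dora_simple fast path

# Invocation: positional args are rank, alpha, output dir, GPU id:
# sh llama_7B_Dora.sh 32 64 ./finetuned_result/dora_r32 0
```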
The second is that when I drop `--dora_simple False` to accelerate training with dora_simple, the results are even worse: