FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License
7.01k stars 512 forks

NameError: name 'index_first_axis' is not defined #746

Open Stangerine opened 5 months ago

Stangerine commented 5 months ago

Can anyone help me? Thanks!

staoxiao commented 5 months ago

Can you share the command you used?

Stangerine commented 5 months ago

> Can you share the command you used?

```bash
torchrun --nproc_per_node 1 \
    -m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \
    --output_dir /opt/data/private/zzq/models/bge-reranker-v2-minicpm-layerwise-finetuned \
    --model_name_or_path /opt/data/private/zzq/models/bge-reranker-v2-minicpm-layerwise \
    --train_data /opt/data/private/zzq/dataset/train_data/2.0/finetune_data_for_reranker_2.0.jsonl \
    --learning_rate 2e-4 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --dataloader_drop_last True \
    --query_max_len 512 \
    --passage_max_len 8192 \
    --train_group_size 16 \
    --logging_steps 1 \
    --save_steps 2000 \
    --save_total_limit 50 \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing \
    --deepspeed /opt/data/private/zzq/train/stage1.json \
    --warmup_ratio 0.1 \
    --bf16 \
    --use_lora True \
    --lora_rank 32 \
    --lora_alpha 64 \
    --use_flash_attn True \
    --target_modules q_proj k_proj v_proj o_proj \
    --start_layer 8 \
    --head_multi True \
    --head_type simple \
    --lora_extra_parameters linear_head \
    --finetune_type from_finetuned_model
```

staoxiao commented 5 months ago

@545999961, please take a look at this issue when it's convenient for you.

Stangerine commented 5 months ago

> @545999961, please take a look at this issue when it's convenient for you.

Thank you, my friend!

545999961 commented 5 months ago

Can you provide the specific error message? I want to know where the error occurred.

Stangerine commented 5 months ago

> Can you provide the specific error message? I want to know where the error occurred.

[screenshot: traceback ending in `NameError: name 'index_first_axis' is not defined`]
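(For context, a common way this particular NameError arises, stated as an assumption since the full traceback isn't quoted here: flash-attention code paths usually import the padding helpers inside a guard, so a broken flash-attn install leaves the names unbound and the failure only surfaces later as a NameError rather than an ImportError.)

```python
# Hedged sketch of the typical guarded-import pattern (not a quote of the
# FlagEmbedding/MiniCPM source). If flash-attn is missing, or was built against
# an incompatible torch, the import fails and index_first_axis is never bound;
# the NameError then appears at the first forward pass with use_flash_attn=True.
try:
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input
except ImportError:
    pass  # names stay unbound; any later use raises NameError
```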

545999961 commented 5 months ago

Can you provide your versions of transformers and flash-attn?

Stangerine commented 4 months ago

> Can you provide your versions of transformers and flash-attn?

Thank you, it has been solved. I changed the versions of flash-attn and torch.
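For anyone hitting the same error, a quick post-install sanity check (a minimal sketch, assuming the failure was a flash-attn/torch mismatch) is to confirm that flash-attn imports cleanly against the installed torch and exposes the padding helpers:

```python
# Sanity check after reinstalling: if this runs without error, the
# flash-attention code path can find index_first_axis again.
import torch
import flash_attn
from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input

print("torch:", torch.__version__)
print("flash-attn:", flash_attn.__version__)
print("index_first_axis:", index_first_axis)
```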

Stangerine commented 4 months ago

> Can you provide your versions of transformers and flash-attn?

```
  warnings.warn(
/root/anaconda3/envs/zzq_kdd/lib/python3.9/site-packages/transformers/tokenization_utils_base.py:2692: UserWarning: `max_length` is ignored when `padding=True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
  warnings.warn(
/root/anaconda3/envs/zzq_kdd/lib/python3.9/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
```

Friends, I would like to ask: do these two warnings have any impact on training the reranker?

Stangerine commented 4 months ago

accelerate 0.29.1 addict 2.4.0 aiohttp 3.9.3 aiolimiter 1.1.0 aiosignal 1.3.1 aliyun-python-sdk-core 2.15.0 aliyun-python-sdk-kms 2.16.2 annotated-types 0.6.0 antlr4-python3-runtime 4.9.3 anyio 4.3.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 asgiref 3.8.1 async-timeout 4.0.3 attrs 23.2.0 auto_gptq 0.7.1 azure-core 1.30.1 azure-storage-blob 12.19.1 backoff 2.2.1 bcrypt 4.1.2 beautifulsoup4 4.12.3 blinker 1.7.0 blis 0.7.11 build 1.2.1 cachetools 5.3.3 catalogue 2.0.10 certifi 2024.2.2 cffi 1.16.0 charset-normalizer 3.3.2 chroma-hnswlib 0.7.3 chromadb 0.4.24 click 8.1.7 cloudpathlib 0.16.0 cohere 5.3.3 coloredlogs 15.0.1 confection 0.1.4 crcmod 1.7 cryptography 42.0.5 cycler 0.11.0 cymem 2.0.8 dataclasses-json 0.6.4 datasets 2.18.0 DBUtils 3.1.0 deepspeed 0.14.2 Deprecated 1.2.14 dill 0.3.8 dirtyjson 1.0.8 distro 1.9.0 docker 6.1.3 docker-compose 1.29.2 dockerpty 0.4.1 docopt 0.6.2 dpr 0.2.1 einops 0.7.0 en-core-web-sm 3.7.1 environs 9.5.0 exceptiongroup 1.2.0 faiss-gpu 1.7.2 fastapi 0.110.2 fastavro 1.9.4 filelock 3.12.2 FlagEmbedding 1.2.9 flash-attention 1.0.0 flash-attn 2.5.7 Flask 3.0.3 flatbuffers 24.3.25 fonttools 4.38.0 frozenlist 1.4.1 fsspec 2024.2.0 gast 0.5.4 gekko 1.1.1 google-auth 2.29.0 googleapis-common-protos 1.63.0 greenlet 3.0.3 grpcio 1.60.0 h11 0.14.0 hjson 3.1.0 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 httpx-sse 0.4.0 huggingface-hub 0.22.2 humanfriendly 10.0 hybrid 1.2.3 idna 3.7 importlib-metadata 6.7.0 importlib-resources 5.12.0 isodate 0.6.1 itsdangerous 2.1.2 jieba 0.42.1 Jinja2 3.1.3 jmespath 0.10.0 joblib 1.3.2 JPype1 1.5.0 jsonlines 3.1.0 jsonpatch 1.33 jsonpointer 2.4 jsonschema 3.2.0 kiwisolver 1.4.5 konlpy 0.6.0 kubernetes 29.0.0 langchain 0.1.16 langchain-chroma 0.1.0 langchain-community 0.0.32 langchain-core 0.1.42 langchain-openai 0.1.3 langchain-text-splitters 0.0.1 langcodes 3.3.0 langsmith 0.1.46 llama-index 0.10.29 llama-index-agent-openai 0.2.2 llama-index-cli 0.1.11 llama-index-core 0.10.29 llama-index-embeddings-openai 0.1.7 llama-index-indices-managed-llama-cloud 0.1.5 llama-index-legacy 0.9.48 llama-index-llms-openai 0.1.15 llama-index-multi-modal-llms-openai 0.1.5 llama-index-postprocessor-flag-embedding-reranker 0.1.2 llama-index-program-openai 0.1.5 llama-index-question-gen-openai 0.1.3 llama-index-readers-file 0.1.16 llama-index-readers-llama-parse 0.1.4 llama-parse 0.4.0 llamaindex-py-client 0.1.18 LM-Cocktail 0.0.4 lxml 5.2.1 MarkupSafe 2.1.5 marshmallow 3.21.1 matplotlib 3.5.3 mecab-python3 1.0.8 milvus-model 0.2.0 minio 7.2.5 mmh3 4.1.0 modelscope 1.13.3 monotonic 1.6 mpmath 1.3.0 msgspec 0.18.6 multidict 6.0.5 multiprocess 0.70.16 murmurhash 1.0.10 mypy-extensions 1.0.0 nest-asyncio 1.6.0 networkx 3.2.1 ninja 1.11.1.1 nltk 3.8.1 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.19.3 nvidia-nvjitlink-cu12 12.4.127 nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.2 omegaconf 2.3.0 onnxruntime 1.17.3 openai 1.17.0 opentelemetry-api 1.24.0 opentelemetry-exporter-otlp-proto-common 1.24.0 opentelemetry-exporter-otlp-proto-grpc 1.24.0 opentelemetry-instrumentation 0.45b0 opentelemetry-instrumentation-asgi 0.45b0 opentelemetry-instrumentation-fastapi 0.45b0 opentelemetry-proto 1.24.0 opentelemetry-sdk 1.24.0 opentelemetry-semantic-conventions 0.45b0 opentelemetry-util-http 0.45b0 optimum 1.19.1 
orjson 3.10.0 oss2 2.18.4 overrides 7.7.0 packaging 24.0 pandas 2.2.1 paramiko 3.4.0 peft 0.8.0 Pillow 9.5.0 pip 24.0 pip-review 1.3.0 pipreqs 0.4.13 platformdirs 4.2.0 posthog 3.5.0 preshed 3.0.9 protobuf 3.20.0 psutil 5.9.8 pulsar-client 3.5.0 py-cpuinfo 9.0.0 pyarrow 15.0.2 pyarrow-hotfix 0.6 pyasn1 0.6.0 pyasn1_modules 0.4.0 pycparser 2.22 pycryptodome 3.20.0 pydantic 1.10.15 pydantic_core 2.16.3 pymilvus 2.4.0 PyMuPDF 1.24.1 PyMuPDFb 1.24.1 PyNaCl 1.5.0 pynvml 11.5.0 pyparsing 3.1.2 pypdf 4.2.0 PyPika 0.48.9 pyproject_hooks 1.0.0 pyrsistent 0.20.0 python-dateutil 2.9.0.post0 python-dotenv 0.21.1 pytz 2024.1 PyYAML 6.0.1 rank-bm25 0.2.2 regex 2023.12.25 requests 2.31.0 requests-oauthlib 2.0.0 reranker 0.2.3 rouge 1.0.1 rsa 4.9 safetensors 0.4.2 scikit-learn 1.4.1.post1 scipy 1.13.0 sentence-transformers 2.6.1 sentencepiece 0.2.0 setuptools 68.2.2 simplejson 3.19.2 six 1.16.0 smart-open 6.4.0 sniffio 1.3.1 sortedcontainers 2.4.0 soupsieve 2.5 spacy 3.7.4 spacy-legacy 3.0.12 spacy-loggers 1.0.5 SQLAlchemy 2.0.29 srsly 2.4.8 starlette 0.37.2 strictjson 4.1.0 striprtf 0.0.26 sympy 1.12 tenacity 8.2.3 texttable 1.7.0 thinc 8.2.3 threadpoolctl 3.4.0 tiktoken 0.6.0 tokenizers 0.19.1 tomli 2.0.1 torch 2.2.2 tornado 6.4 tqdm 4.66.2 transformers 4.40.1 triton 2.2.0 typer 0.9.4 types-requests 2.31.0.20240406 typing_extensions 4.11.0 typing-inspect 0.9.0 tzdata 2024.1 ujson 5.9.0 unidic-lite 1.0.8 urllib3 2.0.7 uvicorn 0.29.0 uvloop 0.19.0 voyageai 0.2.2 wasabi 1.1.2 watchfiles 0.21.0 weasel 0.3.4 websocket-client 0.59.0 websockets 12.0 Werkzeug 3.0.2 wget 3.2 wheel 0.41.2 wrapt 1.16.0 xxhash 3.4.1 yapf 0.40.2 yarg 0.1.9 yarl 1.9.4 zipp 3.15.0

Stangerine commented 4 months ago

> Can you provide your versions of transformers and flash-attn?

During the fine-tuning of bge-reranker-v2-minicpm-layerwise, the loss hovers around 30. Is this normal? I'm a newbie; can you help me out?

545999961 commented 4 months ago

> Friends, I would like to ask: do these two warnings have any impact on training the reranker?

It doesn't matter; you can continue training.
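(If you do want the warnings to go away, here is a minimal sketch, assuming a recent transformers release such as the 4.40.x in the environment above; the parameter values are illustrative, not the ones used in the actual fine-tuning run.)

```python
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-minicpm-layerwise")

# 1) Tokenizer warning: pad to a fixed length instead of padding=True,
#    which only pads to the longest sequence in the batch.
batch = tokenizer(
    ["example query", "example passage"],
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

# 2) Checkpoint warning: pass use_reentrant explicitly (supported in
#    transformers >= 4.35 via gradient_checkpointing_kwargs).
args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```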

545999961 commented 4 months ago

> During the fine-tuning of bge-reranker-v2-minicpm-layerwise, the loss hovers around 30. Is this normal?

This is normal: the final loss is the sum of the losses from each layer.
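(A back-of-the-envelope illustration under assumed numbers, not taken from the FlagEmbedding source: with train_group_size 16, a random-guess cross-entropy per layer is about ln 16 ≈ 2.77, so summing over roughly a dozen scored layers already puts the total near 30 early in training.)

```python
# Rough illustration only; the number of scored layers is an assumption.
import math

train_group_size = 16                        # 1 positive + 15 negatives per query
per_layer_loss = math.log(train_group_size)  # ~2.77, cross-entropy of a random guess
num_scored_layers = 11                       # hypothetical count of layers whose losses are summed
print(round(per_layer_loss * num_scored_layers, 1))  # ~30.5
```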

Stangerine commented 4 months ago

> This is normal: the final loss is the sum of the losses from each layer.

Thank you very much!