Open aashishpokharel opened 1 year ago
I tried it and it ran without any problems:
python -u ernie-layout/run_mrc.py \
--model_name_or_path ernie-layoutx-base-uncased \
--output_dir ./ernie-layout-base-uncased/models/check \
--dataset_name funsd \
--do_train \
--do_eval \
--lang "en" \
--num_train_epochs 6 \
--lr_scheduler_type linear \
--warmup_ratio 0.05 \
--weight_decay 0.05 \
--eval_steps 1000 \
--save_steps 1000 \
--save_total_limit 3 \
--load_best_model_at_end \
--pattern "mrc" \
--use_segment_box false \
--return_entity_level_metrics false \
--overwrite_cache false \
--doc_stride 128 \
--target_size 1000 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--learning_rate 2e-5 \
--preprocessing_num_workers 16 \
--save_total_limit 3 \
--train_nshard 14 \
--seed 1000 \
--metric_for_best_model anls \
--greater_is_better true \
--overwrite_output_dir
For reference, here is my environment:
absl-py==2.1.0
aiofiles==23.2.1
aiohttp==3.9.3
aiosignal==1.3.1
aistudio-sdk==0.1.7
altair==5.2.0
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.3.0
astor==0.8.1
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.2.0
audioread==3.0.1
Babel==2.14.0
backcall==0.2.0
bce-python-sdk==0.9.4
blinker==1.7.0
bokeh==3.1.1
boltons==23.1.1
Bottleneck==1.3.8
braceexpand==0.1.7
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
colorlog==6.8.2
contourpy==1.1.1
cycler==0.12.1
Cython==3.0.8
datasets==2.17.1
decorator==5.1.1
dill==0.3.4
Distance==0.1.3
easydict==1.12
editdistance==0.8.1
einops==0.7.0
exceptiongroup==1.2.0
executing==2.0.1
fastapi==0.110.0
ffmpy==0.3.2
filelock==3.13.1
Flask==3.0.2
Flask-Babel==2.0.0
flatbuffers==23.5.26
fonttools==4.49.0
frozenlist==1.4.1
fsspec==2023.10.0
ftfy==6.1.3
future==1.0.0
g2p-en==2.1.0
g2pM==0.1.2.5
gradio==4.19.2
gradio_client==0.10.1
gunicorn==21.2.0
h11==0.14.0
h5py==3.10.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub==0.21.1
humanfriendly==10.0
HyperPyYAML==1.2.2
idna==3.6
importlib-metadata==7.0.1
importlib_resources==6.1.2
inflect==7.0.0
intervaltree==3.1.0
ipython==8.12.3
itsdangerous==2.1.2
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.3
joblib==1.3.2
jsonlines==4.0.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kaldiio==2.18.0
kiwisolver==1.4.5
librosa==0.9.2
llvmlite==0.41.1
loguru==0.7.2
lxml==5.1.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.7.5
matplotlib-inline==0.1.6
mdurl==0.1.2
mido==1.3.2
mock==5.1.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.12.2
nara-wpe==0.0.9
nltk==3.8.1
note-seq==0.0.5
numba==0.58.1
numpy==1.22.0
omegaconf==2.3.0
onnx==1.15.0
onnxruntime==1.17.1
OpenCC==1.1.7
opencc-python-reimplemented==0.1.7
opencv-python==4.6.0.66
opt-einsum==3.3.0
orjson==3.9.15
packaging==23.2
paddle-bfloat==0.1.7
paddle2onnx==1.1.0
paddleaudio==1.1.0
paddlefsl==1.1.0
paddlehub==2.4.0
paddlenlp==2.5.2
paddlepaddle-gpu==2.4.1.post117
paddlesde==0.2.5
paddleslim==2.6.0
paddlespeech==1.4.1
paddlespeech-ctcdecoders==0.2.1
paddlespeech-feat==0.1.0
pandas==2.0.3
parameterized==0.9.0
parso==0.8.3
pathos==0.2.8
pattern_singleton==1.2.0
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.2.0
pkgutil_resolve_name==1.3.10
platformdirs==4.2.0
pooch==1.8.1
portalocker==2.8.2
pox==0.3.4
ppdiffusers==0.19.4
ppft==1.7.6.8
praatio==5.1.1
pretty_midi==0.2.10
prettytable==3.10.0
prompt-toolkit==3.0.43
protobuf==3.20.0
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.0
pyarrow-hotfix==0.6
pybind11==2.11.1
pycparser==2.21
pycryptodome==3.20.0
pydantic==2.6.3
pydantic_core==2.16.3
pydub==0.25.1
Pygments==2.17.2
pygtrie==2.5.0
pyparsing==3.1.1
pypinyin==0.44.0
pypinyin-dict==0.7.0
python-dateutil==2.8.2
python-multipart==0.0.9
pytz==2024.1
pyworld==0.3.4
PyYAML==6.0.1
pyzmq==25.1.2
rarfile==4.1
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
requests-mock==1.11.0
resampy==0.4.2
rich==13.7.0
rpds-py==0.18.0
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
ruff==0.2.2
sacrebleu==2.4.0
safetensors==0.4.2
scikit-learn==1.3.2
scipy==1.10.1
semantic-version==2.10.0
sentencepiece==0.2.0
seqeval==1.2.2
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
soundfile==0.12.1
stack-data==0.6.3
starlette==0.36.3
swig==4.2.1
sympy==1.12
tabulate==0.9.0
TextGrid==1.5
threadpoolctl==3.3.0
timer==0.2.2
ToJyutping==0.2.3
tomlkit==0.12.0
tool-helpers==0.1.1
toolz==0.12.1
tornado==6.4
tqdm==4.66.2
traitlets==5.14.1
trampoline==0.1.2
typeguard==2.13.3
typer==0.9.0
typing_extensions==4.10.0
tzdata==2024.1
urllib3==1.26.18
uvicorn==0.27.1
visualdl==2.4.2
wcwidth==0.2.13
webrtcvad==2.0.10
websockets==11.0.3
Werkzeug==3.0.1
xxhash==3.4.1
xyzservices==2023.10.1
yacs==0.1.8
yarl==1.9.4
zhon==2.0.2
zipp==3.17.0
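To compare a local setup against the list above, a small script along these lines may help. It is only a sketch: the four pins are copied from the list, while the helper names (`parse_pins`, `installed_version`, `compare`) are made up for illustration and are not part of PaddleNLP. It uses only the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# A few pins copied from the environment list above; extend as needed.
REFERENCE_PINS = """
paddlenlp==2.5.2
paddlepaddle-gpu==2.4.1.post117
datasets==2.17.1
numpy==1.22.0
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def compare(pins, lookup=installed_version):
    """Return {name: (pinned, found)} for every pin that does not match."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in compare(parse_pins(REFERENCE_PINS)).items():
        print(f"{name}: reference {pinned}, installed {found or 'not installed'}")
```

Any package the script prints differs from the reference environment, which makes it a reasonable first thing to check, though a version mismatch is not necessarily the cause of the error.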
Software environment
Duplicate issue
Error description
Steps to reproduce & code
The data has been converted from a FUNSD-like dataset to the DocVQA format. The parameters have not been changed. The following command is used to run training:
python -u ./model_zoo/ernie-layout/run_mrc.py \
--model_name_or_path ernie-layoutx-base-uncased \
--output_dir ./ernie-layout-base-uncased/models/check \
--dataset_name funsd_to_docvqa \
--do_train \
--do_eval \
--lang "en" \
--num_train_epochs 6 \
--lr_scheduler_type linear \
--warmup_ratio 0.05 \
--weight_decay 0.05 \
--eval_steps 1000 \
--save_steps 1000 \
--save_total_limit 3 \
--load_best_model_at_end \
--pattern "mrc" \
--use_segment_box false \
--return_entity_level_metrics false \
--overwrite_cache false \
--doc_stride 128 \
--target_size 1000 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--learning_rate 2e-5 \
--preprocessing_num_workers 16 \
--save_total_limit 3 \
--train_nshard 14 \
--seed 1000 \
--metric_for_best_model anls \
--greater_is_better true \
--overwrite_output_dir