cosdt / pytorch-integration-tests

Integration testing of different accelerators with PyTorch
BSD 3-Clause "New" or "Revised" License

Reuse PyTorch's test suite #8

Open shink opened 2 months ago

shink commented 2 months ago

Resource

TODO

  1. The device is hardcoded in the examples; add a --device argument: https://github.com/cosdt/pytorch-examples/issues/1
shink commented 1 month ago
pip install torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

1. pytorch/examples

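Upstream adaptation: Python scripts gain a --device argument; shell scripts select the device through an environment variable (export BACKEND=npu)
Upstream modified: yes
Run: BACKEND_DEVICE=npu ./run_python_examples.sh runs all examples
Status: a few examples fail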
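The adaptation described above (a --device argument on the Python side, an environment variable on the shell side) could look roughly like the following. This is a minimal sketch, not the code from the linked issue: the function name `parse_args` and the "cpu" fallback are assumptions, and the variable name BACKEND_DEVICE is taken from the run command above (the note also mentions BACKEND, so the real name may differ).

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a --device option whose default comes from the environment."""
    # BACKEND_DEVICE mirrors the run command above; "cpu" is an assumed fallback.
    default_device = os.environ.get("BACKEND_DEVICE", "cpu")
    parser = argparse.ArgumentParser(description="example script")
    parser.add_argument(
        "--device",
        default=default_device,
        help="device string passed on to the example, e.g. cpu, cuda, npu",
    )
    return parser.parse_args(argv)
```

With this shape, an explicit `--device npu` on the command line wins over the environment variable, and a shell wrapper only needs to export the variable once for all scripts it launches.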

2. pytorch/benchmark

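Upstream adaptation: none
Upstream modified: no
Run: ACCELERATOR=npu python test.py --verbose runs the training and test pass for all models
Status: fails with "npu not registered" -> the test cases execute in subprocesses, which inherit the main process's environment variables, and importing torch_npu in the main process sets the environment variable to off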

Error message:

```
test_Background_Matting_eval_npu (__main__.TestBenchmark) ... skipped 'Method eval on npu is not implemented because "", skipping...'
test_Background_Matting_example_npu (__main__.TestBenchmark) ... FAIL

======================================================================
FAIL: test_Background_Matting_example_npu (__main__.TestBenchmark)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/benchmark/test.py", line 76, in example_fn
    assert (
AssertionError: Expected accuracy pass, get eager_two_runs_differ

----------------------------------------------------------------------
Ran 7 tests in 119.484s

FAILED (failures=1, skipped=1)
```
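The subprocess behaviour noted under Status for pytorch/benchmark can be illustrated with the standard library alone. This is a sketch under stated assumptions: `HYPOTHETICAL_AUTOLOAD_FLAG` is a placeholder, not the variable torch_npu actually changes, and `child_sees` is a helper invented for the illustration.

```python
import os
import subprocess
import sys

FLAG = "HYPOTHETICAL_AUTOLOAD_FLAG"  # placeholder name, not the real variable


def child_sees(env=None):
    """Run a child Python process and return the value of FLAG it observes."""
    code = f"import os; print(os.environ.get({FLAG!r}, 'unset'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,  # None means: inherit the parent's current environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The parent switches the flag off, as an import side effect might.
os.environ[FLAG] = "0"
inherited = child_sees()  # the child inherits the modified value

# Passing an explicit environment restores the flag for the child only.
child_env = dict(os.environ)
child_env[FLAG] = "1"
overridden = child_sees(child_env)
```

If the real cause matches this pattern, one possible workaround is to pass an explicit `env` when the test harness spawns its workers, rather than relying on whatever the parent process holds after its imports.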

3. huggingface/timm

4. huggingface/transformers

Run:

apt install libsndfile1
pip install -r examples/pytorch/_tests_requirements.txt
pytest examples/pytorch/test_pytorch_examples.py -v

from transformers.testing_utils import torch_device detects the device automatically

Result: 16 failed, 3 passed, 2 skipped

```
=============================================== short test summary info ================================================
FAILED test_pytorch_examples.py::ExamplesTests::test_run_audio_classification - TypeError: 'NoneType' object is not callable
FAILED test_pytorch_examples.py::ExamplesTests::test_run_clm - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/./tests/fixtures/sample_text.txt'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_clm_config_overrides - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/./tests/fixtures/sample_text.txt'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_glue - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/./tests/fixtures/tests_samples/MRPC/train.csv'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_image_classification - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
FAILED test_pytorch_examples.py::ExamplesTests::test_run_instance_segmentation - RuntimeError: Expected all tensors to be on the same device. Expected NPU tensor, please check whether the input te...
FAILED test_pytorch_examples.py::ExamplesTests::test_run_mlm - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/./tests/fixtures/sample_text.txt'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_ner - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/tests/fixtures/tests_samples/conll/sample.json'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_semantic_segmentation - AssertionError: 0.0 not greater than or equal to 0.1
FAILED test_pytorch_examples.py::ExamplesTests::test_run_speech_recognition_ctc - AssertionError: nan not less than 73.33543701171875
FAILED test_pytorch_examples.py::ExamplesTests::test_run_speech_recognition_ctc_adapter - AssertionError: nan not less than 75.0298583984375
FAILED test_pytorch_examples.py::ExamplesTests::test_run_speech_recognition_seq2seq - AssertionError: nan not less than 3.872235107421875
FAILED test_pytorch_examples.py::ExamplesTests::test_run_squad - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/tests/fixtures/tests_samples/SQUAD/sample.json'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_squad_seq2seq - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/tests/fixtures/tests_samples/SQUAD/sample.json'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_swag - FileNotFoundError: Unable to find '/root/transformers/examples/pytorch/tests/fixtures/tests_samples/swag/sample.json'
FAILED test_pytorch_examples.py::ExamplesTests::test_run_vit_mae_pretraining - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
========================== 16 failed, 3 passed, 2 skipped, 318 warnings in 466.30s (0:07:46) ===========================
```

```
Traceback (most recent call last):
  File "/root/transformers/examples/pytorch/instance-segmentation/run_instance_segmentation.py", line 480, in <module>
    main()
  File "/root/transformers/examples/pytorch/instance-segmentation/run_instance_segmentation.py", line 455, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2052, in train
    return inner_training_loop(
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2388, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 3485, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 3532, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/models/mask2former/modeling_mask2former.py", line 2517, in forward
    loss_dict = self.get_loss_dict(
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/models/mask2former/modeling_mask2former.py", line 2338, in get_loss_dict
    loss_dict: Dict[str, Tensor] = self.criterion(
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/models/mask2former/modeling_mask2former.py", line 768, in forward
    indices = self.matcher(masks_queries_logits, class_queries_logits, mask_labels, class_labels)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/transformers/models/mask2former/modeling_mask2former.py", line 475, in forward
    cost_matrix = torch.minimum(cost_matrix, torch.tensor(1e10))
RuntimeError: Expected all tensors to be on the same device. Expected NPU tensor, please check whether the input tensor device is correct.
[ERROR] 2024-10-10-12:59:59 (PID:1154599, Device:0, RankID:-1) ERR01002 OPS invalid type
```

> issue: https://gitee.com/ascend/pytorch/issues/IAWAZ1?from=project-issue
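The last frame of the traceback above calls `torch.minimum(cost_matrix, torch.tensor(1e10))`, where the second tensor is created on the CPU by default while `cost_matrix` presumably lives on the NPU. The sketch below shows two device-agnostic ways to write the same cap. It is an illustration, not the fix adopted upstream or in the linked issue; it runs on CPU with made-up values, so it does not reproduce the NPU error itself.

```python
import torch

# Stand-in for the matcher's cost matrix; the values are made up.
cost_matrix = torch.tensor([[1e12, 2.0], [float("inf"), 4.0]])

# Option 1: create the bound on the same device and dtype as the input.
upper = torch.tensor(1e10, device=cost_matrix.device, dtype=cost_matrix.dtype)
capped = torch.minimum(cost_matrix, upper)

# Option 2: clamp with a Python scalar, so no second tensor is needed.
clamped = cost_matrix.clamp(max=1e10)
```

Both options give the same result here; the second avoids allocating a tensor whose device could drift from the input's.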
shink commented 1 month ago
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

https://www.cnblogs.com/mrneojeep/p/16252044.html

shink commented 1 month ago
test_models.py::test_model_backward[2-swinv2_cr_large_224] Fatal Python error: Segmentation fault

Thread 0x0000fff99f87f120 (most recent call first):
<no Python frame>

Thread 0x0000fffe99d3f120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffe9c54f120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffe9ed5f120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffea156f120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffe9752f120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffe94d1f120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffe9250f120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffe8fcff120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 312 in wait
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 917 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffe579bf120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/connection.py", line 379 in _recv
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/connection.py", line 414 in _recv_bytes
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/connection.py", line 250 in recv
  File "/usr/local/python3.9/lib/python3.9/multiprocessing/managers.py", line 810 in _callmethod
  File "<string>", line 2 in get
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/utils/multiprocess_util.py", line 91 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000fffdf02bf120 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 316 in wait
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 581 in wait
  File "/usr/local/python3.9/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/python3.9/lib/python3.9/threading.py", line 937 in _bootstrap

Current thread 0x0000ffffb6ca8640 (most recent call first):
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/_tensor_str.py", line 146 in __init__
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/_tensor_str.py", line 357 in _tensor_str
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/_tensor_str.py", line 625 in _str_intern
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/_tensor_str.py", line 708 in _str
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/_tensor.py", line 464 in __repr__
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_io/saferepr.py", line 73 in repr_instance
  File "/usr/local/python3.9/lib/python3.9/reprlib.py", line 62 in repr1
  File "/usr/local/python3.9/lib/python3.9/reprlib.py", line 52 in repr
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_io/saferepr.py", line 61 in repr
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_io/saferepr.py", line 112 in saferepr
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_code/code.py", line 831 in repr_args
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_code/code.py", line 927 in repr_traceback_entry
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_code/code.py", line 982 in <listcomp>
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_code/code.py", line 981 in repr_traceback
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_code/code.py", line 1057 in repr_excinfo
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/_code/code.py", line 698 in getrepr
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/nodes.py", line 497 in _repr_failure_py
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/python.py", line 1877 in repr_failure
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/reports.py", line 364 in from_item_and_call
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/runner.py", line 372 in pytest_runtest_makereport
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/runner.py", line 228 in call_and_report
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/main.py", line 351 in pytest_runtestloop
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/main.py", line 326 in _main
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/main.py", line 272 in wrap_session
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/main.py", line 319 in pytest_cmdline_main
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/python3.9/lib/python3.9/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/config/__init__.py", line 174 in main
  File "/usr/local/python3.9/lib/python3.9/site-packages/_pytest/config/__init__.py", line 197 in console_main
  File "/usr/local/python3.9/bin/pytest", line 8 in <module>
Segmentation fault (core dumped)