thomaschhh opened 1 month ago
When running the test suite, a few things stand out:
Traceback (most recent call last):
  File "/home/operation/modalities_old/tests/tests.py", line 136, in <module>
    main(**args)
  File "/home/operation/modalities_old/tests/tests.py", line 113, in main
    assert isfile(
AssertionError: ERROR! /home/operation/modalities_old/examples/getting_started/run_getting_started_example.sh does not exist.
PR #265 fixes this issue.
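The failing assertion is a plain file-existence check against the old examples/getting_started path, whereas the getting started run in this thread cds into tutorials/getting_started. A minimal sketch of such a check pointed at the tutorials location, assuming that is where the script now lives; find_getting_started_script is a hypothetical helper, not the code in tests/tests.py or in PR #265.

```python
from pathlib import Path


def find_getting_started_script(repo_root: Path) -> Path:
    """Return the getting started script path or fail with the same style of message.

    Hypothetical helper: assumes the script lives under tutorials/getting_started.
    """
    script = repo_root / "tutorials" / "getting_started" / "run_getting_started_example.sh"
    if not script.is_file():
        # Mirrors the wording of the assertion message in tests/tests.py.
        raise FileNotFoundError(f"ERROR! {script} does not exist.")
    return script
```

Raising with the full resolved path keeps the failure self-explanatory when a directory is renamed again.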
However, the getting_started example does not work:
=== RUN GETTING STARTED EXAMPLE ===
cd /home/operation/modalities_old/tutorials/getting_started; bash run_getting_started_example.sh 0 1
run getting_started_examples on CUDA_VISIBLE_DEVICES=0,1
/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_checkpointpath" has conflict with protected namespace "model".
[...]
rank1: Traceback (most recent call last):
rank1:   File "/home/operation/miniconda3/envs/modalities2/bin/modalities", line 8, in <module>
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
rank1:     return self.main(*args, **kwargs)
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1078, in main
rank1:     rv = self.invoke(ctx)
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
rank1:     return _process_result(sub_ctx.command.invoke(sub_ctx))
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
rank1:     return ctx.invoke(self.callback, **ctx.params)
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 783, in invoke
rank1:     return __callback(*args, **kwargs)
rank1:   File "/home/operation/modalities_old/src/modalities/main.py", line 60, in entry_point_run_modalities
rank1:     with CudaEnv(process_group_backend=ProcessGroupBackendType.nccl):
rank1:   File "/home/operation/modalities_old/src/modalities/running_env/cuda_env.py", line 33, in __enter__
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/torch/cuda/__init__.py", line 420, in set_device
rank1: RuntimeError: CUDA error: invalid device ordinal
rank1: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank1: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
rank1: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/home/operation/modalities_old/tests/tests.py", line 136, in <module>
    main(**args)
  File "/home/operation/modalities_old/tests/tests.py", line 123, in main
    check_existence_and_clear_getting_started_example_output(run_getting_started_example_directory, date_of_run)
  File "/home/operation/modalities_old/tests/tests.py", line 43, in check_existence_and_clear_getting_started_example_output
    assert checkpoint_to_delete is not None, f"ERROR! could not find a checkpoint with datetime > {date_of_run}"
AssertionError: ERROR! could not find a checkpoint with datetime > 2024-11-04__11-37-23
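The second failure is a consequence of the first: because training crashed on rank1, no checkpoint newer than the start of the run was written, so the lookup in check_existence_and_clear_getting_started_example_output comes back empty. A minimal sketch of that kind of lookup, assuming checkpoint directories are named with the same timestamp format as the date in the error message (2024-11-04__11-37-23); find_checkpoint_after is a hypothetical name, not the function in tests/tests.py.

```python
from __future__ import annotations  # allows "Path | None" on Python < 3.10

from datetime import datetime
from pathlib import Path

# Format inferred from the error message, e.g. "2024-11-04__11-37-23".
TIMESTAMP_FORMAT = "%Y-%m-%d__%H-%M-%S"


def find_checkpoint_after(checkpoint_root: Path, date_of_run: str) -> Path | None:
    """Return the first timestamp-named directory created after date_of_run, or None."""
    run_start = datetime.strptime(date_of_run, TIMESTAMP_FORMAT)
    for entry in sorted(checkpoint_root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            created = datetime.strptime(entry.name, TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip directories that are not named by timestamp
        if created > run_start:
            return entry
    return None  # the test's assertion fails in exactly this case
```

If the run aborts before the first checkpoint is saved, this returns None, which is what the AssertionError above reports.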
@thomaschhh I wasn't able to reproduce your error. All tests, including the getting started example, seem to run through using python tests/tests.py --multi-gpu. Could you try again with the latest commit (I merged main into type-annotations)? If the problem persists, feel free to open a new issue, as the problem seems to be unrelated to this PR.
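The RuntimeError: CUDA error: invalid device ordinal in the log is raised by torch.cuda.set_device when a process asks for a device index that does not exist among the GPUs visible to it. With run_getting_started_example.sh 0 1 launching two ranks on CUDA_VISIBLE_DEVICES=0,1, one plausible cause (an assumption, not confirmed in this thread) is that the machine exposes only one GPU, so rank1, which presumably selects device 1, fails. A minimal pre-flight sketch; check_requested_gpus is a hypothetical helper, and in a real run available_gpus would come from torch.cuda.device_count().

```python
from __future__ import annotations  # allows "list[int]" on Python < 3.9


def check_requested_gpus(cuda_visible_devices: str, available_gpus: int) -> list[int]:
    """Return the local device index each rank would use, or raise if GPUs are missing.

    available_gpus is passed in (e.g. torch.cuda.device_count()) so the sketch runs without a GPU.
    """
    requested = [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]
    if len(requested) > available_gpus:
        # The first rank without a GPU has local index == available_gpus.
        raise RuntimeError(
            f"{len(requested)} devices requested via CUDA_VISIBLE_DEVICES={cuda_visible_devices}, "
            f"but only {available_gpus} visible; rank {available_gpus} would hit 'invalid device ordinal'"
        )
    return list(range(len(requested)))
```

For the setup in the log, check_requested_gpus("0,1", 1) raises and names rank 1, matching the rank that crashed.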
What does this PR do?
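This PR uses the built-in types (e.g., list, dict) instead of their counterparts from the typing module (e.g., List, Dict). It also adopts the PEP 604 union syntax (X | Y and X | None instead of Union[X, Y] and Optional[X]).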
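A minimal before/after sketch of the annotation style this PR moves to. The functions are hypothetical and not taken from the modalities codebase; only the annotations differ between the two versions.

```python
from __future__ import annotations  # lets the new syntax parse on Python 3.7-3.9 as well

from typing import Dict, List, Optional


# Before: generic aliases and Optional imported from the typing module.
def lengths_old(batches: List[List[int]], limit: Optional[int] = None) -> Dict[int, int]:
    """Map each batch index to its (optionally capped) length."""
    return {i: len(b) if limit is None else min(len(b), limit) for i, b in enumerate(batches)}


# After: built-in generics (PEP 585) and the X | None union syntax (PEP 604).
def lengths_new(batches: list[list[int]], limit: int | None = None) -> dict[int, int]:
    """Same behaviour as lengths_old; only the annotations change."""
    return {i: len(b) if limit is None else min(len(b), limit) for i, b in enumerate(batches)}
```

Without the __future__ import, both forms work at runtime from Python 3.10 onward, which is the version used in the environment shown in the logs.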
General Changes
Breaking Changes
Checklist before submitting final PR
python tests/tests.py
CHANGELOG_DEV.md