Modalities / modalities

Modalities, a PyTorch-native framework for distributed and reproducible foundation model training.
MIT License

chore: use built-in types #262

Open thomaschhh opened 1 month ago

thomaschhh commented 1 month ago

What does this PR do?

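This PR replaces the generic aliases from the typing module (e.g. List, Dict) with the corresponding built-in types (list, dict). It also adopts the PEP 604 union syntax (X | Y instead of Union[X, Y], and X | None instead of Optional[X]).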

thomaschhh commented 1 week ago

When running the test suite there are a few things that stand out:

Traceback (most recent call last):
  File "/home/operation/modalities_old/tests/tests.py", line 136, in <module>
    main(**args)
  File "/home/operation/modalities_old/tests/tests.py", line 113, in main
    assert isfile(
AssertionError: ERROR! /home/operation/modalities_old/examples/getting_started/run_getting_started_example.sh does not exist.

PR #265 fixes this issue.

However, the getting_started example is not working:

=== RUN GETTING STARTED EXAMPLE ===
cd /home/operation/modalities_old/tutorials/getting_started; bash run_getting_started_example.sh 0 1
run getting_started_examples on CUDA_VISIBLE_DEVICES=0,1
/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_checkpointpath" has conflict with protected namespace "model".

[...]

rank1: Traceback (most recent call last):
rank1:   File "/home/operation/miniconda3/envs/modalities2/bin/modalities", line 8, in <module>
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
rank1:     return self.main(*args, **kwargs)
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1078, in main
rank1:     rv = self.invoke(ctx)
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
rank1:     return _process_result(sub_ctx.command.invoke(sub_ctx))
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
rank1:     return ctx.invoke(self.callback, **ctx.params)
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/click/core.py", line 783, in invoke
rank1:     return __callback(*args, **kwargs)
rank1:   File "/home/operation/modalities_old/src/modalities/__main__.py", line 60, in entry_point_run_modalities
rank1:     with CudaEnv(process_group_backend=ProcessGroupBackendType.nccl):
rank1:   File "/home/operation/modalities_old/src/modalities/running_env/cuda_env.py", line 33, in __enter__
rank1:   File "/home/operation/miniconda3/envs/modalities2/lib/python3.10/site-packages/torch/cuda/__init__.py", line 420, in set_device
rank1: RuntimeError: CUDA error: invalid device ordinal
rank1: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank1: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
rank1: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
  File "/home/operation/modalities_old/tests/tests.py", line 136, in <module>
    main(**args)
  File "/home/operation/modalities_old/tests/tests.py", line 123, in main
    check_existence_and_clear_getting_started_example_output(run_getting_started_example_directory, date_of_run)
  File "/home/operation/modalities_old/tests/tests.py", line 43, in check_existence_and_clear_getting_started_example_output
    assert checkpoint_to_delete is not None, f"ERROR! could not find a checkpoint with datetime > {date_of_run}"
AssertionError: ERROR! could not find a checkpoint with datetime > 2024-11-04__11-37-23

flxst commented 1 week ago

@thomaschhh I wasn't able to reproduce your error. All tests including the getting started example seem to run through using python tests/tests.py --multi-gpu. Could you try again with the latest commit (I merged main into type-annotations)? If the problem persists, feel free to open a new issue as the problem seems to be unrelated to this PR.