Closed Laurie2905-JOHN closed 5 months ago
I get the same issue. ✋
I can't seem to reproduce this.
It looks like you are loading a GPU-trained model but transforming it with --accelerator cpu.
Can you also provide me with your training script?
https://github.com/DeepWok/mase/compare/main...zniihgnexy:mase_xinyi:device_fix Try this; it works in my search pass.
Making the adjustments @zniihgnexy suggested did not change the result. I have tried running with both --accelerator cpu and --accelerator gpu; neither works. I am just running the example file as stated in Lab2 task 7.
(mase) laurie2905@LAPTOP-LQSPNHSL:~/mase/machop$ ./ch transform --config configs/examples/jsc_toy_by_type.toml --task cls --accelerator=gpu
Seed set to 0
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| Name | Default | Config. File | Manual Override | Effective |
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| task | classification | cls | cls | cls |
| load_name | None | | | None |
| load_type | mz | | | mz |
| batch_size | 128 | 512 | | 512 |
| to_debug | False | | | False |
| log_level | info | | | info |
| report_to | tensorboard | | | tensorboard |
| seed | 0 | 42 | | 42 |
| quant_config | None | | | None |
| training_optimizer | adam | | | adam |
| trainer_precision | 16-mixed | | | 16-mixed |
| learning_rate | 1e-05 | 0.01 | | 0.01 |
| weight_decay | 0 | | | 0 |
| max_epochs | 20 | 5 | | 5 |
| max_steps | -1 | | | -1 |
| accumulate_grad_batches | 1 | | | 1 |
| log_every_n_steps | 50 | 5 | | 5 |
| num_workers | 8 | | | 8 |
| num_devices | 1 | | | 1 |
| num_nodes | 1 | | | 1 |
| accelerator | auto | cpu | gpu | gpu |
| strategy | auto | | | auto |
| is_to_auto_requeue | False | | | False |
| github_ci | False | | | False |
| disable_dataset_cache | False | | | False |
| target | xcu250-figd2104-2L-e | | | xcu250-figd2104-2L-e |
| num_targets | 100 | | | 100 |
| is_pretrained | False | | | False |
| max_token_len | 512 | | | 512 |
| project_dir | /home/laurie2905/mase/ma | | | /home/laurie2905/mase/ma |
| | se_output | | | se_output |
| project | None | jsc-tiny | | jsc-tiny |
| model | None | jsc-tiny | | jsc-tiny |
| dataset | None | jsc | | jsc |
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
INFO Initialising model 'jsc-tiny'...
INFO Initialising dataset 'jsc'...
INFO Project will be created at /home/laurie2905/mase/mase_output/jsc-tiny
INFO Transforming model 'jsc-tiny'...
Traceback (most recent call last):
File "/home/laurie2905/mase/machop/./ch", line 6, in <module>
The only way I can see this happening is:
I have made a fix in fix/transform-force-device-match. Please check out that branch and give it a try. I cannot really reproduce this because I don't have the same hardware setup, but let me know whether it works.
Thanks. Aaron
Forcing it to move the model to GPU seems to have fixed it for me. Thanks!
Not sure if this is correct, but I'm just speculating:
I did my training on GPU, and it seems that when the state_dict/checkpoint was saved, the state contained information about being on the GPU device. When we loaded it again for the transform, there doesn't seem to be anything that moves the input tensors to the correct device before calling modules[node.target](*args, **kwargs) in graph_iterator_for_metadata, so the model is on GPU while the inputs are on CPU?
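A minimal sketch of the idea above, not the actual MASE fix: before calling a module, move its inputs to the device the module's own tensors live on. The helper names module_device, to_device and call_on_module_device are hypothetical; only standard PyTorch calls are used.

```python
import itertools

import torch
import torch.nn as nn


def module_device(module: nn.Module) -> torch.device:
    """Return the device of the module's first parameter or buffer (CPU if it has none)."""
    for tensor in itertools.chain(module.parameters(), module.buffers()):
        return tensor.device
    return torch.device("cpu")


def to_device(obj, device):
    """Recursively move tensors nested in lists, tuples and dicts to `device`."""
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(item, device) for item in obj)
    if isinstance(obj, dict):
        return {key: to_device(value, device) for key, value in obj.items()}
    return obj  # non-tensor arguments pass through unchanged


def call_on_module_device(module, *args, **kwargs):
    """Call `module` after moving all tensor arguments to the module's device."""
    device = module_device(module)
    return module(*to_device(args, device), **to_device(kwargs, device))


# Usage: the module may be on GPU while the input starts on CPU.
bn = nn.BatchNorm1d(16)
if torch.cuda.is_available():
    bn = bn.cuda()
x = torch.randn(4, 16)  # created on CPU
out = call_on_module_device(bn, x)
```

Moving the inputs to the module (rather than the module to the inputs) avoids copying weights on every call, but either direction would remove the mismatch.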
Glad to hear that it worked.
I managed to replicate this only when I train with --accelerator cpu on a gpu-enabled machine...
In transform, the input is fetched through get_dummy_input, and this automatically moves the input tensor to gpu if you have installed pytorch-cuda.
So it seems to me that you trained the model, or saved the checkpoint, on cpu. When it is loaded, the model is on cpu while the input is on gpu.
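One way to rule out this kind of mismatch, sketched under the assumption that the checkpoint is a plain PyTorch state_dict (MASE's own checkpoint format and loader may differ): pick one device, load with map_location, move the model there explicitly, and create the input on that same device. The make_model function is a hypothetical stand-in for the real model.

```python
import io

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Hypothetical stand-in for the real model; it includes a BatchNorm layer
    # because that is where the error in this thread was raised.
    return nn.Sequential(nn.BatchNorm1d(16), nn.Linear(16, 5))


# Stand-in for a checkpoint file written during training.
buffer = io.BytesIO()
torch.save(make_model().state_dict(), buffer)
buffer.seek(0)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location remaps every saved tensor to `device`, whatever device it was saved from.
state_dict = torch.load(buffer, map_location=device)

model = make_model()
# load_state_dict copies values into the model's existing tensors, so the
# explicit .to(device) below is what decides where the model lives.
model.load_state_dict(state_dict)
model = model.to(device).eval()

# Create the input on the same device as the model.
x = torch.randn(4, 16, device=device)
with torch.no_grad():
    logits = model(x)
```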
Question: Expected all tensors to be on the same device when running
Commit hash: 6e0304d5d7dd167d6494eac22feff79d39d889f1
Command to reproduce:
./ch transform --config configs/examples/jsc_toy_by_type.toml --task cls --accelerator=cpu
Error message or full log:
Seed set to 0
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| Name | Default | Config. File | Manual Override | Effective |
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| task | classification | cls | cls | cls |
| load_name | None | | | None |
| load_type | mz | | | mz |
| batch_size | 128 | 512 | | 512 |
| to_debug | False | | | False |
| log_level | info | | | info |
| report_to | tensorboard | | | tensorboard |
| seed | 0 | 42 | | 42 |
| quant_config | None | | | None |
| training_optimizer | adam | | | adam |
| trainer_precision | 16-mixed | | | 16-mixed |
| learning_rate | 1e-05 | 0.01 | | 0.01 |
| weight_decay | 0 | | | 0 |
| max_epochs | 20 | 5 | | 5 |
| max_steps | -1 | | | -1 |
| accumulate_grad_batches | 1 | | | 1 |
| log_every_n_steps | 50 | 5 | | 5 |
| num_workers | 8 | | | 8 |
| num_devices | 1 | | | 1 |
| num_nodes | 1 | | | 1 |
| accelerator | auto | cpu | cpu | cpu |
| strategy | auto | | | auto |
| is_to_auto_requeue | False | | | False |
| github_ci | False | | | False |
| disable_dataset_cache | False | | | False |
| target | xcu250-figd2104-2L-e | | | xcu250-figd2104-2L-e |
| num_targets | 100 | | | 100 |
| is_pretrained | False | | | False |
| max_token_len | 512 | | | 512 |
| project_dir | /home/laurie2905/mase/ma | | | /home/laurie2905/mase/ma |
| | se_output | | | se_output |
| project | None | jsc-tiny | | jsc-tiny |
| model | None | jsc-tiny | | jsc-tiny |
| dataset | None | jsc | | jsc |
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
INFO Initialising model 'jsc-tiny'...
INFO Initialising dataset 'jsc'...
INFO Project will be created at /home/laurie2905/mase/mase_output/jsc-tiny
INFO Transforming model 'jsc-tiny'...
Traceback (most recent call last):
File "/home/laurie2905/mase/machop/./ch", line 6, in <module>
ChopCLI().run()
File "/home/laurie2905/mase/machop/chop/cli.py", line 243, in run
self._run_transform()
File "/home/laurie2905/mase/machop/chop/cli.py", line 350, in _run_transform
transform(**transform_params)
File "/home/laurie2905/mase/machop/chop/actions/transform.py", line 74, in transform
graph, _ = add_common_metadata_analysis_pass(
File "/home/laurie2905/mase/machop/chop/passes/graph/analysis/add_metadata/add_common_metadata.py", line 382, in add_common_metadata_analysis_pass
graph = graph_iterator_for_metadata(graph, pass_args)
File "/home/laurie2905/mase/machop/chop/passes/graph/analysis/add_metadata/add_common_metadata.py", line 203, in graph_iterator_for_metadata
result = modules[node.target](*args, **kwargs)
File "/home/laurie2905/anaconda3/envs/mase/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/laurie2905/anaconda3/envs/mase/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/laurie2905/anaconda3/envs/mase/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
return F.batch_norm(
File "/home/laurie2905/anaconda3/envs/mase/lib/python3.10/site-packages/torch/nn/functional.py", line 2478, in batch_norm
return torch.batch_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_batch_norm)
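To see which side of the call is on the wrong device before an error like this is raised, a small diagnostic can list the device of every parameter, buffer and input tensor. This is a sketch only: device_report is a hypothetical helper, not part of MASE, and it uses only standard PyTorch calls.

```python
import torch
import torch.nn as nn


def device_report(module: nn.Module, *inputs) -> dict:
    """Map each parameter, buffer and tensor input to the device it lives on."""
    report = {}
    for name, param in module.named_parameters():
        report[f"param:{name}"] = str(param.device)
    for name, buf in module.named_buffers():
        report[f"buffer:{name}"] = str(buf.device)
    for index, value in enumerate(inputs):
        if isinstance(value, torch.Tensor):
            report[f"input:{index}"] = str(value.device)
    return report


# Usage: more than one distinct device in the report means the call would fail.
bn = nn.BatchNorm1d(16)
x = torch.randn(4, 16)
report = device_report(bn, x)
mixed = len(set(report.values())) > 1
```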