ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

Issue: NotImplementedError with aten::empty_strided on CPU-Only Machine #656

Closed GUANGZChen closed 3 weeks ago

GUANGZChen commented 3 weeks ago

Description

I'm encountering an issue when using `mace_mp` and `MACECalculator` from the MACE library on a CPU-only machine. Running the model in a CPU-only environment raises a `NotImplementedError` related to the `aten::empty_strided` operation. It appears that the code is attempting to execute a CUDA-specific operation on the CPU, despite specifying `device='cpu'`.

```python
from mace.calculators import mace_mp, MACECalculator
from pathlib import Path

# Define the model path
model = Path("./potential/MACE_model_swa.model").expanduser()

# Initialize mace_mp with CPU-only setup
calculator = mace_mp(model=model, device="cpu")

# Alternatively, using MACECalculator
calculator = MACECalculator(model_paths=['./potential/MACE_model_swa.model'], device='cpu')
```

Steps Taken to Resolve

1. Installed the CPU-only version of PyTorch (`pip install torch --index-url https://download.pytorch.org/whl/cpu`).
2. Verified that `device='cpu'` was explicitly set in both `mace_mp` and `MACECalculator`.
3. Removed all CUDA-related environment variables to prevent CUDA library loading (`unset CUDA_HOME`, `unset CUDA_PATH`, and `unset LD_LIBRARY_PATH` if it included CUDA paths).
4. Tested basic CPU-only tensor operations in PyTorch (see the sketch below), which work correctly outside of MACE.

Expected Behavior

With `device='cpu'` set, `mace_mp` and `MACECalculator` should avoid all CUDA dependencies and run exclusively on the CPU.
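A quick sanity check along the lines of step 4 (a minimal sketch, not part of the original report) that confirms the PyTorch build is CPU-only and that the failing op runs on the CPU backend:

```python
import torch

# CPU-only wheels report a version tag like "2.x.x+cpu"
print(torch.__version__)
print(torch.cuda.is_available())  # expected: False on a CPU-only build

# The exact op from the traceback, created explicitly on the CPU backend
x = torch.empty_strided((2, 3), (3, 1), device="cpu")
print(x.device)  # expected: cpu
```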

Actual Behavior

```
NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
```

Environment

- Python version: 3.9
- PyTorch version: CPU-only installation

Additional Context

I believe this issue may be due to a hardcoded CUDA dependency within mace_mp or MACECalculator. Is there a way to enforce CPU-only execution, or could there be a potential fix to handle CPU-only environments more gracefully?

Thank you for your assistance!

bernstei commented 3 weeks ago

I use cpu-only pytorch all the time, so the problem is not inherent to mace. Could it be that you saved a cuda model instead of the cpu version (mace_run_train --save_cpu, I believe)?

GUANGZChen commented 3 weeks ago

Thanks. I think that might be the problem, since I can load mace_mp on my device but not the trained MACE potential.

bernstei commented 3 weeks ago

With a tiny bit of torch you can, but only on a GPU machine, load the model and save it as a cpu model. There may be some script in mace that can do that already, but if not, I think it'd be a nearly trivial but useful addition.
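A minimal sketch of that torch snippet (filenames are placeholders; this assumes the model was saved whole with torch.save):

```python
import torch

# On a GPU machine: load the CUDA-saved model, move it to CPU, and re-save.
model = torch.load("MACE_model_swa.model", map_location="cuda")
torch.save(model.cpu(), "MACE_model_swa_cpu.model")
```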

GUANGZChen commented 3 weeks ago

> With a tiny bit of torch you can, but only on a GPU machine, load the model and save it as a cpu model. There may be some script in mace that can do that already, but if not, I think it'd be a nearly trivial but useful addition.

Hi bernstei, thank you for your reply. Could you please let me know what code I should use to convert the saved model to the CPU format?

ilyes319 commented 3 weeks ago

Hey @GUANGZChen, you need to use `--save_cpu` in your input script.

bernstei commented 3 weeks ago

@ilyes319 - would you be interested in a PR that adds a gpu_to_cpu script, so people don't have to get into the torch code or rerun their fit (even if that's fast, because it's from a checkpoint)?

ilyes319 commented 3 weeks ago

Yes that would be useful as we keep copying it to people. I really need to change the default in main, that's top of my list.

bernstei commented 3 weeks ago

Do you want it branched from develop or main?

bernstei commented 3 weeks ago

@ilyes319 The patch is ready - it's nearly trivial, maybe 10-15 lines of code, including all the argument-parsing overhead. I just need to know which branch to create the PR relative to.

ilyes319 commented 3 weeks ago

Thanks! The develop branch please

bernstei commented 3 weeks ago

Oops - just pushed into develop by mistake. Do you want me to revert and do a proper PR?

ilyes319 commented 3 weeks ago

I think it is fine as it is rather standalone, thank you!

bernstei commented 3 weeks ago

OK, I'll leave it alone. If it's useful to do floating-point precision conversions, that could be easy to add, but I think most of the code does its own conversion if needed now.
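A sketch of what such a precision conversion could look like (the filenames and the float64-to-float32 direction are assumptions, not from the thread):

```python
import torch

# Load an already-converted CPU model, downcast its floating-point
# parameters and buffers, and re-save.
model = torch.load("MACE_model_swa_cpu.model", map_location="cpu")
torch.save(model.float(), "MACE_model_swa_f32.model")  # model.double() for the reverse
```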

ilyes319 commented 3 weeks ago

@GUANGZChen Please add `--save_cpu` to your input file for now. I will close this.

ignaciomigliaro commented 2 weeks ago

I am currently encountering the same problem and do not wish to retrain. I tried writing a Python script to move the model to CPU, but I am having no success. I am running the script on a CUDA-enabled machine, but it's not fixing the issue. If you could guide me on how to save these models, it would be great. Thanks!

```python
import torch

def load_and_convert_model(model_path, output_path):
    cpu_data = torch.load(model_path, map_location=torch.device("cpu"))
    torch.save(cpu_data, output_path)
    print(f"Model successfully converted and saved to {output_path}")

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Convert a PyTorch model saved on GPU to CPU")
    parser.add_argument("input_model_path", type=str, help="Path to the GPU model file")
    parser.add_argument("output_model_path", type=str, help="Path to save the converted CPU model file")

    args = parser.parse_args()
    load_and_convert_model(args.input_model_path, args.output_model_path)
```
ilyes319 commented 2 weeks ago

If you are on a CUDA machine you need to do:

```python
import torch

def load_and_convert_model(model_path, output_path):
    # Load onto the GPU first, then move the model to CPU before saving.
    cuda_model = torch.load(model_path, map_location=torch.device("cuda"))
    cpu_model = cuda_model.cpu()
    torch.save(cpu_model, output_path)
    print(f"Model successfully converted and saved to {output_path}")
```

If you know that you are going to use your model on CPU, please use the `--save_cpu` flag while training.