Training on GPU

I ran into an issue when trying to perform a training run on the GPU. It appears to be caused by the reference and predicted data being stored on different devices, which leads to errors like RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu).

I can work around this by explicitly moving the reference data (energies, forces and coords) to the GPU (https://github.com/SimonBoothroyd/descent/blob/92a139604f4b166a6ab040e5e8e8b8a70fa719d8/descent/targets/energy.py#L110):

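        # Move the reference data to the GPU so it is on the same device as the predictions.
        energy_ref = entry["energy"].cuda()
        forces_ref = entry["forces"].reshape(len(energy_ref), -1, 3).cuda()

        # Move coords to the GPU before requires_grad_ so they remain a leaf tensor.
        coords = (
            entry["coords"]
            .reshape(len(energy_ref), -1, 3)
            .detach()
            .cuda()
            .requires_grad_(True)
        )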

but something smarter is likely needed that can handle both CPU and GPU runs.
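As a rough illustration of what a device-agnostic version might look like, here is a minimal sketch that uses `.to(device)` instead of `.cuda()`. The helper `move_entry_to_device` and the toy `entry` dictionary are hypothetical and not part of descent. I am also assuming the target device can be worked out somewhere upstream (for example from the device the force field parameters live on); here it simply falls back to the CPU when CUDA is unavailable.

```python
import torch


def move_entry_to_device(entry, device):
    """Hypothetical helper: return reference energies, forces and coords on `device`."""
    energy_ref = entry["energy"].to(device)
    forces_ref = entry["forces"].reshape(len(energy_ref), -1, 3).to(device)

    # Move to the device before requires_grad_ so coords stay a leaf tensor.
    coords = (
        entry["coords"]
        .reshape(len(energy_ref), -1, 3)
        .detach()
        .to(device)
        .requires_grad_(True)
    )
    return energy_ref, forces_ref, coords


# Toy stand-in for a dataset entry: 2 conformers of a 3-atom molecule,
# stored flat as in the snippet above (an assumption for illustration).
entry = {
    "energy": torch.zeros(2),
    "forces": torch.zeros(2 * 3 * 3),
    "coords": torch.rand(2 * 3 * 3),
}

# Assumption: pick the GPU when present, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
energy_ref, forces_ref, coords = move_entry_to_device(entry, device)
```

Because `.to(device)` does not copy when the tensor is already on the requested device, the same code path should serve both CPU and GPU runs without branching.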

SimonBoothroyd / descent

Training on GPU #76