ContextualAI / gritlm

Generative Representational Instruction Tuning
https://arxiv.org/abs/2402.09906
MIT License

RuntimeError #16

Open BlackHandsomeLee opened 3 months ago

BlackHandsomeLee commented 3 months ago

When I run the training script for the unified model (GRIT), I get an error: RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlDeviceGetHandleByPciBusId_v2_( pci_id, &nvml_device) INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":1139, please report a bug to PyTorch.

The error comes from an NVML (NVIDIA Management Library) call and is likely related to how PyTorch interacts with CUDA.

Could you please provide the versions of the various packages you were running at that time?
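For comparing environments, here is a minimal sketch that collects package versions using only the standard library. The function names `collect_versions` and `cuda_details` are hypothetical helpers, and the default package list is an assumption about what the training script depends on, not something confirmed by the repository.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers", "accelerate", "deepspeed")):
    """Return a dict mapping each package name to its installed version.

    The default package list is an assumption; adjust it to your setup.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages instead of failing the whole report.
            report[name] = "not installed"
    return report


def cuda_details():
    """Return CUDA information as seen by PyTorch, if PyTorch is importable."""
    try:
        import torch
    except ImportError:
        return {"torch": "not installed"}
    return {
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "torch.version.cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
```

Calling `print(collect_versions())` and `print(cuda_details())` in the training environment gives a report that can be pasted into the issue. The output of `nvidia-smi` is also worth including, since it shows the installed driver version.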

Muennighoff commented 3 months ago

I've added our torch version to the README: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#run
Let me know if it's still not clear!