animikhaich closed this issue 1 year ago
Hmmm... I haven't seen this before. I'm about 80% sure this is not really a dimensionality issue but a CUDA version mismatch, where for some reason the system CUDA environment is being used instead of the Python environment's CUDA environment. A couple of things to try, starting with the easiest:
1. In the shell from which you are launching EcoAssist, try running:
   export LD_LIBRARY_PATH=''
   ...prior to starting EcoAssist. I'm 61% sure this will fix the problem, and if that's the case, we have an easy fix, and I get to grumble about how I wish CUDA installs wouldn't mess with LD_LIBRARY_PATH.
2. It would help debug a little if we could take EcoAssist out of the loop just to remove a level of indirection, so if the person who owns the environment is up for it, it would be great to go through the MegaDetector setup instructions. If we can repro the issue there, we'll have a simpler time debugging.
3. I don't really recommend that the environment owner do this, but FWIW, I think uninstalling CUDA entirely from the system will fix the issue. In principle I'd like to do this as a debugging step, but it's a big hammer to wield if the user is using the system CUDA for other things.
4. I don't think we'll go past (2) just yet, but if (1) doesn't work, and we can repro the problem in a standalone Python environment (i.e., outside of EcoAssist), we can try to upgrade PyTorch in that environment to match the system CUDA version. If that works, we've at least verified that it was really a CUDA version mismatch, then we can decide what to do about it.
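For anyone who wants to check the suspected mismatch before changing anything, here is a rough diagnostic sketch, not part of EcoAssist or MegaDetector. The helper names `cuda_entries` and `torch_cuda_version` are made up for illustration, and matching on the substring "cuda" is only an assumption about how system CUDA directories are typically named. Run it inside the Python environment that EcoAssist uses.

```python
import os


def cuda_entries(ld_library_path):
    """Return the LD_LIBRARY_PATH entries whose path mentions CUDA.

    Hypothetical helper: matching on the substring "cuda" is a heuristic,
    not a guarantee that the directory holds a CUDA runtime.
    """
    entries = [p for p in ld_library_path.split(":") if p]
    return [p for p in entries if "cuda" in p.lower()]


def torch_cuda_version():
    """Return the CUDA version PyTorch was built against, or None.

    None means PyTorch is missing, failed to load, or is a CPU-only build.
    """
    try:
        import torch
    except (ImportError, OSError):
        return None
    return torch.version.cuda


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("CUDA-looking dirs on LD_LIBRARY_PATH:", cuda_entries(ld_path))
    print("PyTorch built against CUDA:", torch_cuda_version())
```

If the first line lists a system CUDA directory whose version differs from the one PyTorch reports, that would be consistent with the mismatch theory, though it does not prove it.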
@agentmorris Thanks for your response!
@animikhaich With regard to option 1, the easiest way to run export LD_LIBRARY_PATH='' prior to opening EcoAssist would be to add this line somewhere before the python command on line 109 in /home/ani/.EcoAssist_files/EcoAssist/open.command.
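If editing the launch script is not convenient, the same idea can be sketched from Python: start the child process with LD_LIBRARY_PATH cleared, leaving the rest of the environment alone. This is only an illustration. The helper name `env_without_ld_library_path` is hypothetical, and the child command below is a placeholder that just prints the variable; it is not the actual EcoAssist launch command.

```python
import os
import subprocess
import sys


def env_without_ld_library_path(base_env=None):
    """Return a copy of the environment with LD_LIBRARY_PATH set to ''.

    Mirrors `export LD_LIBRARY_PATH=''`; the original mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = ""
    return env


if __name__ == "__main__":
    # Placeholder child process: it only reports what it sees, so the
    # effect of the cleared variable can be confirmed before launching
    # anything real with the same environment.
    subprocess.run(
        [sys.executable, "-c",
         "import os; print(repr(os.environ.get('LD_LIBRARY_PATH')))"],
        env=env_without_ld_library_path(),
        check=True,
    )
```

The change applies only to the child process, so other programs that rely on the system CUDA keep their usual environment.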
Thanks @agentmorris and @PetervanLunteren. Option 1 resolved it!
After a standard installation, I tried to run the test steps as outlined here.
I encountered this error:
As suggested by issues #9 and #6, I verified the existence of md_v5a.0.0.pt and md_v5b.0.0.pt in .EcoAssist_files/pretrained_models.
The stdout.txt log dump is given below:
My System Information:
Nvidia Driver:
Default CUDA Version: