dushko-murtovski commented 11 months ago

Hi,

I got an error during installation saying it can't find the module torch. After installing torch with 'pip install torch', I still get this:

Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
    <string>:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    Traceback (most recent call last):
      File "C:\Users\dushk\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
        main()
      File "C:\Users\dushk\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\dushk\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
        return hook(config_settings)
               ^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\dushk\AppData\Local\Temp\pip-build-env-kwspp0n0\overlay\Lib\site-packages\setuptools\build_meta.py", line 355, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=['wheel'])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\dushk\AppData\Local\Temp\pip-build-env-kwspp0n0\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in _get_build_requires
        self.run_setup()
      File "C:\Users\dushk\AppData\Local\Temp\pip-build-env-kwspp0n0\overlay\Lib\site-packages\setuptools\build_meta.py", line 507, in run_setup
        super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
      File "C:\Users\dushk\AppData\Local\Temp\pip-build-env-kwspp0n0\overlay\Lib\site-packages\setuptools\build_meta.py", line 341, in run_setup
        exec(code, locals())
      File "<string>", line 9, in <module>
    ModuleNotFoundError: No module named 'torch'
    [end of output]

Do I need a specific version of torch? I have an RTX 3060 with CUDA installed, and Python 3.11.5.

Regards.

chenhsuanlin commented 11 months ago

Hi @dushko-murtovski, please follow the README to use either the prebuilt docker images or conda environments. We also have not developed full support for Windows.

dushko-murtovski commented 11 months ago

Hi,

Thanks, I am testing on Windows using Conda. The problem was that the Conda environment was not initializing correctly on my side, and I had a higher CUDA version than the one supported by PyTorch. CUDA versions above 11.8 did not work for me because the PyTorch build supports 11.8.
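For anyone checking the same mismatch, the sketch below reports which CUDA version the installed PyTorch build targets. The helper name `describe_torch_cuda` is hypothetical and not part of neuralangelo; it only reads `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it returns early if torch is not installed.

```python
import importlib.util


def describe_torch_cuda():
    """Report the installed PyTorch build and the CUDA version it targets.

    Hypothetical helper for illustration only; not part of neuralangelo.
    """
    # Avoid an ImportError when torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch": torch.__version__,
        # None for CPU-only builds, otherwise a string such as "11.8".
        "built_for_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this build.
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_cuda()
print(info)
```

If `built_for_cuda` differs from the toolkit version installed on the machine, that may explain a failed build of CUDA extensions, although the exact compatibility rules depend on the packages involved.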

After installing it manually, I am getting this error:

$ torchrun --nproc_per_node=1 train.py --logdir=logs/dmtest/dtu24 --config="E:/Source/Private/neuralangelo-main/projects/neuralangelo/configs/custom/dtu24.yaml" --show_pbar
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    main()
  File "train.py", line 46, in main
    set_affinity(args.local_rank)
  File "E:\Source\Private\neuralangelo-main\imaginaire\utils\gpu_affinity.py", line 75, in set_affinity
    os.sched_setaffinity(0, dev.get_cpu_affinity())
AttributeError: module 'os' has no attribute 'sched_setaffinity'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4484) of binary: D:\anaconda3\envs\NA\python.exe
Traceback (most recent call last):
  File "D:\anaconda3\envs\NA\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\anaconda3\envs\NA\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\anaconda3\envs\NA\Scripts\torchrun.exe\__main__.py", line 7, in <module>
  File "D:\anaconda3\envs\NA\lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "D:\anaconda3\envs\NA\lib\site-packages\torch\distributed\run.py", line 794, in main
    run(args)
  File "D:\anaconda3\envs\NA\lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "D:\anaconda3\envs\NA\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "D:\anaconda3\envs\NA\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-25_09:46:52
  host      : DESKTOP-CJO93J8
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4484)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Is this because I am testing on Windows? I am running this in a Conda environment. I would really appreciate the help.

Regards
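The `AttributeError` in the traceback above comes from `os.sched_setaffinity`, which Python provides only on some Unix platforms and not on Windows. The sketch below shows one possible guard, assuming it is acceptable to skip CPU pinning where the call is missing. The function name `set_affinity_if_supported` is hypothetical; it is not the project's code or an official fix.

```python
import os


def set_affinity_if_supported(cpu_ids):
    """Pin the current process to cpu_ids where the OS supports it.

    Returns True if the affinity was applied, False if the platform
    (for example Windows) does not expose os.sched_setaffinity.
    Hypothetical helper for illustration; not part of neuralangelo.
    """
    if not hasattr(os, "sched_setaffinity"):
        # Skip pinning instead of raising AttributeError.
        return False
    os.sched_setaffinity(0, cpu_ids)  # 0 means "the calling process"
    return True


# Usage: re-apply the current affinity mask, which is a no-op where supported.
current = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else set()
applied = set_affinity_if_supported(current)
print("affinity applied:", applied)
```

Skipping the call only avoids this particular error; it does not imply that the rest of the training pipeline works on Windows.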

NVlabs / neuralangelo

Error installing #121
