cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"
MIT License

Repeat that there is a problem with your work #56

Open clclclaiggg opened 11 months ago

clclclaiggg commented 11 months ago

Hello, I have this problem: `The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1`. How do I solve it?

cvignac commented 11 months ago

Hello, are you using the latest version of the code, with the packages specified by the new requirements.txt? Which branch are you using? Thanks

clclclaiggg commented 11 months ago

Hi, I'm using the latest version of the code, but I'm running it on Windows. If I run main.py directly, do I need to pass any additional arguments? The error output is below. Thank you.

```
Found rdkit, all good
Dataset smiles were found.
E:\anaconda\envs\digress\lib\site-packages\torch\nn\init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
Marginal distribution of the classes: tensor([0.7230, 0.1151, 0.1593, 0.0026]) for nodes, tensor([0.7261, 0.2384, 0.0274, 0.0081, 0.0000]) for edges
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [clclcl]:54216 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [clclcl]:54216 (system error: 10049 - The requested address is not valid in its context.).
[2023-07-14 15:41:57,862][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2023-07-14 15:41:57,862][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
You are using a CUDA device ('NVIDIA GeForce RTX 4060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Error executing job with overrides: []
Traceback (most recent call last):
  File "E:\DiGress-main\src\main.py", line 202, in main
    trainer.fit(model, datamodule=datamodule, ckpt_path=cfg.general.resume)
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 531, in fit
    call._call_and_handle_interrupt(
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\call.py", line 41, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 91, in launch
    return function(*args, **kwargs)
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 570, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 951, in _run
    self.strategy.setup(self)
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 164, in setup
    self.configure_ddp()
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 269, in configure_ddp
    self.model = self._setup_model(_LightningModuleWrapperBase(self.model))
  File "E:\anaconda\envs\digress\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 183, in _setup_model
    return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
  File "E:\anaconda\envs\digress\lib\site-packages\torch\nn\parallel\distributed.py", line 657, in __init__
    _sync_module_states(
  File "E:\anaconda\envs\digress\lib\site-packages\torch\distributed\utils.py", line 136, in _sync_module_states
    _sync_params_and_buffers(
  File "E:\anaconda\envs\digress\lib\site-packages\torch\distributed\utils.py", line 154, in _sync_params_and_buffers
    dist._broadcast_coalesced(
RuntimeError: The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```
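For context rather than a confirmed fix: the earlier `UserWarning: Initializing zero-element tensors is a no-op` suggests that some layer was constructed with a zero-sized dimension (note the `0.0000` entry in the edge marginals), and DDP's parameter broadcast during setup can then fail with exactly this kind of size mismatch. A minimal diagnostic sketch, assuming a standard `torch.nn.Module` called `model`; the helper name is hypothetical and not part of DiGress:

```python
import torch

def find_empty_tensors(model: torch.nn.Module) -> None:
    """Print every parameter or buffer that has a zero-sized dimension.

    DDP broadcasts all parameters and buffers during setup
    (_sync_params_and_buffers), so a zero-element tensor is a plausible
    culprit for a (128) vs (0) broadcast mismatch.
    """
    for name, param in model.named_parameters():
        if 0 in param.shape:
            print(f"zero-element parameter: {name} {tuple(param.shape)}")
    for name, buf in model.named_buffers():
        if 0 in buf.shape:
            print(f"zero-element buffer: {name} {tuple(buf.shape)}")

# Hypothetical usage: call this on the model built in src/main.py,
# right before trainer.fit(...), to see which layer ends up empty.
```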
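Separately, the log shows `distributed_backend=nccl`, but NCCL is not supported on Windows, and the client-socket warnings above also come from the distributed setup. A hedged sketch of forcing the `gloo` backend in PyTorch Lightning; the `Trainer` arguments below are illustrative only, since DiGress actually builds its `Trainer` from the Hydra config in `src/main.py`:

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

# Optional: address the Tensor Cores warning from the log by trading
# float32 matmul precision for speed.
torch.set_float32_matmul_precision("high")

# Illustrative Trainer setup: NCCL is unavailable on Windows, so a DDP
# run there needs the gloo process-group backend (or no DDP at all when
# training on a single GPU).
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    strategy=DDPStrategy(process_group_backend="gloo"),
)
```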