Error while training python Train.py --component GF

ikarus1211 commented 7 months ago

Hello, I am experiencing an error when I try to train the GF component. I generate the data using `python trainset.py --component GF` and then start training with `python Train.py --component GF`. I am using the 3dm_train_rot dataset provided by you. The following is the error message:

`
  File "Train.py", line 19, in <module>
    generator.run()
  File "/scratch.ssd/dejvax/job_19076217.meta-pbs.metacentrum.cz/RoReg-master/train/trainer.py", line 134, in run
    val_results=self.val_evaluator(self.network, self.val_set)
  File "/scratch.ssd/dejvax/job_19076217.meta-pbs.metacentrum.cz/RoReg-master/train/val.py", line 45, in __call__
    outputs=model(data)
  File "/storage/plzen1/home/dejvax/.conda/envs/roreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/scratch.ssd/dejvax/job_19076217.meta-pbs.metacentrum.cz/RoReg-master/network/group_feat.py", line 64, in forward
    yoho_0=self.PartI_net(feats0)
  File "/storage/plzen1/home/dejvax/.conda/envs/roreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/scratch.ssd/dejvax/job_19076217.meta-pbs.metacentrum.cz/RoReg-master/network/group_feat.py", line 37, in forward
    feats_eqv=self.SO3_Conv(feats)# bn,f,gn
  File "/scratch.ssd/dejvax/job_19076217.meta-pbs.metacentrum.cz/RoReg-master/network/group_feat.py", line 27, in SO3_Conv
    data=self.data_process(data)
  File "/scratch.ssd/dejvax/job_19076217.meta-pbs.metacentrum.cz/RoReg-master/network/group_feat.py", line 22, in data_process
    data=data[:,:,self.Nei_in_SO3]
IndexError: too many indices for tensor of dimension 2
| 5998/669600 [27:29<50:42:06, 3.64it/s, loss=0.597, lr=0.0001]
`

I thought it might be related to this question: https://github.com/HpWang-whu/RoReg/issues/1#issuecomment-1476075494, but I tried batch_size 1 and 32 and the error still persists. Looking forward to your answer.
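The failing line, `data=data[:,:,self.Nei_in_SO3]`, indexes three axes, so it needs a tensor with at least three dimensions. A minimal sketch that reproduces the same class of error, using NumPy as a stand-in for PyTorch; the sizes and the `nei_in_so3` table below are hypothetical and not taken from RoReg:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch, feat_dim, group_size, n_neighbors = 4, 8, 6, 3

# Stand-in for self.Nei_in_SO3: for each group element, the indices of its neighbors.
nei_in_so3 = (
    np.arange(group_size * n_neighbors).reshape(group_size, n_neighbors) % group_size
)

# Batched input (batch, feat_dim, group_size): indexing the third axis works.
feats_3d = np.zeros((batch, feat_dim, group_size))
gathered = feats_3d[:, :, nei_in_so3]
print(gathered.shape)  # (4, 8, 6, 3)

# Input that lost its batch axis (feat_dim, group_size): the same indexing fails.
feats_2d = np.zeros((feat_dim, group_size))
try:
    feats_2d[:, :, nei_in_so3]
    failed = False
except IndexError as err:
    failed = True
    print("IndexError:", err)
```

The message differs slightly between NumPy and PyTorch, but in both cases the cause is an input with fewer axes than the index expression expects.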

HpWang-whu commented 7 months ago

Hi @ikarus1211, thanks for your interest! The error comes from the validation set. You should set both batch_size and batch_size_val to a value larger than 1, for instance 32: https://github.com/HpWang-whu/RoReg/blob/766d1a40fde9bc91c03ef3a0ccdf7b6ecce5a404/parses/parses_train_gf.py#L64

Yours,
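One common way a batch of size 1 loses its leading axis is an argument-free `squeeze()`, which drops every axis of length 1. Whether this is what happens in RoReg is an assumption here, not something the thread confirms. The sketch below (NumPy as a stand-in for PyTorch, hypothetical shapes) only illustrates why a batch size larger than 1 would help in that case:

```python
import numpy as np

# Hypothetical feature layout: (batch, feat_dim, group_size).
feat_dim, group_size = 8, 6

one = np.zeros((1, feat_dim, group_size))    # batch of size 1
many = np.zeros((32, feat_dim, group_size))  # batch of size 32

# squeeze() without an axis argument removes all length-1 axes.
squeezed_one = one.squeeze()
squeezed_many = many.squeeze()

print(squeezed_one.shape)   # (8, 6): the batch axis is gone
print(squeezed_many.shape)  # (32, 8, 6): unchanged
```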

ikarus1211 commented 6 months ago

So I set both batch_size = 32 and batch_size_val = 32 in https://github.com/HpWang-whu/RoReg/blob/766d1a40fde9bc91c03ef3a0ccdf7b6ecce5a404/parses/parses_train_gf.py#L62-L64. I generated new data with these settings and started the training. Once it got to the validation part, it threw the same error as shown above. I printed the tensors right before https://github.com/HpWang-whu/RoReg/blob/766d1a40fde9bc91c03ef3a0ccdf7b6ecce5a404/network/group_feat.py#L22, and it seems that validation still passes a single 2D tensor rather than a batch of 32.
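Printing shapes, as described above, can be turned into an explicit check that fails early with a readable message. A minimal sketch, again with NumPy standing in for PyTorch. The helper `check_batched` and the `(batch, feature, group)` layout are assumptions for illustration (the layout loosely follows the `# bn,f,gn` comment in the traceback), and the helper is not part of RoReg:

```python
import numpy as np

def check_batched(feats, name="feats"):
    """Raise a descriptive error unless feats has three axes (batch, feature, group)."""
    feats = np.asarray(feats)
    if feats.ndim != 3:
        raise ValueError(
            f"{name} should be 3-D (batch, feature, group) but has shape {feats.shape}"
        )
    return feats

# A properly batched input passes through unchanged.
ok = check_batched(np.zeros((32, 8, 6)))

# A 2-D input is rejected before any indexing is attempted.
try:
    check_batched(np.zeros((8, 6)))
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

Calling such a check at the start of validation would show whether the validation loader yields single samples instead of batches.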

HpWang-whu commented 6 months ago

Hi @ikarus1211, sorry about that. I have carefully checked the code and found a mistake introduced in my last code reorganization. I have fixed it, and you can download the new train/trainer.py to replace the old one. Feel free to contact me if any other error is reported.

Yours,

HpWang-whu / RoReg

Error while training python Train.py --component GF #6