enyen / KOVIS_VisualServo

Code for keypoint-based visual servoing (IROS 2020)
GNU General Public License v3.0

Could not multiply by non-number #2

Open HRItdy opened 2 months ago

HRItdy commented 2 months ago

Hi Enyen,

Thanks for sharing the code! I have trained a kovis1 model and used the command rob.servo('pick_tube', 10, [0.01, 0.01, 0.01, 0.05, 0, 0], [0.1, 5]) to test it. I printed the velocity and orientation:

Velocity: [tensor(0.0022), tensor(0.0024), tensor(0.0024), tensor(-3.7409e-06), 0, 0]
Orientation: <Orientation: RV(-1.086, -0.000, -0.000)>

Are they reasonable? Then an error was raised:

Traceback (most recent call last):
  File "/home/lab_cheem/miniforge3/envs/kovis1-2.7/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/lab_cheem/miniforge3/envs/kovis1-2.7/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lab_cheem/catkin_ws/src/WP2/kovis1/inference_oja.py", line 204, in _watch_servo
    self.rob.speedl_tool(vec, ACCE, t)
  File "/home/lab_cheem/.local/lib/python3.8/site-packages/urx/robot.py", line 176, in speedl_tool
    v = pose.orient * m3d.Vector(velocities[:3])
  File "/home/lab_cheem/.local/lib/python3.8/site-packages/math3d/vector.py", line 455, in __rmul__
    raise utils.Error('__rmul__ : Could not multiply by non-number')
math3d.utils.Error: __rmul__ : Could not multiply by non-number

This seems to be an internal issue in python-urx. I searched for this but found no solution. Did you encounter this problem before? Any suggestion is appreciated. Thanks!

Regards, Daiying

enyen commented 2 months ago

Indeed, it seems to be an issue with urx and math3d. Maybe you could try reverting math3d to an earlier version: https://gitlab.com/morlin/pymath3d/-/tags
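Something along these lines (just a rough check; the robot IP below is a placeholder) can confirm which versions are installed and whether pose.orient still comes back as a math3d Orientation after downgrading:

# Rough compatibility check (assumes urx and math3d are installed and a UR
# controller is reachable; replace the IP with your own).
from importlib.metadata import version
import math3d as m3d
import urx

print('urx:', version('urx'), 'math3d:', version('math3d'), 'numpy:', version('numpy'))

rob = urx.Robot('192.168.1.10')        # placeholder IP
pose = rob.get_pose()                  # math3d.Transform in a working setup
print(type(pose.orient))               # expect math3d.orientation.Orientation
print(pose.orient * m3d.Vector(0.01, 0.01, 0.01))   # the multiplication that fails in speedl_tool
rob.close()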

HRItdy commented 2 months ago

Thanks for your reply.

I tried all the other versions of math3d and kept getting this issue:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lab_cheem/catkin_ws/src/WP2/kovis1/inference_oja.py", line 161, in servo
    self.move_tcp_perpendicular()
  File "/home/lab_cheem/catkin_ws/src/WP2/kovis1/inference_oja.py", line 95, in move_tcp_perpendicular
    tcp_pose = self.rob.getl()
  File "/home/lab_cheem/.local/lib/python3.8/site-packages/urx/robot.py", line 220, in getl
    return t.pose_vector.get_array().tolist()
AttributeError: 'numpy.ndarray' object has no attribute 'get_array'

It seems to be an issue with numpy? I reverted the numpy version but still get this. Do you remember the configuration you used before? Thanks!

HRItdy commented 2 months ago

Never mind Enyen, I found the original urx code is different from this: return t.pose_vector.tolist()

I will reinstall urx. Thanks!
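(For reference, instead of reinstalling, the local copy of urx/robot.py could also be patched to match the upstream one-liner quoted above:)

- return t.pose_vector.get_array().tolist()
+ return t.pose_vector.tolist()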

HRItdy commented 2 months ago

Hi Enyen,

For now, when I run the servo script, the robot automatically moves to the pose shown in the attached screenshot.

I changed the position of the object in PyBullet, but the gripper still aims at the ground. Since the gripper should be horizontal in the final demo, should I change the URDF to make the gripper horizontal, or is there an easier way to achieve this? Also, do we need to update the position of the robot in the script, or is it updated automatically according to the training dataset? Thanks!

enyen commented 2 months ago

https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/inference_oja.py#L158 The code calls move_tcp_perpendicular every time before running the servo, so you could modify this line according to your application.
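For example, a rough replacement could keep the current TCP position and only swap in a fixed horizontal-approach orientation before servoing (purely illustrative; the rotation vector and speeds below are placeholders you would tune for your setup):

import numpy as np

def move_tcp_horizontal(rob):
    # Sketch of an alternative to move_tcp_perpendicular(): keep the current
    # TCP position, replace only the orientation with a fixed rotation vector
    # for a horizontal approach (placeholder values), then move there linearly.
    x, y, z = rob.getl()[:3]
    horizontal_rotvec = (0.0, np.pi / 2, 0.0)    # placeholder orientation
    rob.movel((x, y, z, *horizontal_rotvec), acc=0.1, vel=0.1)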

PrithvirajRay commented 2 months ago

Hi Enyen, thanks for the response. However, this is at the inference stage, right? Wouldn't we want to train the robot in that horizontal end position from the training-data-generation phase itself? Basically, we want the robot to grab the tube horizontally, so during training data generation in the simulator the robot also needs to move in that same way, right?

HRItdy commented 2 months ago

Hi Enyen,

I generated 7000 records to train the model, and the result does not seem good enough. In some test images the distribution of the keypoints looks good, although several keypoints fall onto the background (image attached).

But in some other images the distribution is not good and the object is not completely segmented out (images attached).

And this is the final loss (screenshot attached); I guess it's not good enough.

Do you have any suggestions to improve the performance of the model? Could it be because the object is too small? Thanks!

HRItdy commented 2 months ago

Hi Enyen,

I have sent out the meeting link through Gmail. Please let me know if you didn't receive it.

By the way, did you by any chance encounter the "no RealSense device can be detected" issue? It seems to be due to the driver. Is it okay if we directly reinstall the driver? Will this break any setup you made before? Thanks!

enyen commented 1 month ago

> Hi Enyen, thanks for the response. However, this is at the inference stage, right? Wouldn't we want to train the robot in that horizontal end position from the training-data-generation phase itself? Basically, we want the robot to grab the tube horizontally, so during training data generation in the simulator the robot also needs to move in that same way, right?

For data generation, you have to design your task in the data cfg yaml according to your application. For inference, place the gripper accordingly, i.e. horizontally in your case.

> Hi Enyen,
>
> I generated 7000 records to train the model, and the result does not seem good enough. In some test images the distribution of the keypoints looks good, although several keypoints fall onto the background (image attached).
>
> But in some other images the distribution is not good and the object is not completely segmented out (images attached).
>
> And this is the final loss (screenshot attached); I guess it's not good enough.
>
> Do you have any suggestions to improve the performance of the model? Could it be because the object is too small? Thanks!

If the training data is generated correctly, try tuning the hyperparameters in the training cfg yaml. Maybe use a plain background for debugging.

> Hi Enyen,
>
> I have sent out the meeting link through Gmail. Please let me know if you didn't receive it.
>
> By the way, did you by any chance encounter the "no RealSense device can be detected" issue? It seems to be due to the driver. Is it okay if we directly reinstall the driver? Will this break any setup you made before? Thanks!

Mostly connection issues for me. But reinstalling the driver should be fine.

HRItdy commented 1 month ago

Hi Enyen,

Thanks a lot for your help! I moved the object closer to the arm, but after doing this the robot base is included in the view (see attached image). Does this matter?

And in this setting the arm pose also seems a little extreme. Should I move the object further away? That may cause the object to fall on the edge of the base; does this matter? I really appreciate your help!

Regards, Daiying

HRItdy commented 1 month ago

Moving it further away results in the configuration shown in the attached image: the object falls on the edge of the base. Does this matter? Thanks!

enyen commented 1 month ago

Use cam_clip to limit the camera range and avoid seeing the robot base: https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/cfg/dataset_insert_plug.yaml#L34

HRItdy commented 1 month ago

Thanks for your reply Enyen! Based on your suggestion, I tried pasting the image onto a plain background (screenshot attached).

This is the saved training image (attached).

And this is the loss (screenshot attached); it still seems very high, so I will adjust the hyperparameters to reduce it.

Is there any obvious issue in this procedure now? In addition, there is now random noise on the image (example attached). Should I also disable the noise for now?

Thanks!

HRItdy commented 1 month ago

Hi Enyen,

I reduced the number of keypoints to 6 and adjusted concentrate to -30 and inside to 60. Now there are two keypoints that fall onto the object and move with it (see the attached image sequence), but the loss still does not seem good enough (screenshot attached).

I would like to ask what a reasonable range for the loss is. Should I add the background back now and retrain the model, or continue adjusting the parameters to reduce the loss? Thanks!

enyen commented 1 month ago

The speed loss does not look good. Is the training data generated correctly? You can check by verifying that the poses in the first step of all episodes are correct.

Yes, you should continue to try different settings, and also test on the robot.
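A quick way to eyeball the first steps (assuming the generated frames follow the <episode>_<step> naming visible in your filenames and are saved as .png; adjust the folder and pattern to your dataset cfg):

# Spot-check sketch: show the first frame of the first few episodes so the
# initial relative poses can be verified visually. Paths/naming are assumptions.
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

data_dir = 'data/pick_tube'                          # hypothetical output folder
first_frames = sorted(glob.glob(f'{data_dir}/*_00*.png'))[:8]

fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax, path in zip(axes.ravel(), first_frames):
    ax.imshow(mpimg.imread(path))
    ax.set_title(path.split('/')[-1], fontsize=8)
    ax.axis('off')
plt.tight_layout()
plt.show()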

HRItdy commented 1 month ago

Thanks for your reply Enyen!

I sampled several initial steps (images attached).

They have similar relative positions (but with some random orientation). Is this correct?

And I noticed that within one episode, the camera seems to have some translation between different steps (see the attached frames).

Should the camera be fixed within one episode? Is this the reason why the speed loss is not good enough?

Thanks!

enyen commented 1 month ago

Yes, the camera pose is randomized in every image. You can turn it off at https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/cfg/dataset_insert_plug.yaml#L33

HRItdy commented 1 month ago

Thanks Enyen!

I increased the dataset size and adjusted the parameters; the result is shown in the attached screenshot.

Is this loss good enough? Should the ideal speed loss be smaller than 0.1? Thanks!

HRItdy commented 1 month ago

Hi Enyen, please forget about this question. I tested the model on the real robot and the performance is good (although it performs poorly from some specific initial positions; I will continue adjusting the hyperparameters to improve this). Thanks a lot for your help!

HRItdy commented 2 weeks ago

Hi Enyen,

Sorry to bother you. I'm trying to understand the code and visualize the keypoints in the inference phase, and I found what seems to be an inconsistency between the inference and training phases.

In training, inference is set to False (https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/train_model.py#L8), so the encoder takes a single image as input: https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/train_servo.py#L77

But in the inference phase, the inference parameter is set to True, so it requires images from the left and right cameras simultaneously: https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/inference_oja.py#L132

When I change it to False, the output keypoints have one dimension that is all zeros (screenshot attached). If I keep inference as True and input the left and right images simultaneously like:

kp = getattr(self, 'kp_{}'.format(action))([infraL.cuda(), infraR.cuda()])

the output keypoints only have one dimension (screenshot attached).

I would like to ask whether inference should be True. If so, how do I use this one-dimensional vector to visualize the keypoints? Thanks, and sorry for any bother!

enyen commented 2 weeks ago

When inference=True, keypoints from left and right images are concatenated: https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/train_model.py#L69

You can separate the enc and cvt in init_network():

- setattr(self, 'net_servo_{}'.format(net), torch.nn.Sequential(enc, cvt))
+ setattr(self, 'net_enc_{}'.format(net), enc)
+ setattr(self, 'net_cvt_{}'.format(net), cvt)

- getattr(self, 'net_servo_{}'.format(net)).eval()
+ getattr(self, 'net_enc_{}'.format(net)).eval()
+ getattr(self, 'net_cvt_{}'.format(net)).eval()

In _watch_servo():

- vec, speed = getattr(self, 'net_servo_{}'.format(action))([infraL.cuda(), infraR.cuda()])
+ kps = getattr(self, 'net_enc_{}'.format(action))([infraL.cuda(), infraR.cuda()])
+ vec, speed = getattr(self, 'net_cvt_{}'.format(action))(kps)
+ kpl, kpr = kps.squeeze().chunk(chunks=2, dim=1)
+ vis_kp(kpl, kpr)  # todo

Keypoints are flattened (https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/train_model.py#L131) into [h1, w1, m1, ..., hK, wK, mK].
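The vis_kp above is left as a todo; a minimal sketch could look like this (assuming kpl and kpr reshape into (K, 3) rows of [h, w, m] with the soft-argmax coordinates normalized to [-1, 1]; adjust img_size to your input crop):

import matplotlib.pyplot as plt

def vis_kp(kpl, kpr, img_size=128):
    # Plot keypoints from the left and right streams side by side; marker size
    # encodes the magnitude m. Coordinate range and layout are assumptions.
    fig, axes = plt.subplots(1, 2)
    for ax, kp, title in zip(axes, (kpl, kpr), ('left', 'right')):
        hwm = kp.detach().cpu().reshape(-1, 3).numpy()
        h = (hwm[:, 0] + 1) / 2 * img_size       # map [-1, 1] -> pixel row
        w = (hwm[:, 1] + 1) / 2 * img_size       # map [-1, 1] -> pixel column
        ax.scatter(w, h, s=20 + 80 * hwm[:, 2])
        ax.set_xlim(0, img_size)
        ax.set_ylim(img_size, 0)                 # image convention: origin at top-left
        ax.set_title(title)
    plt.show()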

HRItdy commented 1 week ago

Thanks a lot for your reply! Yes, the output is the result after I separated the enc and cvt. Sorry, I have one more question about the output: what do h, w, and m represent? In the paper the output of the keypointer is a 32x32 feature map with K channels, but in the code the output is reduced to the coordinates of the keypoints hi, wi. Do I understand correctly? Thanks!

enyen commented 6 days ago

h and w make up a point along the first and second axes of the 2D feature map; m is the magnitude or intensity of the point: https://github.com/enyen/KOVIS_VisualServo/blob/5ea1d9abe1a6c9c99fb3c2046e16251d9e57b10f/train_model.py#L114
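In other words, each of the K channels of the 32x32 feature map is reduced to a single (h, w, m) triplet, roughly like the following soft-argmax sketch (a simplified illustration, not the exact code in train_model.py; the magnitude here is just a mean-activation proxy):

import torch
import torch.nn.functional as F

def feature_map_to_kp(fmap):                     # fmap: (K, H, W)
    # Spatial softmax per channel, then the expected row/column gives (h, w);
    # the mean activation stands in for the keypoint magnitude m.
    K, H, W = fmap.shape
    prob = F.softmax(fmap.view(K, -1), dim=1).view(K, H, W)
    ys = torch.linspace(-1, 1, H)
    xs = torch.linspace(-1, 1, W)
    h = (prob.sum(dim=2) * ys).sum(dim=1)        # expected row coordinate
    w = (prob.sum(dim=1) * xs).sum(dim=1)        # expected column coordinate
    m = fmap.view(K, -1).mean(dim=1)             # crude magnitude proxy
    return torch.stack([h, w, m], dim=1)         # (K, 3)

Flattening that (K, 3) tensor row by row gives the [h1, w1, m1, ..., hK, wK, mK] layout mentioned above.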