prius.sdf does not match prius.urdf

RussTedrake commented 8 years ago

In #1711. i'd like for the dynamics to match exactly, so that we can use this as another unit test. But right now, in addition to not matching the prius.sdf model actually goes unstable when used for simulation. So there is something fundamentally wrong with either the sdf or the parser. This one is urgent. :)

liangfok commented 8 years ago

Thanks for the advice! I will keep it in mind. In this case the test program intentionally injects random inputs into the system one thousand times in the hopes of catching differences between the two models.

liangfok commented 8 years ago

I've narrowed down the problem to two areas:

The body link was not attached to the chassis_floor link the same way in both models. The result was different inertia matrices between the two models (I should have seen that in the visualizations of the RigidBodyTrees above). Fixed in https://github.com/liangfok/drake/commit/5ee2ae5a91b52af8461791403386ade11739d623.
loop joint - Removing the loop joints from the models results in them passing the equivalence test. Adding them back results in the test failing. I'm still looking into this...

sherm1 commented 8 years ago

loop joints

Random comment unrelated to your troubles, Liang: often the best way to deal with loops in an internal-coordinate multibody system is to split within a link rather than at the joint. This has the substantial advantage that all joints are then treated uniformly and angular coordinates can wrap past 2π without introducing extra state variables or hacks. It requires sawing a link into two half-mass bodies and gluing them back together with a weld (fixed) constraint (6 equations). This is a topic for another conversation but I thought I'd plant a seed now.

RussTedrake commented 8 years ago

@sherm - very nice. That would indeed be cleaner. The current implementation is limited in that we cannot easily add an actuator directly to the “loop joint”, nor friction, etc. And we’re already adding 6 constraints for a revolute joint anyhow (should be 5, but was cleaner to implement as 6). Would you open a new issue on that? It would be a definite improvement!

sherm1 commented 8 years ago

Loop joint discussion moved to #1867.

liangfok commented 8 years ago

I'm still digging into why the addition of loop joints is causing the URDF vs. SDF models of the Prius to be different. Here's what I've found so far:

The RigidBodyTree visualizations between the two models are extremely close:

SDF:

r1 dot

URDF:

r2 dot

The difference is arising from the call to RigidbodySystem::dynamics(). Within this method, the differences do not arise until the call to prog.solve(). This method is defined in LeastSquares::solve(). Within this method, there is an Aeq matrix and a beq vector. The least squares solution is computed by the following code:

      // least-squares solution
      prog.setDecisionVariableValues(
          Aeq.jacobiSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(beq));)

Using print statements, I determined that Aeq are the same but beq is different between the two models. The table below contains an example difference between the beq vector:

Index	URDF	SDF
0	30.19623666357765401357938	30.19628053363987518764588
1	-14.02833576585239505618574	-14.02839659691058571411304
2	1.110223024625156540423632e-14	8.326672684688674053177238e-15
3	30.19623666357765401357938	30.19628053363987518764588
4	-14.02833576585239505618574	-14.02839659691058571411304
5	1.665334536937734810635448e-14	1.38777878078144567552954e-14
6	2.441663517175129527458921	2.441432350830609721015207
7	76.73089804700674676496419	76.73260586144398587293836
8	5.462316429263069039734546	5.46275105161032570322277
9	-870.3200313010531772306422	-870.3200313010531772306422
10	84.20119268094822473358363	84.20119268094822473358363
11	-793.7666223647498782156617	-793.7666223647498782156617
12	64216.42927706049522385001	64216.4299376551789464429
13	-65183.48699485358520178124	-0
14	-0.009111866824832448585880229	-0.009111866824846659440595431
15	-77390.85823135306418407708	-77390.85845732540474273264
16	0.5209504568581166950025363	0.5209504568581166950025363
17	-2.131628207280300557613373e-14	-0
18	1.68753899743023794144392e-14	-2.664535259100375697016716e-15

In the above example, the problem appears to be with index 13 of the URDF's beq vector. Currently looking into why.

sherm1 commented 8 years ago

Liang, a thought: this looks like a difference in rank determination by the SVD. The sdf version appears to have decided on a lower rank for what I presume is the constraint compliance matrix (is that right?). From the pictures it looks like the car model contains a planar linkage, which if modeled in 3d should have redundant out-of-plane constraints. I notice that there are some subtle differences in some of the other numbers, for example row 0 differs in the sixth decimal place. To me that suggests possible float vs. double treatment differences. The float version (urdf?) could have errors of 1e-6 or so which could prevent the SVD from dropping rank when it should.

RussTedrake commented 8 years ago

It may help to also numerically check the individual properties (inertia, transforms, etc) to catch differences before getting to solve().

liangfok commented 8 years ago

@sherm1, to determine the SVD rank, I can count the number of non-zero singular values right? If so, I believe the ranks of both models are the same. Here are example SVD singular values obtained using Aeq.jacobiSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).singularValues() from one execution of the rigid body model equivalency test that failed.

Index	URDF	SDF
0	243.6518650177665392675408	243.6530916319920834212098
1	210.9897112899952844600193	210.9906242300023393454467
2	120.4271656785780066911684	120.4271691118260037001164
3	120.1792493242776487250012	120.1792320269771039420448
4	119.5581933039784559014151	119.5581558735103726576199
5	34.2997489187578494806985	34.3000281990250641683815
6	3.052668011736161535196743	3.052718464389475361286941
7	2.106752996491387364130787	2.106747390436160660698306
8	1.425061883264294726458843	1.425096408193961439181408
9	0.7637255782890202127077828	0.7637495038471329156237744
10	0.4999120000000041863508216	0.4999120000000024099939822
11	0.4999120000000031316389482	0.4999119999999998564810255
12	0.4971264118704067369591826	0.4971264230915271031818747
13	0.4875517614159221024827673	0.4875518639757789873812044
14	0.04451646622718740536894799	0.04451400417051754043740175
15	3.719182395161172259100363e-21	2.319157834412946368918493e-21
16	1.420089482289595853461653e-23	1.263129089448864027256328e-24
17	4.728979668985963391983973e-32	6.281537206627940716786068e-31
18	5.007775934575039637495072e-33	1.884705025915192462402635e-32

I believe the slight differences between the URDF and SDF models are due to the need to convert from absolute to relative coordinate frames when parsing the SDF. Both the SDF and URDF parsers use the double data type.

@RussTedrake, I will numerically check for differences probably by modifying RigidBodyTree to override the == operator.

Thanks for the advice!

sherm1 commented 8 years ago

to determine the SVD rank, I can count the number of non-zero singular values right? If so, I believe the ranks of both models are the same.

Yes, those are both convincingly rank 15 matrices!

RussTedrake commented 8 years ago

I would not be surprised if the numerical differences are from the ascii representation of the doubles in the xml files, especially if the sdf was generated from a tool that loaded the urdf into a float/double then wrote the sdf with some potentially different accuracy.

liangfok commented 8 years ago

I overloaded operator== and operator!= in RigidBodyTree, RigidBody, and RigidBodyFrame and added a numerical equivalency test to compareRigidBodySystems.cpp. Here is the commit: https://github.com/liangfok/drake/commit/ba9027b6c8bfe66667e1055852a2a2481a7a45c3. The numerical equivalency test currently checks everything (inertia, COM, link names, joint names, joint limits, contact points, loop frames, actuators, gravity vector, B matrix, etc.) except for the collision model.

The URDF and SDF models of the Prius pass the current numerical equivalency test. Maybe this will change after comparing the collision models.

RussTedrake commented 8 years ago

Are the dynamics the same if you put the initial conditions with the car up in the air? (e.g. set z=10)? In that case, the collision geometries should not effect the dynamics.

liangfok commented 8 years ago

I think I initialized both models 10m up in the air by modifying x[2], which is index 2 of the initial state vector (see: https://github.com/liangfok/drake/commit/f867f2d0fa5f2ad0824b39e00628e1608392891c). I decided to use index 2 based on the output of a method I added called RigidBodySystem::getStateVectorSemantics(), which I hope prints the correct order of elements in the state vector (see: https://github.com/liangfok/drake/commit/caab2ae6fe5cbc5f4afe55202e25a64c5b706bc2). Here is the output for both models:

Index	Joint name	Joint axis-of-motion name
0	base	base_x
1	base	base_y
2	base	base_z
3	base	base_qw
4	base	base_qx
5	base	base_qy
6	base	base_qz
7	steering	steering
8	left_pin	left_pin
9	left_wheel_joint	left_wheel_joint
10	axle_tie_rod_arm	axle_tie_rod_arm
11	right_wheel_joint	right_wheel_joint
12	rear_left_wheel_joint	rear_left_wheel_joint
13	rear_right_wheel_joint	rear_right_wheel_joint
14	base	base_wx
15	base	base_wy
16	base	base_wz
17	base	base_vx
18	base	base_vy
19	base	base_vz
20	steering	steeringdot
21	left_pin	left_pindot
22	left_wheel_joint	left_wheel_jointdot
23	axle_tie_rod_arm	axle_tie_rod_armdot
24	right_wheel_joint	right_wheel_jointdot
25	rear_left_wheel_joint	rear_left_wheel_jointdot
26	rear_right_wheel_joint	rear_right_wheel_jointdot

Unfortunately, the dynamics of the vehicles do not match. To see a larger difference, I increased the equivalency threshold of valuecheckMatrix() from 1e-8 to 1 (see: https://github.com/liangfok/drake/commit/bbc764e839b707fb234b12d3a5f76e34802eff61). Here is an example of a mismatch:

Initial condition (x):

 [-0.11178, -1.03367, 10, 0.196684,-0.371481,0.0390621, -0.85062, -225.333,  166.725, -104.719, 197.192, -212.579, -67.8423, -17.4616, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Input (u):

[0.994796, 0.832517, -0.767727]

Resulting xdot vectors and their differences (this is the return value of RigidBodySystem::dynamics()):

Index	SDF	URDF	Delta
0	0	0	0
1	0	0	0
2	0	0	0
3	0	0	0
4	0	0	0
5	0	0	0
6	0	0	0
7	0	0	0
8	0	0	0
9	0	0	0
10	0	0	0
11	0	0	0
12	0	0	0
13	0	0	0
14	154.407	154.414	-0.00644637
15	35.7116	35.7128	-0.00120793
16	783.281	783.295	-0.0140741
17	-271.505	-271.498	-0.00746939
18	1391.15	1391.18	-0.0295634
19	-14.7617	-14.7617	6.13917e-05
20	22892.6	22894.3	-1.73259
21	-32930.2	-32932.5	2.3233
22	95.563	95.5671	-0.00410449
23	-31217.4	-31218.4	1.06835
24	127.887	127.892	-0.00519207
25	-35.7116	-35.7128	0.00120793
26	-35.7116	-35.7128	0.00120793

I'm noticing some very large values in the xdot vectors especially given the relatively small values in u. Could we be hitting a singularity somewhere?

I also find strange the fact that all joint positions are zero in xdot. Is this expected?

liangfok commented 8 years ago

Quote from @sherm1:

From the pictures it looks like the car model contains a planar linkage, which if modeled in 3d should have redundant out-of-plane constraints.

I'm not sure if there are any planar linkages. Below is a zoomed-in view of the front steering links in both the URDF and SDF models. Notice the tie rod and axle are above the left and right tie rod arms. Is this where you were thinking there may be some planar linkages with redundant out-of-plane constraints?

screenshot from 2016-03-18 11 30 02

sherm1 commented 8 years ago

Is this where you were thinking there may be some planar linkages with redundant out-of-plane constraints?

Yes -- the shift in z direction would still leave this as a planar mechanism. So if the joints were all pin (revolute) joints there would be redundant constraints. That is, you could remove one of the pins and the mechanism would still move only in the x-y plane, so no additional z constraint was needed. Nothing wrong with that provided that the linear algebra being employed is prepared for it.

liangfok commented 8 years ago

@sherm1, thanks for the explanation. Now I understand what is meant by "planar mechanism"!

liangfok commented 8 years ago

I modified the RigidBodyTree numerical equivalency checker to also compare the collision models (see: https://github.com/liangfok/drake/commit/ff3b20cda4bf9515f2baab0222c392e4faca5fb7). It indicated that the two collision models were the same even though the resulting dynamics were different. This is consistent with the earlier test where the models were elevated 10m in the air to prevent collision geometries from impacting vehicle dynamics, and still observing differences in the dynamics. Thus, the mystery of why the URDF and SDF Prius models exhibit different dynamics continues.

To attain a greater level of repeatability during this debugging process, I hard-coded a known failure case (see: https://github.com/liangfok/drake/commit/6223ff833d634b7f920bec999f24257e181ac751) and will use it to further investigate why seemingly identical rigid body tree models result in different dynamics.

liangfok commented 8 years ago

I finally got the Prius URDF and SDF Models to match. Ultimately, the solution was to manually adjust the two models so that the error between them was less than std::numeric_limits<double>::epsilon(). There's an argument that the tolerance value used in compareRigidBodySystems.cpp should be relative rather than absolute (see https://github.com/RobotLocomotion/drake/issues/1887#issuecomment-198805278) -- I will leave that to a future issue. Below are some details of the fix.

https://github.com/liangfok/drake/commit/fa948a605a8a6706194e15a247bdc32e6b317596 - fixed bug in operator== overload that resulted in joints not being compared
https://github.com/liangfok/drake/commit/7bbd3e438d6defc2894af94139b4eaa4811179f7 - decreased the valueCheckMatrix(...) equivalency tolerance from 1e-10 to std::numeric_limits<double>::epsilon()
https://github.com/liangfok/drake/commit/c66a1db5b790ca961d7a9f56028ea849d6f494bb - painstakingly adjusted the models to numerically match with tolerance below std::numeric_limits<double>::epsilon()
https://github.com/liangfok/drake/commit/0e6e26f506f61f2fad19405fe9a172b8148a70ba - lowered vehicles from air and return overall tolerance threshold to 1e-8

TODO:

Cleanup the code
Issue pull request containing fixed URDF / SDF parsers and models.

RussTedrake commented 8 years ago

Great news!

I don't understand exactly what you mean by adjusting the pose manually? Could you explain at which step/file you mean?

liangfok commented 8 years ago

SDF contains <pose> tags that specify a link's position and orientation in the model frame and a joint's position and orientation in its child link's frame. I tuned them so the transforms in Drake's internal models match. Here is the specific commit: https://github.com/liangfok/drake/commit/c66a1db5b790ca961d7a9f56028ea849d6f494bb

RussTedrake commented 8 years ago

Hand-tuned, or computed more accurately?

liangfok commented 8 years ago

I printed the transforms with 25-digit precision and then used a calculator to compute the desired values to get a precise match.

I originally converted the SDF to URDF using an automated tool (http://wiki.ros.org/pysdf) but still had to make many changes to both the resulting URDF and original SDF to make them perfectly match.

We may want to develop our own URDF/SDF converter to fully automate the process in the future.

RussTedrake commented 8 years ago

Calculator? :)

Great work. Thanks for pushing it through to a satisfying conclusion. Looking forward to the PR.

liangfok commented 8 years ago

No problem! One problem with all the manual-tuning-using-a-calculator is the vehicle model is no longer symmetric across the sagittal plane. For example, below is the pose of the front left and right wheels:

Link	x	y	z	r
Right wheel	1.40948	-0.80198	0.3164819999999999855511135	1.57079632679
Left wheel	1.40948	0.801993	0.3164819999999999855511135	1.57079632679
Delta	0	1.3e-5	0	0

Since the models match and simulation works, I created a new issue tracking the work of modifying the models to be symmetric: https://github.com/RobotLocomotion/drake/issues/1896.

RobotLocomotion / drake

prius.sdf does not match prius.urdf #1714