DavidPL1 / assembly_example

Example code for interaction with our assembly simulation ICRA 2023 challenge

mujoco_server process seemingly crashes when running with docker-compose #27

Closed: abhishek47kashyap closed this issue 1 year ago

abhishek47kashyap commented 1 year ago

When I run the server image (docker run --rm --net=host -it s4dx/assembly_server:latest) and my "solution" image (docker run --rm --net=host -it assembly_screwing_solution:latest) in two separate terminals, I have no issues.

However, if I run them through docker-compose.screwing.yml using docker-compose -f docker-compose.screwing.yml up, the mujoco_server process seems to die. Here are the logs, including the lines before and after the crash:

server_1    | [ INFO] [1683280229.500731626] [ros.franka_mujoco.franka_hw_sim] [/mujoco_server]: Found transmission interface of joint panda_joint5 : hardware_interface/VelocityJointInterface
server_1    | [ INFO] [1683280229.523359304] [ros.franka_mujoco.franka_hw_sim] [/mujoco_server]: Found transmission interface of joint panda_joint6 : hardware_interface/VelocityJointInterface
server_1    | [ INFO] [1683280229.544401983] [ros.franka_mujoco.franka_hw_sim] [/mujoco_server]: Found transmission interface of joint panda_joint7 : hardware_interface/VelocityJointInterface
server_1    | [ INFO] [1683280229.566512424] [ros.franka_mujoco.franka_hw_sim] [/mujoco_server]: Found transmission interface of joint panda_joint1 : hardware_interface/EffortJointInterface
server_1    | [ INFO] [1683280229.566549706] [ros.franka_mujoco.franka_hw_sim] [/mujoco_server]: Found transmission interface of joint panda_joint2 : hardware_interface/EffortJointInterface
server_1    | [ INFO] [1683280229.566575424] [ros.franka_mujoco.franka_hw_sim] [/mujoco_server]: Found transmission interface of joint panda_joint3 : hardware_interface/EffortJointInterface
server_1    | [ INFO] [1683280229.566590148] [ros.franka_mujoco.franka_hw_sim] [/mujoco_server]: Found transmission interface of joint panda_joint4 : hardware_interface/EffortJointInterface
server_1    | [ INF
server_1    | ROS_MASTER_URI=http://localhost:11311
server_1    | process[mujoco_server-1]: started with pid [94]
server_1    | process[panda_gripper_spawner-2]: started with pid [95]
server_1    | process[panda_controller_spawner-3]: started with pid [96]
server_1    | process[robot_state_publisher-4]: started with pid [97]
server_1    | process[joint_state_publisher-5]: started with pid [102]
server_1    | process[virtual_joint_broadcaster_0-6]: started with pid [109]
server_1    | process[move_group-7]: started with pid [110]
server_1    | [mujoco_server-1] process has died [pid 94, exit code -11, cmd /home/assembly_server/lib/mujoco_ros/mujoco_server --admin-hash rezylbrdgejdzomnclab __name:=mujoco_server __log:=/root/.ros/log/40d3eec0-eb2a-11ed-8885-0242ac120002/mujoco_server-1.log].
server_1    | log file: /root/.ros/log/40d3eec0-eb2a-11ed-8885-0242ac120002/mujoco_server-1*.log
server_1    | Traceback (most recent call last):
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/impl/tcpros_base.py", line 569, in connect
server_1    |     self.read_header()
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/impl/tcpros_base.py", line 664, in read_header
server_1    |     self._validate_header(read_ros_handshake_header(sock, self.read_buff, self.protocol.buff_size))
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rosgraph/network.py", line 357, in read_ros_handshake_header
server_1    |     d = sock.recv(buff_size)
server_1    | ConnectionResetError: [Errno 104] Connection reset by peer
server_1    | 
server_1    | During handling of the above exception, another exception occurred:
server_1    | 
server_1    | Traceback (most recent call last):
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/impl/tcpros_service.py", line 509, in call
server_1    |     transport.connect(dest_addr, dest_port, service_uri)
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/impl/tcpros_base.py", line 596, in connect
server_1    |     raise TransportInitError(str(e)) #re-raise i/o error
server_1    | rospy.exceptions.TransportInitError: [Errno 104] Connection reset by peer
server_1    | 
server_1    | During handling of the above exception, another exception occurred:
server_1    | 
server_1    | Traceback (most recent call last):
server_1    |   File "/home/assembly_server/lib/assembly_manager/assembly_manager", line 5, in <module>
server_1    |     main()
server_1    |   File "<frozen assembly_manager>", line 38, in main
server_1    |   File "<frozen assembly_manager.screwing.screw_assembly>", line 559, in start
server_1    |   File "<frozen assembly_manager.screwing.screw_assembly>", line 453, in wait_for_sim
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/impl/tcpros_service.py", line 442, in __call__
server_1    |     return self.call(*args, **kwds)
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/impl/tcpros_service.py", line 512, in call
server_1    |     raise ServiceException("unable to connect to service: %s"%e)
server_1    | rospy.service.ServiceException: unable to connect to service: [Errno 104] Connection reset by peer
server_1    | [INFO] [1683280226.920564, 0.000000]: Waiting for /clock to be available...
server_1    | Traceback (most recent call last):
server_1    |   File "/opt/ros/noetic/lib/controller_manager/spawner", line 219, in <module>
server_1    |     if __name__ == '__main__': main()
server_1    |   File "/opt/ros/noetic/lib/controller_manager/spawner", line 123, in main
server_1    |     rospy.sleep(0.2)
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/timer.py", line 165, in sleep
server_1    |     raise rospy.exceptions.ROSInterruptException("ROS shutdown request")
server_1    | rospy.exceptions.ROSInterruptException: ROS shutdown request
server_1    | Traceback (most recent call last):
server_1    |   File "/opt/ros/noetic/lib/controller_manager/spawner", line 219, in <module>
server_1    |     if __name__ == '__main__': main()
server_1    |   File "/opt/ros/noetic/lib/controller_manager/spawner", line 123, in main
server_1    |     rospy.sleep(0.2)
server_1    |   File "/opt/ros/noetic/lib/python3/dist-packages/rospy/timer.py", line 165, in sleep
server_1    |     raise rospy.exceptions.ROSInterruptException("ROS shutdown request")
server_1    | rospy.exceptions.ROSInterruptException: ROS shutdown request
server_1    | 6.850477202]: About to load: pilz_industrial_motion_planner::PlanningContextLoaderCIRC
server_1    | [ INFO] [1683280226.853116578]: Registered Algorithm [CIRC]
server_1    | [ INFO] [1683280226.853156033]: About to load: pilz_industrial_motion_planner::PlanningContextLoaderLIN
server_1    | [ INFO] [1683280226.855748723]: Registered Algorithm [LIN]
server_1    | [ INFO] [1683280226.855952604]: About to load: pilz_industrial_motion_planner::PlanningContextLoaderPTP
server_1    | [ INFO] [1683280226.858287253]: Registered Algorithm [PTP]
server_1    | [ INFO] [1683280226.858532787]: Using planning interface 'Pilz Industrial Motion Planner'

My docker-compose.screwing.yml is the same as the example; the only difference is that the usercode image points to my solution image, assembly_screwing_solution:latest.

Could I get some pointers on what I could be doing wrong? Thanks!

fpatzelt commented 1 year ago

There is a problem with using headless mode on the server. I removed headless:=true from the 'docker-compose.*.yml' files to circumvent this issue (6c9c628).
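
As a side note for anyone reading the log above: the "exit code -11" in the "process has died" line most likely means mujoco_server was killed by signal 11 (a segmentation fault) rather than exiting on its own. This assumes roslaunch reports exit codes the way Python's subprocess module does, where a negative value -N means "terminated by signal N". Here is a minimal sketch for decoding such a value; the helper name describe_exit_code is made up for illustration and is not part of ROS or this repository.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code as printed in a 'process has died' log line.

    Hypothetical helper: assumes the subprocess convention that a negative
    code -N means the process was terminated by signal N.
    """
    if code < 0:
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-code} ({name})"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


if __name__ == "__main__":
    # Value taken from the log line: [pid 94, exit code -11, ...]
    print(describe_exit_code(-11))
```

On Linux, signal 11 is SIGSEGV, which would be consistent with the server crashing when started in headless mode.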

abhishek47kashyap commented 1 year ago

This resolved the problem, thank you.