ezWheelSAS / swd_ros_controllers

ROS nodes to control motors powered by the ez-Wheel Safety Wheel Drive (SWD®) technology.
https://www.ez-wheel.com/
GNU Lesser General Public License v2.1

Starter Kit fails upon start #76

Closed hazemy closed 1 year ago

hazemy commented 1 year ago

Hello, the Starter Kit fails to start any ezw services upon booting. When I run systemctl -a | grep ezw, I get:

System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down

The SWD cores keep blinking (the left motor at a slower rate) and the ROS master is not started. Consequently, roscore does not start and is terminated when I attempt to restart it.

hazemy commented 1 year ago

It seems that systemd is running only outside the docker container on the starter kit. However, I can see all the relevant services (ezw-stack.service, ezw-swd-left.service, etc.) residing in /etc/systemd/system within the docker container. Attempting to restart any of these services results in the same error as in the previous comment. I am not sure what the default behavior should be.
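A quick way to confirm this from a shell is to look at which process is PID 1. The sketch below is only an illustration: the helper names pid1_name and service_tool_for are hypothetical, and falling back to supervisorctl for anything that is not systemd is an assumption that matches this kit's container, not a general rule.

```shell
#!/bin/sh
# Print the name of the process running as PID 1 (e.g. "systemd" on the host).
pid1_name() {
    cat /proc/1/comm 2>/dev/null || echo unknown
}

# Map an init process name to the tool that manages services under it.
# Assumption: anything that is not systemd is treated as the supervisor-managed container.
service_tool_for() {
    case "$1" in
        systemd) echo systemctl ;;
        *)       echo supervisorctl ;;
    esac
}

echo "PID 1 is $(pid1_name); use $(service_tool_for "$(pid1_name)")"
```

Run on the host and inside the container, the two outputs should differ if systemd is only present on the host.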

ez-Support commented 1 year ago

Yes, the services actually run inside the docker container, so you have to use supervisorctl instead of systemctl.

You can see the docker running using:

docker ps -a

Then, you can execute and supervise services using:

docker exec -u swd_sk -ti ros-noetic /bin/bash
sudo supervisorctl start/stop/status

ez-Support commented 1 year ago

If you modify the docker image, you might have to rebuild it, using: ./rebuild.sh -w ~ -u

hazemy commented 1 year ago

I successfully rebuilt the docker image from scratch (without cache). However, I am still unable to start roscore, and the bringup launch file shuts down at random. I suspect that ROS might be improperly installed. Could you please provide the Dockerfile and the entrypoint.sh file? Are there any further configurations needed before/after building the image?

GMezWheel commented 1 year ago

Hi @hazemy,

Can you send me the docker startup logs?

docker stop <docker_name> && docker start <docker_name> && docker logs -f <docker_name>

hazemy commented 1 year ago

Hi @GMezWheel,

Thanks for answering. The following is a portion of the logs:

Entrypoint script is running...

Create user swd_sk...

Adding group `swd_sk' (GID 1001) ...
Done.
useradd: warning: the home directory /home/swd_sk already exists.
useradd: Not copying any file from skel directory into it.
User 'swd_sk' is added
Create dbus session file...

[ -s /.dockername ] && source /opt/ezw/install/setup.bash
Start supervisor services...

2023-08-22 15:14:33,405 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2023-08-22 15:14:33,406 INFO Included extra file "/etc/supervisor/conf.d/ezw-swd.conf" during parsing
2023-08-22 15:14:33,406 INFO Included extra file "/etc/supervisor/conf.d/nginx.conf" during parsing
2023-08-22 15:14:33,407 INFO Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2023-08-22 15:14:33,443 INFO RPC interface 'supervisor' initialized
2023-08-22 15:14:33,444 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2023-08-22 15:14:33,446 INFO supervisord started with pid 69
2023-08-22 15:14:34,463 INFO spawned: 'ezw-swd-left.service' with pid 71
2023-08-22 15:14:34,470 INFO spawned: 'ezw-swd-right.service' with pid 72
2023-08-22 15:14:34,476 INFO spawned: 'swd-starter-kit-bringup.service' with pid 73
2023-08-22 15:14:34,485 INFO spawned: 'nginx' with pid 74
2023-08-22 15:14:36,096 INFO success: ezw-swd-left.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:14:36,097 INFO success: ezw-swd-right.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:14:36,098 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:14:36,099 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:14:43,073 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:14:44,080 INFO spawned: 'swd-starter-kit-bringup.service' with pid 204
2023-08-22 15:14:45,498 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:14:50,152 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:14:51,159 INFO spawned: 'swd-starter-kit-bringup.service' with pid 260
2023-08-22 15:14:52,552 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:14:57,280 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:14:58,286 INFO spawned: 'swd-starter-kit-bringup.service' with pid 316
2023-08-22 15:14:59,713 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:15:04,283 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:15:05,289 INFO spawned: 'swd-starter-kit-bringup.service' with pid 372
2023-08-22 15:15:06,705 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:15:11,371 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:15:12,377 INFO spawned: 'swd-starter-kit-bringup.service' with pid 428
2023-08-22 15:15:13,812 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:15:18,538 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:15:19,545 INFO spawned: 'swd-starter-kit-bringup.service' with pid 484
2023-08-22 15:15:20,954 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:15:25,672 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:15:26,679 INFO spawned: 'swd-starter-kit-bringup.service' with pid 540
2023-08-22 15:15:28,123 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:15:32,750 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:15:33,757 INFO spawned: 'swd-starter-kit-bringup.service' with pid 596
2023-08-22 15:15:35,182 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:15:39,827 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)
2023-08-22 15:15:40,833 INFO spawned: 'swd-starter-kit-bringup.service' with pid 652
2023-08-22 15:15:42,242 INFO success: swd-starter-kit-bringup.service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-08-22 15:15:46,832 INFO exited: swd-starter-kit-bringup.service (exit status 1; not expected)

The log lines are rather long and keep on repeating for the bringup service.
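To condense a repeating log like this, the unexpected exits can be counted per service. This is a minimal sketch that assumes the supervisord line layout shown above (date, time, level, "exited:", program name); count_unexpected_exits is a hypothetical helper name.

```shell
#!/bin/sh
# Read supervisord log lines on stdin and print "<program> <count>" for every
# program that exited unexpectedly.
count_unexpected_exits() {
    awk '/ INFO exited: / && /not expected/ {
             count[$5]++      # $5 is the program name in this log layout
         }
         END { for (name in count) print name, count[name] }'
}

# Usage (container name taken from the docker exec command earlier in the thread):
#   docker logs ros-noetic 2>&1 | count_unexpected_exits
```

A service that appears here with a steadily growing count is being restarted in a loop by supervisor.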

GMezWheel commented 1 year ago

Indeed, the swd-starter-kit-bringup service does not start.

Now, can you send me the log of swd-starter-kit-bringup?

cat /var/log/supervisor/*bringup.service.log

hazemy commented 1 year ago

The logs repeat the following:

[SUCCESS] source /opt/ros/noetic/setup.bash
[SUCCESS] source /home/swd_sk/ros-noetic_ws/install/setup.bash
WARN: unrecognized 'remap' tag in <include> tag
Resource not found: swd_starter_kit_description
ROS path [0]=/opt/ros/noetic/share/ros
ROS path [1]=/home/swd_sk/ros-noetic_ws/install/share
ROS path [2]=/opt/ros/noetic/share
The traceback for the exception was written to the log file
... logging to /home/swd_sk/.ros/log/9c605966-40fe-11ee-88be-0001c032d80d/roslaunch-SWDSK32D80D-195.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

[SUCCESS] source /opt/ros/noetic/setup.bash
[SUCCESS] source /home/swd_sk/ros-noetic_ws/install/setup.bash
WARN: unrecognized 'remap' tag in <include> tag
Resource not found: swd_starter_kit_description
ROS path [0]=/opt/ros/noetic/share/ros
ROS path [1]=/home/swd_sk/ros-noetic_ws/install/share
ROS path [2]=/opt/ros/noetic/share
The traceback for the exception was written to the log file
... logging to /home/swd_sk/.ros/log/a1bac78e-40fe-11ee-80b7-0001c032d80d/roslaunch-SWDSK32D80D-255.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.
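The "Resource not found" line means the package is not present in any of the listed ROS paths. A rough way to check that by hand is sketched below; pkg_installed_in is a hypothetical helper, and it assumes the usual install layout where a package lives at <share dir>/<package>/package.xml. With a sourced ROS environment, rospack find swd_starter_kit_description answers the same question.

```shell
#!/bin/sh
# Print the directory of package $1 if it exists in one of the given share
# directories; return non-zero if it is not found in any of them.
pkg_installed_in() {
    pkg=$1
    shift
    for share_dir in "$@"; do
        if [ -f "$share_dir/$pkg/package.xml" ]; then
            echo "$share_dir/$pkg"
            return 0
        fi
    done
    return 1
}

# Usage, with the share directories from the log above:
#   pkg_installed_in swd_starter_kit_description \
#       /home/swd_sk/ros-noetic_ws/install/share /opt/ros/noetic/share \
#       || echo "package is not installed"
```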

I cloned the bringup and robot_description repos and tried to build them. However, catkin_make is killed before the compilation is done:

Base path: /home/swd_sk/ros-noetic_ws
Source space: /home/swd_sk/ros-noetic_ws/src
Build space: /home/swd_sk/ros-noetic_ws/build
Devel space: /home/swd_sk/ros-noetic_ws/devel
Install space: /home/swd_sk/ros-noetic_ws/install
####
#### Running command: "cmake /home/swd_sk/ros-noetic_ws/src -DCATKIN_DEVEL_PREFIX=/home/swd_sk/ros-noetic_ws/devel -DCMAKE_INSTALL_PREFIX=/home/swd_sk/ros-noetic_ws/install -G Unix Makefiles" in "/home/swd_sk/ros-noetic_ws/build"
####
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using CATKIN_DEVEL_PREFIX: /home/swd_sk/ros-noetic_ws/devel
-- Using CMAKE_PREFIX_PATH: /home/swd_sk/ros-noetic_ws/install;/opt/ros/noetic
-- This workspace overlays: /home/swd_sk/ros-noetic_ws/install;/opt/ros/noetic
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.8.10", minimum required is "3") 
-- Using PYTHON_EXECUTABLE: /usr/bin/python3
-- Using Debian Python package layout
-- Found PY_em: /usr/lib/python3/dist-packages/em.py  
-- Using empy: /usr/lib/python3/dist-packages/em.py
Killed

I deleted the build and devel directories, but the build still fails. Another question: is the ROS workspace shared by the docker container and the host?

GMezWheel commented 1 year ago

In the present case, supervisor tries to restart the bringup each time it fails, and kills the ROS stack.

Thus, you must stop the service in supervisor before compiling or launching the bringup manually:

sudo supervisorctl stop swd-starter-kit-bringup.service

To kill all the ROS nodes:

/opt/ezw/sbin/sce-ros-bringup.sh stop
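Put together, the sequence could look like the sketch below. The stop commands are the ones given above; the build command (catkin_make with the install target), the workspace path default, the final restart, and the helper names run and rebuild_workspace are assumptions added for illustration. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
BRINGUP=swd-starter-kit-bringup.service
WS="${WS:-$HOME/ros-noetic_ws}"   # assumed workspace location

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_workspace() {
    # Keep supervisor from restarting the bringup during the build.
    run sudo supervisorctl stop "$BRINGUP"
    # Kill any ROS nodes that are still running.
    run /opt/ezw/sbin/sce-ros-bringup.sh stop
    # Assumed build step; do not restart the service if the build failed.
    run catkin_make -C "$WS" install || return 1
    run sudo supervisorctl start "$BRINGUP"
}
```

The function is only defined here, not called; invoke rebuild_workspace from a shell inside the container once the commands look right in a dry run.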

hazemy commented 1 year ago

Stopping the supervisor worked! I compiled the packages successfully and was able to manually start the bringup file. It seems that the root cause of the problem is that the urdf file is not loaded before the state publisher requests it, and thus the bringup launch file fails. I can open a pull request for the fix. I have a few questions, however:

1- It is still unclear to me where the workspace environment variables are being sourced on startup. Modifications to the startup behavior in the bringup launch file are not reflected upon restarting the starter kit. So, how is it modifiable?

2- Is the ROS workspace itself a shared volume between the container and the host? Changes on either of them are reflected on the other.

3- I cannot see the bringup repo being cloned by the Dockerfile. Is that done independently for each kit before building the image?

Thanks a lot!
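For illustration only, one generic way to guard against this kind of ordering problem is to wait until the description is available before starting whatever depends on it. This is not the fix proposed in the pull request; wait_for is a hypothetical helper, and the parameter name /robot_description is the conventional one, assumed here rather than taken from the kit's launch files.

```shell
#!/bin/sh
# Retry a command once per second, up to $1 attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage (assumed parameter name):
#   wait_for 30 rosparam get /robot_description || echo "URDF was never loaded"
```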

GMezWheel commented 1 year ago

> Stopping the supervisor worked! I compiled the packages successfully and was able to manually start the bringup file. It seems that the root cause of the problem is that the urdf file is not loaded before the state publisher requests it and thus the bringup launch file fails. I can open a pull request for the fix.

Yes, of course.

> I have a few questions however: 1- It is still unclear to me where the workspace environment variables are being sourced on startup. Any modifications to the startup behavior in the bringup launch file is not reflected upon restarting the starterkit. So, how is it modifiable?

Look inside your ~/.bashrc file => source /opt/ezw/install/setup.bash

> 2- Is the ROS workspace itself a shared volume between the container and the host? Changes on any of them is reflected on the other.

Yes, the ROS workspace is shared between the container and the host => ~/ros-noetic_ws

> 3- I cannot see the bringup repo being cloned by the docker file, is that done independently for each kit before building the image?

Indeed, the bringup installation is missing from the docker => it was done while building the image. I have it in the backlog.

> Thanks a lot!

hazemy commented 1 year ago

Thanks a lot for the clarification!