bluerobotics / BlueOS

The open source platform for ROV, USV, robotic system operation, development, and expansion.
https://blueos.cloud/docs/

bug: >= 1.3.0-beta.4 unable to start Navigator autopilot on 64-bit Bullseye Pi4 #2817

Open amarburg opened 2 months ago

amarburg commented 2 months ago

Bug description

Note: For testing purposes my hardware consists of a Pi4 8M and Navigator, with no other ROV hardware (motors, sensors, camera, tether) attached. "Works" below basically means "can get info from Navigator's built-in sensors over MAVLink" and doesn't imply anything about the full ROV stack, controlling motors, QGC, etc.

For a variety of reasons, we need to run a full 64-bit base OS. After a flail with Bookworm, we reverted to Raspbian "Bullseye" 64-bit Lite. From a clean Raspbian install we installed BlueOS per the bootstrap install instructions.

We discovered newer releases of blueos-core could not properly initialize the autopilot.

On master, factory, 1.3.0-beta.7 and 1.3.0-beta.6 it identifies the Navigator but cannot get metainformation / cannot start the driver:

image

On 1.3.0-beta.4 it cannot identify the Navigator:

image

On 1.3.0-beta.2, 1.3.0-beta.1 and 1.2.6 it works as expected:

image

Steps to reproduce

  1. Assemble stack with Pi4 and Navigator.
  2. Create "Bullseye 64-bit Lite" SD card with RasPi Imager.
  3. Boot and run BlueOS installation script:
     sudo su -c 'curl -fsSL https://raw.githubusercontent.com/bluerobotics/blueos-docker/master/install/install.sh | bash'
  4. Skip setup wizard.
  5. In "Autopilot Firmware" pane, switch Board to "Navigator." Default factory image exhibits behavior shown above (recognizes Navigator but no meta-information).
  6. Use BlueOS Version pane to switch to e.g. 1.2.6, allow BlueOS to restart.
  7. In "Autopilot Firmware" pane, switch Board to "Navigator." Use "Firmware Update" pane to install current stable `Sub` firmware. After start, pane displays correct metainformation.

Primary pain point(s)

Unable to use recent 1.3.0-x release with this hardware.

Additional context

Happy to help debug, provide additional system logs as needed.


joaoantoniocardoso commented 2 months ago

Thanks for reporting, I am having the same issue here, but directly from the BlueOS image (32 bits).

joaoantoniocardoso commented 2 months ago

@amarburg my issue seems to be another one, just the symptom in the frontend is the same.

Can you upload your system logs?

amarburg commented 2 months ago

Sorry, first time uploading logs. What's the most appropriate format: a zip of the full /var/logs/blueos/... ? Or is there a particular log (ardupilot_manager) of interest?

Williangalvani commented 2 months ago

hi @amarburg

you probably need to build a 64bit ardupilot binary. the firmware server doesn't build them yet.

amarburg commented 2 months ago

> you probably need to build a 64bit ardupilot binary. the firmware server doesn't build them yet.

I'm curious why the older versions appear to work (up to the point of getting Mavlink messages on QGC)

Williangalvani commented 2 months ago

> Or is there a particular log (ardupilot_manager) of interest?

yes that is the one.

> I'm curious why the older versions appear to work (up to the point of getting Mavlink messages on QGC)

ok that is curious. Let's look at the logs =]

amarburg commented 2 months ago

Attached three ardupilot_manager logs, for:

tag_1.2.6-logfile.log tag_1.3.0-beta.2.0-logfile.log tag_1.3.0-beta.7.0-logfile.log

patrickelectric commented 2 months ago

@amarburg interesting, if you take a look in the change logs, you'll see that the only change in the ardupilot manager was a fix for aarch: https://github.com/bluerobotics/BlueOS/pull/2615/files

Are you sure that the problem is in beta.4? Are you willing to do further tests if we add more information to the logs?

amarburg commented 2 months ago

So far, the behavior I describe above is repeatable: beta.4 does not find the Navigator, and later versions correctly identify the Navigator but are unable to communicate with it.

Yes, very happy to do further tests!

Williangalvani commented 2 months ago

I wonder if our change of blueos-base from bullseye to bookworm could be related to the issue, too. @amarburg unfortunately our logs don't include data from ardupilot itself, so we can't tell what the error message is. Could you take a look at the autopilot screen (ctrl+b then s) and check what it shows after it prints the command line for the autopilot? Or just kill the process with ctrl+c and manually run the ardusub binary:

/root/.config/ardupilot-manager/firmware/ardupilot_navigator -A udp:127.0.0.1:8852 --log-directory /root/.config/ardupilot-manager/firmware/logs/ --storage-directory /root/.config/ardupilot-manager/firmware/storage/ -C /dev/ttyS0 -B /dev/ttyAMA1 -E /dev/ttyAMA2 -F /dev/ttyAMA3 -D udpin:0.0.0.0:14666 --defaults /usr/blueos/userdata/firmware/ardupilot_navigatorparams.params
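
If running it by hand in the terminal is awkward, the same check could be scripted. This is only a rough sketch, assuming it is run in the same environment where the manager launches the autopilot; `run_and_capture` is a hypothetical helper name and not part of BlueOS. It runs a command line, waits a few seconds, and reports the exit code and any output. Note that the operating system can report "not found" both when the file itself is missing and when the dynamic loader the binary asks for is missing.

```python
import subprocess


def run_and_capture(argv, timeout=10.0):
    """Run argv and return (status, returncode, output).

    status is one of:
      "exec-failed": the OS could not start the program at all
      "timeout":     still running after `timeout` seconds (it did start)
      "exited":      the program started and then exited on its own
    """
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output as text
            timeout=timeout,
        )
    except FileNotFoundError as exc:
        # Raised for ENOENT: the path does not exist, or the loader
        # requested by the binary does not exist.
        return ("exec-failed", None, str(exc))
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout before re-raising.
        return ("timeout", None, "")
    return ("exited", proc.returncode, proc.stdout + proc.stderr)
```

Usage would be to pass the command line above as a list, for example `run_and_capture(["/root/.config/ardupilot-manager/firmware/ardupilot_navigator", "-A", "udp:127.0.0.1:8852"])` plus the remaining arguments, and paste the returned tuple here.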

amarburg commented 1 month ago

Testing with beta.8 (same behavior as beta.7 described above):

2024-07-23 18:22:58.240 | INFO     | ArduPilotManager:start_ardupilot:523 - Using Navigator flight-controller.
2024-07-23 18:22:58.269 | INFO     | ArduPilotManager:start_linux_board:240 - Using command line: '/root/.config/ardupilot-manager/firmware/ardupilot_navigator -A udp:127.0.0.1:8852 --log-directory /root/.config/ardupilot-manager/firmware/logs/ --storage-directory /root/.config/ardupilot-manager/firmware/storage/ -C /dev/ttyS0 -B /dev/ttyAMA1 -E /dev/ttyAMA2 -F /dev/ttyAMA3 --defaults /usr/blueos/userdata/firmware/ardupilot_navigatorparams.params'
2024-07-23 18:22:58.274 | DEBUG    | mavlink_proxy.AbstractRouter:start:99 - Calling router using following command: '/usr/bin/mavlink-routerd 127.0.0.1:8852 0.0.0.0:14660 --endpoint 127.0.0.1:14000 --tcp-port 5777 --tcp-port 14755 -l /root/.config/ardupilot-manager/logs -T /root/.config/ardupilot-manager/logs'.
/bin/sh: 1: /root/.config/ardupilot-manager/firmware/ardupilot_navigator: not found

The ardupilot_navigator binary is ARM 32-bit.

Under the working "1.2.6" core, it is also ARM 32-bit.

So something in the binary cross-compatibility is working in 1.2.6 but not in 1.3.0?
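
One way to compare the two cores is to read the binary's ELF header and see which dynamic loader it requests, then check whether that loader path exists under each core version. A missing loader is one known cause of a "not found" error for a file that does exist; whether that is what happens here is not confirmed in this thread. A minimal sketch, assuming Python 3 is available where the binary lives; `describe_elf` is a hypothetical helper name, and the machine table only covers a few common values.

```python
import struct
import sys

# A few e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}
PT_INTERP = 3  # program header type that holds the loader path


def describe_elf(data: bytes) -> dict:
    """Return word size, machine, and requested loader of an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[data[4]]        # EI_CLASS
    endian = {1: "<", 2: ">"}[data[5]]    # EI_DATA
    e_machine = struct.unpack_from(endian + "H", data, 18)[0]
    if bits == 32:
        e_phoff = struct.unpack_from(endian + "I", data, 28)[0]
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
    else:
        e_phoff = struct.unpack_from(endian + "Q", data, 32)[0]
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)

    interpreter = None  # stays None for statically linked binaries
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if struct.unpack_from(endian + "I", data, off)[0] != PT_INTERP:
            continue
        if bits == 32:
            p_offset = struct.unpack_from(endian + "I", data, off + 4)[0]
            p_filesz = struct.unpack_from(endian + "I", data, off + 16)[0]
        else:
            p_offset = struct.unpack_from(endian + "Q", data, off + 8)[0]
            p_filesz = struct.unpack_from(endian + "Q", data, off + 32)[0]
        raw = data[p_offset:p_offset + p_filesz]
        interpreter = raw.rstrip(b"\x00").decode()
        break

    return {
        "bits": bits,
        "machine": EM_NAMES.get(e_machine, f"unknown ({e_machine})"),
        "interpreter": interpreter,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py /path/to/binary
    with open(sys.argv[1], "rb") as f:
        print(describe_elf(f.read()))
```

If the reported `interpreter` path is present under 1.2.6 but absent under 1.3.0, that would point at the base image rather than the firmware binary.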