Running on odroid XU4 - Githubissues

Changliu52 commented 7 years ago

It is really exciting to see this code released! Amazing! I am working with an Odroid XU4. Is it possible to run it on ARM CPUs directly, or are any changes needed for that?

Kind Regards, Chang

qintonguav commented 7 years ago

We haven't tried it on Odroid. We have used VINS_mono on TK1, TX1, TX2 successfully.

qintonguav commented 7 years ago

I think you can clone the code directly on the Odroid and try it under the ROS environment. There are no other special dependencies. Use fewer features and it may work.

Changliu52 commented 7 years ago

Awesome. I will definitely try that! Thank you for your reply. Chang

Changliu52 commented 7 years ago

The Nvidia TK1 and TX boards are ARM-based, so it should work. If you are not using the GPU, they should run at a similar speed to the Odroid XU4.

qintonguav commented 7 years ago

No GPU is used. I suggest you turn off the loop closure function and limit the number of features to 100 to begin with.
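A minimal sketch of that change, assuming the config file exposes the `max_cnt` and `loop_closure` keys quoted further down in this thread. The sample file contents and the helper name `apply_low_power_settings` are hypothetical and only illustrate the edit; check the keys against your own config file before relying on it.

```python
import re


def apply_low_power_settings(config_text, max_features=100):
    """Return config_text with the feature cap lowered and loop closure off."""
    # Rewrite only the number after each key so trailing comments survive.
    text = re.sub(
        r"(?m)^(\s*max_cnt\s*:\s*)\d+",
        lambda m: f"{m.group(1)}{max_features}",
        config_text,
    )
    text = re.sub(
        r"(?m)^(\s*loop_closure\s*:\s*)\d+",
        lambda m: f"{m.group(1)}0",
        text,
    )
    return text


# Hypothetical config excerpt; only the two key names come from this thread.
sample = """%YAML:1.0
max_cnt: 150        # feature cap (illustrative comment)
loop_closure: 1     # 1 = enabled (illustrative comment)
"""

patched = apply_low_power_settings(sample)
print(patched)
```

The substitution is done on plain text rather than through a YAML parser, so the rest of the file (including the `%YAML:1.0` header in the sample) is left untouched.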

longbowlee commented 7 years ago

This is awesome. Does this algorithm support rolling shutter cameras?

shaojie commented 7 years ago

Not now, but we will add rolling shutter camera support soon. Stay tuned.

However, the code runs reasonably well with a rolling shutter camera, as shown in VINS-Mobile (iPhone camera is rolling shutter).

longbowlee commented 7 years ago

@shaojie Thanks, Shaojie. I read part of your PhD dissertation, although I can't fully understand it yet. Is this algorithm implemented as described in the dissertation, using non-linear optimization rather than a filter-based approach?

maesfahani commented 7 years ago

Hi, I am trying to test it on a TX1, but the tracking fails with: "Number of consecutive invalid steps more than Solver::Options::max_num_consecutive_invalid_steps". I am using MH_01_easy.bag for the test, and I turned off the loop closure function and limited the feature number to 100. However, it runs well on my laptop.

qintonguav commented 7 years ago

@maesfahani I tested VINS_mono on the TX1 again (after removing -march=native from CMakeLists.txt) and did not run into this problem. It may be a problem with your environment. I can only share my own environment with you: kernel 3.10.96-tegra, Eigen 3.2.9, Ceres 1.11.0, OpenCV 2.4.13. We actually mount the TX1 on a quadrotor for onboard experiments.

qintonguav commented 7 years ago

@Changliu52 I have tested VINS_Mono on the Odroid XU4 (after removing -march=native from CMakeLists.txt and turning off the loop closure function). The code runs on the Odroid successfully, but the performance is not good because of its limited image processing ability, and some time delay occurs. I don't recommend using the Odroid to run vision-based algorithms. The TX1 can achieve real-time performance. What's more, if you know how to use VisionWorks, the powerful vision toolkit on Tegra, you can run even faster.
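A minimal sketch of the -march=native removal described above, written as a text filter. The sample CMake lines and the helper name `strip_march_native` are hypothetical; the actual flag lines in your CMakeLists.txt may look different, so inspect the result before building.

```python
import re


def strip_march_native(cmake_text):
    """Remove every -march=native token along with the spaces before it."""
    # [ \t]* keeps the match on one line so no newlines are swallowed.
    return re.sub(r"[ \t]*-march=native\b", "", cmake_text)


# Hypothetical excerpt; real flag lines may differ.
sample = (
    'set(CMAKE_CXX_FLAGS "-std=c++11 -march=native")\n'
    'set(CMAKE_CXX_FLAGS_RELEASE "-O3 -Wall")\n'
)

cleaned = strip_march_native(sample)
print(cleaned)
```

After editing the file, a clean rebuild is needed so that the changed flags actually take effect.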

maesfahani commented 7 years ago

@qintony Is the ROS on your TX1 Kinetic or Indigo? The ROS on my laptop is Indigo and it works fine.

qintonguav commented 7 years ago

@maesfahani Indigo on TX1.

maesfahani commented 7 years ago

@qintony Maybe that is the problem. I have Ubuntu 16.04 and Kinetic on the TX1.

maesfahani commented 7 years ago

@qintony Have you tested it on Kinetic before?

qintonguav commented 7 years ago

@maesfahani No. Maybe you need to figure it out by yourself. Thanks.

Changliu52 commented 7 years ago

Interesting. Thank you so much, Qin. You are right. I just found out that the Cortex-A57 in the TX1 is 15% faster than the Cortex-A15 in the Odroid XU4... https://www.arm.com/products/processors/cortex-a/cortex-a57-processor.php

Chang

jeff-delaune commented 7 years ago

@qintony Confirming I could only run VINS-Mono on the XU4 by disabling loop closure. Image frames were then being processed at 8 fps on average on my data. When you said earlier "performance is not so good", can you confirm you had similar frame rate? Thanks much.

qintonguav commented 7 years ago

@fejj Yes, the image processing rate is lower than 10Hz. We need at least 10 Hz.
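As a quick check of those numbers, the sketch below converts the 10 Hz requirement and the 8 fps reported on the XU4 into per-frame time budgets. It is plain arithmetic on the two figures from this thread; the variable names are only illustrative.

```python
def frame_period_ms(rate_hz):
    """Time available (or spent) per frame at the given rate, in milliseconds."""
    return 1000.0 / rate_hz


required_hz = 10.0   # minimum rate stated above
measured_fps = 8.0   # average reported on the XU4 in this thread

budget_ms = frame_period_ms(required_hz)      # time allowed per frame
measured_ms = frame_period_ms(measured_fps)   # time actually spent per frame
overrun_ms = measured_ms - budget_ms
speedup_needed = required_hz / measured_fps

print(f"budget per frame:   {budget_ms:.1f} ms")
print(f"measured per frame: {measured_ms:.1f} ms")
print(f"overrun:            {overrun_ms:.1f} ms")
print(f"speed-up needed:    {speedup_needed:.2f}x")
```

In other words, at 8 fps each frame takes 125 ms against a 100 ms budget, so processing would need to be about 1.25 times faster to keep up.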

maciejmatuszak commented 7 years ago

Hi, first of all, this is a great framework! It is well written, and when it works it is really good. I am trying to run VINS-Mono on a TX2 module, but without success: position drift occurs on both the TX2 and the TX1. @qintony, since you indicated that you ran it on the TX2, could you please help me solve this? I first encountered the problem on the TX1 with Jetson 3.0, and I assume the reason for the failure is the same as on the TX2. I run with maximum performance settings using jetson_clocks.sh.

My TX2 env:

Jetpack 3.0
Ubuntu 16.04
ROS Kinetic
From cmake:
-- Found OpenCV: /opt/ros/kinetic (found version "3.2.0")
-- Found installed version of Eigen: /usr/lib/cmake/eigen3
-- Found required Ceres dependency: Eigen version 3.2.92 in /usr/include/eigen3
-- Found Ceres version: 1.13.0 installed in: /usr/local with components: [LAPACK, SuiteSparse, SparseLinearAlgebraLibrary, CXSparse, SchurSpecializations, OpenMP]

I have a stereo camera with a synchronised IMU (Loitor sensor). To narrow down the problem, I recorded a bag file on the TX2. I can run the bag file on my laptop without problems, with identical settings and library versions. I noticed there are debug statements in the code, so I ran it at debug level. With all the debug processing it fails on the laptop as well, but I can see a major difference in Ceres solver performance: on the TX2 I get 2-3 iterations, while on the laptop it reaches the limit of 9 every time. Below are "representative" fragments of the logs:

TX2:

[DEBUG] [1500876235.516887783, 1500597636.677634330]: new image coming ------------------------------------------
[DEBUG] [1500876235.516928903, 1500597636.677634330]: Adding feature points 64
[DEBUG] [1500876235.516954310, 1500597636.677634330]: input feature: 64
[DEBUG] [1500876235.517000582, 1500597636.677634330]: num of feature: 83
[DEBUG] [1500876235.517095204, 1500597636.677634330]: parallax_sum: 3.985230, parallax_num: 41
[DEBUG] [1500876235.517125348, 1500597636.677634330]: current parallax: 44.712336
[DEBUG] [1500876235.517151460, 1500597636.677634330]: this frame is--------------------accept
[DEBUG] [1500876235.517174403, 1500597636.677634330]: Keyframe
[DEBUG] [1500876235.517196643, 1500597636.677634330]: Solving 10
[DEBUG] [1500876235.517224771, 1500597636.677634330]: number of feature: 83
[DEBUG] [1500876235.517370273, 1500597636.677634330]: triangulation costs 0.068159
[DEBUG] [1500876235.517436384, 1500597636.677634330]: fix extinsic param
[DEBUG] [1500876235.518905163, 1500597636.677634330]: visual measurement count: 546
[DEBUG] [1500876235.518944170, 1500597636.677634330]: prepare for ceres: 1.477803
[DEBUG] [1500876235.556454550, 1500597636.717970814]: Iterations : 2
[DEBUG] [1500876235.556510774, 1500597636.717970814]: solver costs: 37.534092
[DEBUG] [1500876235.561520718, 1500597636.728032654]: pre marginalization 3.708139 ms
[DEBUG] [1500876235.585113696, 1500597636.748159986]: marginalization 23.511475 ms
[DEBUG] [1500876235.586332815, 1500597636.748159986]: whole marginalization costs: 29.564732
[DEBUG] [1500876235.586373390, 1500597636.748159986]: whole time for ceres: 68.906639
[DEBUG] [1500876235.587210114, 1500597636.748159986]: solver costs: 69.883425ms
[DEBUG] [1500876235.587434431, 1500597636.748159986]: marginalization costs: 0.171421ms
[ INFO] [1500876235.587517438, 1500597636.748159986]: position: 0.000440664  0.00050723  -0.0278666
[DEBUG] [1500876235.587566077, 1500597636.748159986]: orientation: -0.00137258    -0.02498   -0.048206
[DEBUG] [1500876235.587647196, 1500597636.748159986]: extirnsic tic:  -0.0287773   0.0556961 0.000498446
[DEBUG] [1500876235.587699675, 1500597636.748159986]: extrinsic ric: -89.5537 -1.72662   179.55
[DEBUG] [1500876235.587728571, 1500597636.748159986]: vo solver costs: 70.691925 ms
[DEBUG] [1500876235.587749659, 1500597636.748159986]: average of time 86.759073 ms
[DEBUG] [1500876235.587769114, 1500597636.748159986]: sum of path 0.039944
[DEBUG] [1500876235.639958934, 1500597636.798498503]: processing vision data with stamp 1500597636.703376

Laptop:

[DEBUG] [1500876089.229592989, 1500597636.208239138]: new image coming ------------------------------------------
[DEBUG] [1500876089.229608588, 1500597636.208239138]: Adding feature points 138
[DEBUG] [1500876089.229620616, 1500597636.208239138]: input feature: 138
[DEBUG] [1500876089.229631783, 1500597636.208239138]: num of feature: 155
[DEBUG] [1500876089.229675986, 1500597636.208239138]: parallax_sum: 1.656623, parallax_num: 118
[DEBUG] [1500876089.229684131, 1500597636.208239138]: current parallax: 6.458021
[DEBUG] [1500876089.229691790, 1500597636.208239138]: this frame is--------------------reject
[DEBUG] [1500876089.229698299, 1500597636.208239138]: Non-keyframe
[DEBUG] [1500876089.229704149, 1500597636.208239138]: Solving 10
[DEBUG] [1500876089.229711940, 1500597636.208239138]: number of feature: 155
[DEBUG] [1500876089.229754109, 1500597636.208239138]: triangulation costs 0.001732
[DEBUG] [1500876089.229773620, 1500597636.208239138]: fix extinsic param
[DEBUG] [1500876089.230524258, 1500597636.208239138]: loop constraint num: 0
[DEBUG] [1500876089.230540255, 1500597636.208239138]: visual measurement count: 868
[DEBUG] [1500876089.230566668, 1500597636.208239138]: prepare for ceres: 0.781616
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  1.301863e+02    0.00e+00    2.46e+04   0.00e+00   0.00e+00  1.00e+04        0    1.31e-03    1.97e-03
   1  1.198102e+02    1.04e+01    1.72e+04   1.72e+02   1.31e+00  1.63e+04        1    3.16e-03    5.15e-03
   2  1.189159e+02    8.94e-01    3.02e+04   5.83e+02   8.10e-01  1.63e+04        1    3.03e-03    8.20e-03
   3  1.180976e+02    8.18e-01    1.92e+03   3.02e+02   1.21e+00  1.63e+04        1    3.32e-03    1.15e-02
   4  1.179893e+02    1.08e-01    1.51e+03   9.81e+01   1.32e+00  1.63e+04        1    3.25e-03    1.48e-02
   5  1.179837e+02    5.60e-03    2.44e+04   3.83e+02   1.65e-01  8.16e+03        1    3.19e-03    1.80e-02
   6  1.179242e+02    5.95e-02    5.40e+03   9.65e+02   1.05e+00  8.16e+03        1    3.19e-03    2.12e-02
   7  1.179772e+02   -5.30e-02    0.00e+00   7.49e+02  -3.79e+00  4.08e+03        1    1.74e-03    2.30e-02
   8  1.179772e+02   -5.30e-02    0.00e+00   7.49e+02  -3.79e+00  2.04e+03        0    3.48e-04    2.33e-02
[DEBUG] [1500876089.254037106, 1500597636.228371302]: Iterations : 9
[DEBUG] [1500876089.254068109, 1500597636.228371302]: solver costs: 23.479367
[DEBUG] [1500876089.254211223, 1500597636.228371302]: whole marginalization costs: 0.000216
[DEBUG] [1500876089.254239060, 1500597636.228371302]: whole time for ceres: 24.455751
[DEBUG] [1500876089.254384110, 1500597636.228371302]: solver costs: 24.624158ms
[DEBUG] [1500876089.254620504, 1500597636.228371302]: marginalization costs: 0.223125ms
[ INFO] [1500876089.254657635, 1500597636.228371302]: position: -0.000371786    0.0108262  7.68421e-05
[DEBUG] [1500876089.254680762, 1500597636.228371302]: orientation: -0.0084849  0.0149779 0.00875228
[DEBUG] [1500876089.254707643, 1500597636.228371302]: extirnsic tic:  -0.0287773   0.0556961 0.000498446
[DEBUG] [1500876089.254722111, 1500597636.228371302]: extrinsic ric: -89.5537 -1.72662   179.55
[DEBUG] [1500876089.254729959, 1500597636.228371302]: vo solver costs: 25.093551 ms
[DEBUG] [1500876089.254736183, 1500597636.228371302]: average of time 80.135879 ms
[DEBUG] [1500876089.254743458, 1500597636.228371302]: sum of path 0.011036
[DEBUG] [1500876089.304051829, 1500597636.278650193]: processing vision data with stamp 1500597636.206887

For the TX2 logs above I limited the feature count (max_cnt: 100) and disabled loop closure (loop_closure: 0); on the laptop both are at their defaults: 150 features and loop closure enabled.

I tried lowering "freq" and extending "max_solver_time" but did not get good results. I notice that parallel execution of Ceres is disabled. Could enabling it help?
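To compare the two runs over a whole log rather than single fragments, the values can be pulled out with a small script. This is a rough sketch that assumes the log lines keep the wording shown above ("Iterations :", "visual measurement count:", "whole time for ceres:"). The last value is assumed to be in milliseconds, like the neighbouring lines that carry an explicit ms unit. The field names and the `summarize` helper are illustrative only.

```python
import re

# One pattern per field; each captures the number that follows the label.
PATTERNS = {
    "iterations": re.compile(r"Iterations : (\d+)"),
    "measurements": re.compile(r"visual measurement count: (\d+)"),
    "ceres_ms": re.compile(r"whole time for ceres: ([\d.]+)"),
}


def summarize(log_text):
    """Collect every value found for each field, in order of appearance."""
    summary = {name: [] for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[name].append(float(match.group(1)))
    return summary


# Lines copied from the fragments above.
tx2_log = """\
[DEBUG] [1500876235.518905163, 1500597636.677634330]: visual measurement count: 546
[DEBUG] [1500876235.556454550, 1500597636.717970814]: Iterations : 2
[DEBUG] [1500876235.586373390, 1500597636.748159986]: whole time for ceres: 68.906639
"""
laptop_log = """\
[DEBUG] [1500876089.230540255, 1500597636.208239138]: visual measurement count: 868
[DEBUG] [1500876089.254037106, 1500597636.228371302]: Iterations : 9
[DEBUG] [1500876089.254239060, 1500597636.228371302]: whole time for ceres: 24.455751
"""

tx2 = summarize(tx2_log)
laptop = summarize(laptop_log)
for name in PATTERNS:
    print(f"{name:>12}: TX2 {tx2[name]}  laptop {laptop[name]}")
```

The patterns only anchor on the label text, so they also match lines that still carry terminal colour codes around the message.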

Maciej

HKUST-Aerial-Robotics / VINS-Mono

Running on odroid XU4 #1