NUbots / NUWebots

Environment, models, and communication for using the Webots simulation with NUbots
MIT License
7 stars 4 forks source link

Fixes an issue where our controllers cannot find `libCppController.so` #88

Closed ysims closed 2 years ago

ysims commented 2 years ago

There is an issue where Webots gives an error when running our controllers

/home/<username>/code/NUWebots/controllers/nugus_controller/nugus_controller: error while loading shared libraries: ../webots/lib/controller/libCppController.so: cannot open shared object file: No such file or directory
WARNING: 'nugus_controller' controller exited with status: 127.

See discussion below on the issue. Turns out CMake uses relative paths when your library is within the project, but this caused issues for us since we move the binary out of the build folder and into a subdirectory, breaking the relative path. This PR fixes it using @Bidski's suggestion, which is to put all of our stuff into a new subdirectory folder next to the webots folder.

TrentHouliston commented 2 years ago

Don't do this, cmake has features to handle this kind of thing properly rather than messing with environment variables

ysims commented 2 years ago

Don't do this, cmake has features to handle this kind of thing properly rather than messing with environment variables

What are these features?

LachlanCourt commented 2 years ago

@KipHamiltons :)

JosephusPaye commented 2 years ago

Seems like target_link_libraries() can be used for this. Something like:

target_link_libraries(libCppController /path/to/libCppController.so)

This https://stackoverflow.com/questions/8774593/cmake-link-to-external-library has some other options.

Basically found this by googling "cmake link a "so" file" and reading some documentation, so not sure how well it works in practice.

ysims commented 2 years ago

Seems like target_link_libraries() can be used for this. Something like:

target_link_libraries(libCppController /path/to/libCppController.so)

This https://stackoverflow.com/questions/8774593/cmake-link-to-external-library has some other options.

Basically found this by googling "cmake link a "so" file" and reading some documentation, so not sure how well it works in practice.

I've updated the PR now with something like this, and it seems to still work @TrentHouliston does it look better?

mrwerdo commented 2 years ago

Why did this break in the first place?

ysims commented 2 years ago

Why did this break in the first place?

I'm wondering if it was something to do with adding in the webots subtree.

Bidski commented 2 years ago

If you are linking against webots::webots target then you should be linking against libCppController.so. If that is failing then you should be looking into what is going wrong with find_package(webots) and not hacking this the way you are https://github.com/NUbots/NUWebots/blob/main/cmake/Modules/Findwebots.cmake#L26-L47

TrentHouliston commented 2 years ago

Sorry totally forgot that this PR existed and that I said something. Yes if linking is failing your linking script is failing somewhere, You should never link directly to a .so file. You should always find it and put it in a variable using a find package script, or a PackageConfig file

Specifically on line 47 of what @Bidski linked is relevant since that's making the webots::webots library have things attached to it. Those things should have been found by the find_library command that should be finding your cpp controller

ysims commented 2 years ago

Yeah I had a look a while back trying to properly fix it after some stuff @Bidski said in Slack but was having trouble finding the issue, it all seemed fine when I was printing stuff out. Last thing I was looking into was rpaths in case something went wrong there after moving WEBOTS_HOME inside the repo, but I couldn't see anything that would've been affected by it.

TrentHouliston commented 2 years ago

RPATHs can be the other thing that's going wrong. They store where relative to the file you can find libraries (when they're not in one of the standard locations) We use them in the NUbots code.

One of the important variables you should check on are

CMAKE_SKIP_BUILD_RPATH FALSE CMAKE_BUILD_WITH_INSTALL_RPATH

By default cmake doesn't set the rpath when you build, it only does it only when you install. And I doubt you're ever running cmake install so it might not set the rpath when you set that

Bidski commented 2 years ago

This would be what is probably causing us issues now. Pretty sure we would have been using this because things were too hard basket back before that virtual robocup. Having said that, setting LD_LIBRARY_PATH is a horrendous way to handle thing (see this for one possible reason why it is a horrendous thing to do).

As @TrentHouliston setting RPATHS is the correct way to fix this

KipHamiltons commented 2 years ago

Sooooo, I have looked at this for a bit. It seems like everything should be fine, but it's trying to link to a relative path, which ends up being incorrect, because I suppose the controllers aren't where they're expected to be when they're built. ldd shows the problem:

$ ldd controllers/nugus_controller/nugus_controller
        linux-vdso.so.1 (0x00007ffe56cf8000)
        ../webots/lib/controller/libCppController.so => not found # This is wrong!! 
        libprotobuf-lite.so.17 => /lib/x86_64-linux-gnu/libprotobuf-lite.so.17 (0x00007f4883d9e000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4883d7b000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4883b99000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4883b7e000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f488398c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4883e67000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f488383b000)

I tried making it use absolute paths for linking by setting a policy, as suggested here. That doesn't work though. They also suggest setting IMPORTED_LOCATION (and IMPORTED_IMPLIB), which we do here (setting IMPORTED_IMPLIB there doesn't help either). I tried linking directly to webots::CppController, rather than transitively through webots::webots. That doesn't work - same issue with the runtime error. I find this so strange though because linking to webots::CppController should literally just link to the .so file, but it doesn't work, while linking directly to the .so file path does work. Definitely to do with the absolute vs. relative paths, but I can't work out how to fix it...

KipHamiltons commented 2 years ago

Also, as per your suggestions, I tried looking at RPATHs, without success. We set those RPATH variables here.

Bidski commented 2 years ago

@KipHamiltons nothing you tried worked because the issue has nothing to do with linking. What you are seeing is a runtime issue. If you can set the RPATH to be the same as the build time link path then you should be able to resolve this.

Some buried in our archives we used to do something with the build and install rpaths (I think back in the days of vagrant and virtualbox). You can try looking for that.

This seems to give some good advice https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling

This may also be useful https://cmake.org/cmake/help/latest/prop_tgt/BUILD_RPATH_USE_ORIGIN.html

KipHamiltons commented 2 years ago

To the best of my knowledge, we're following the advice of the first link - including basically using their example, or very close to - and that's not working. I tried setting that second link's option globally as a variable, as well as specifically for the target. no dice. Interestingly, as well, if you run objdump -x <path-to>/nugus_controller | grep RPATH, as suggested in the first link, you don't see anything. That suggests that even with all of these attempts at setting it, we might be failing?? not sure. Feeling pretty out of ideas (that aren't hacky) though.

TrentHouliston commented 2 years ago

If you're not seeing an rpath then clearly the rpath is not getting set (which is a problem). That said, I've had much better luck seeing what the rpath is with readelf than objdump readelf -d <binary> | grep rpath

ysims commented 2 years ago

If you're not seeing an rpath then clearly the rpath is not getting set (which is a problem). That said, I've had much better luck seeing what the rpath is with readelf than objdump readelf -d <binary> | grep rpath

Just tried with readelf and got no output with that either. At least that narrows down the problem a little.

TrentHouliston commented 2 years ago

If you run the readelf command one one of the NUbots binaries you'll hopefully see what is expected (or at least what the path of NUbots binaries are

ysims commented 2 years ago

@miikyla

ysims commented 2 years ago
$ ldd nugus_controller
    linux-vdso.so.1 (0x00007ffc63121000)
    libprotobuf-lite.so.23 => /lib/x86_64-linux-gnu/libprotobuf-lite.so.23 (0x00007f514b140000)
    ../webots/lib/controller/libCppController.so => not found
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f514af27000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f514af0d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f514ace5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f514b23d000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f514abff000)
$ readelf -d nugus_controller | grep 'R.*PATH'
 0x000000000000001d (RUNPATH)            Library runpath: [/home/ysi/code/NUWebots/webots/lib/controller]

There seems to be a runpath, but not an rpath. Even though the runpath is correct and I have no LD_LIBRARY_PATH set, it looks for libCppController.so in the wrong directory. Is this maybe indicating that the cmake variables are set wrong, or that runpath doesn't work similarly enough to rpath to work here and we still need to set the rpath somehow?

ysims commented 2 years ago

https://stackoverflow.com/questions/48846898/force-imported-target-to-use-absolute-path-to-so-file-even-when-it-is-located-i may be a lead

Bidski commented 2 years ago

I think the issue is that cmake is trying to be clever. CMake is recognising that the library files we are linking against are part of the projects folder structure (they are in a subfolder of the cmake project folder), so it is making some paths relative for some stupid reason.

So here is a fix, change the folder structure of this repo to be

NUWebot/
  - webots       (the webots subtree)
  - nuwebots   (all of our code)

then build as normal (pretty sure all of your other setup and commands will work as is). This I have tested and seems to work fine.

An alternative that I have not tested is to leave the folder structure as is and to wrap the webots subtree up in an ExternalProject. @KipHamiltons or @CMurtagh-LGTM might be interested in attempting such a thing, no idea if this will work or not

ysims commented 2 years ago

Think this is good now. Since there's so many files changed, I made comments on the changes that aren't just moving files in the new nuwebots directory.