Farama-Foundation / Arcade-Learning-Environment

The Arcade Learning Environment (ALE) -- a platform for AI research.
GNU General Public License v2.0
2.1k stars 416 forks source link

incorrect cmake configuration / include layout post installation #532

Open maichmueller opened 1 month ago

maichmueller commented 1 month ago

I have been working on packaging this project for later use in c++ with conan, the package manager. During this process I believe to have come across an inconsistency in the c++ layout of include headers:

In the source tree everything lives directly in src, i.e header files start directly at e.g. src/ale_interface.hpp.

src
├── ale_interface.hpp
├── CMakeLists.txt
├── common
│   ├── ...
├── emucore
│   ├── ...
│   ├── OSystem.hxx
│   ├── ...
│   ├── Sound.hxx
│   ├── ...

The include directories are also mapped to src for the c++ targets in src/CMakeLists.txt:

include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR})

However, for installation the layout is changed to a project-named one (which is correct to do) {install_prefix}/include/ale:

  # Install Header files
  install(DIRECTORY ${CMAKE_CURRENT_LIST_DIR}/
          DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
          FILES_MATCHING
          REGEX "\.h((pp)?|(xx?))$"
          REGEX "os_dependent\/.*(Win32)\.(h|h(xx)?|h(pp)?|c(xx)?|c(pp)?)$" EXCLUDE)

The problem that this creates is the following now:

In ale_interface.hpp:

 include `emucore/OSystem.hxx`

and in emucore/OSystem.hxx:

#include "emucore/Sound.hxx"

This works in the src layout. #include emucore/OSystem.hxx in ale_interface will always work because emucore is found via a relative search from the interface file.

But #include "emucore/Sound.hxx" cannot resolve emucore from a relative search, since the including file (OSystem.hxx) is inside the included folder (emucore), so it will need to be found relative to the include-directories (src at build-time). But at install-time these are now changed to ale.

include
└── ale
    ├── ale_interface.hpp
    ├── common
    │   ├── ...
    ├── emucore
    │   ├── ...
    │   ├── OSystem.hxx
    │   ├── ...
    │   ├── Sound.hxx

I see 2 ways to fix this:

  1. Change the src layout to mimick the install layout, i.e. move all files into src/ale and prepend all includes for project files with ale/, e.g. `#include "ale/emucore/OSystem.hxx".
  2. Ensure all relative includes in all files are not relative to the top-level include, but the file itself.

I believe option 1 is less work and can be achieved with the help of some regex search and an IDE. Option 2 is more work in case there a many files that have made this mistake.

I could provide a PR if you agree with this. Otherwise, let me know if I got something wrong :)

pseudo-rnd-thoughts commented 1 month ago

Hey, I'm primarily a python guy so kinda understand what you are talking about. Somehow I thought that number 2 would be simplier but you probably know better than me.

If you make a PR, I can certainly review it and see if the original author can as well.

maichmueller commented 1 month ago

Hey @pseudo-rnd-thoughts , option 2 would be simpler if there weren't so many files. With around 180 header files I am not super stoked to go through each individually to check if the relative path matches. Option 1 is simpler to me, since a simple regex allows me to insert "ale/" in front of every include to bullet-proof the include directives (unless the include styles of relative-to-file and relative-to-include-dir were both used in the project, then option 2 would be less problematic I presume).

I can understand that changing the project layout is a hassle, so I haven't immediately gone to work on it. As long as there aren't many other developments happening on the repo (in branches/forks), I do think it would be the overall path of least resistance. I will give it a try and report back.

maichmueller commented 1 month ago

I have made the necessary changes for option 1 on my fork here:

https://github.com/maichmueller/Arcade-Learning-Environment

The CI runs as well except for the wheel RAM tests in python 3.9+ (3.8 works though on all platforms. I am a little puzzled by this error):

https://github.com/maichmueller/Arcade-Learning-Environment/actions/runs/9563036581

pseudo-rnd-thoughts commented 1 month ago

Thanks for doing that @maichmueller, could you make it a PR as it is bit difficult to see all the changes as there are several commits. On RAM tests, that is very weird, I have no idea what is going on. I would have expected a C++ error for the function doesn't exist or links incorrectly but I don't see that

maichmueller commented 1 month ago

I created a draft PR @pseudo-rnd-thoughts , let me know what you can identify there.

I have tried multiple things to fix the CI errors. My observations:

  1. I cannot recreate any of the problems occurring on the CI on a local ubuntu 22.04 setup (neither with py3.8, nor 3.10). Everything builds, installs, tests, and runs fine there.
  2. There are 2 different CI errors I ran into: i) The getRAM python bound method does not provide the correct values (the error on the main branch with the PR). This is the only test that does not run correctly in this case. ii) The installed 'ale_py' package cannot import '_ale_py', the extension module, despite this module being there in the installed package (for reference see e.g. this run on a branch in which I am trying to fix the CI problem. The 'Check Installation' step clearly shows the extension placed in the installed ale_py package, but it cannot be loaded in the next step...)