Desbordante / desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
GNU Affero General Public License v3.0

Add MacOS (Apple silicon) support and dependencies installation guide (Ubuntu + MacOS) to README #433

Closed Vdaleke closed 2 months ago

Vdaleke commented 3 months ago

Change platform-specific flag in build.sh script, set GCC as default compiler in CMakeLists.txt. Refactor some system-specific code to standard C++. Add dependencies installation guide to README.md.

Vdaleke commented 3 months ago

I've never built anything on a MacOS, so I don't know how to do it correctly in general. I'm wondering if we can somehow simplify it? I looked at several build guides for MacOS and I did not see similar commands with installing symbolic links. Is it possible to avoid this?

Yeah, I found a way to avoid symbolic links, and the installation guide is shorter now.

Regarding Boost, is there any way to avoid installing it manually? Is it known what causes the errors associated with installing Boost from Homebrew? How difficult is it to fix them, and is it possible at all?

You can look at the Boost formula in Homebrew. Lines 37-38 set the darwin (Clang) toolset as the default on MacOS. A possible way to fix this is to open a PR or an issue in their repo.

It seems to me that the simpler the build instructions can be made for the user, the better it would be. And perhaps it makes sense to look at how this is done in other C++ projects that support build for MacOS. If you can find a good example for us to build on, that would be very helpful.

Unfortunately, I couldn't find a C++ project on MacOS that uses Boost built with GCC; I think it's a very rare configuration. There are only a few questions on Stack Overflow about such builds, none with correct answers, and only 2 projects on GitHub with the topics macos, gcc, boost, neither of which covers this case.

We also probably want to add build for MacOS to the CI.

I think I will do it in the next PR.

vs9h commented 3 months ago

You can look at the Boost formula in Homebrew. Lines 37-38 set the darwin (Clang) toolset as the default on MacOS. A possible way to fix this is to open a PR or an issue in their repo.

It seems that we can build Boost with GCC using the following command: brew install boost --cc=gcc-X. I found this on Stack Overflow and in a project that uses Boost with GCC on MacOS.

I think I will do it in the next PR.

Yes, I think this is quite reasonable considering that it will take a lot of time. By the way, have you checked whether the current tests work correctly on a MacOS (without large datasets, in debug/release modes)?

vs9h commented 3 months ago

It’s also interesting that we will have a dependency installation guide for MacOS but none for Linux (namely, Ubuntu). For consistency, it’s worth adding one in the future. @chernyshev, maybe it’s worth assigning this task to someone later?

chernishev commented 3 months ago

It’s also interesting that we will have a dependency installation guide for MacOS but none for Linux (namely, Ubuntu). For consistency, it’s worth adding one in the future. @chernyshev, maybe it’s worth assigning this task to someone later?

Yes, I will keep it in mind. Btw, Matvey told me that he will add the MacOS build to CI later, so maybe it is a good idea to do this together with that work.

Vdaleke commented 3 months ago

It seems that we can build Boost with GCC using the following command: brew install boost --cc=gcc-X. I found this on Stack Overflow and in a project that uses Boost with GCC on MacOS.

I tried this first, but it didn't work for me. It is currently an unsupported configuration. My logs:

➜  brew install boost --cc=gcc-14
Warning: You passed `--cc=gcc-14`.
It is expected behaviour that some formulae will fail to build in this unsupported configuration.
It is expected behaviour that Homebrew will be buggy and slow.
Do not create any issues about this on Homebrew's GitHub repositories.
Do not create any issues even if you think this message is unrelated.
Any opened issues will be immediately closed without response.
Do not ask for help from Homebrew or its maintainers on social media.
You may ask for help in Homebrew's discussions but are unlikely to receive a response.
Try to figure out the problem yourself and submit a fix as a pull request.
We will review it but may or may not accept it.

==> Fetching dependencies for boost: xz
==> Fetching xz
==> Downloading https://ghcr.io/v2/homebrew/core/xz/manifests/5.6.2
################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/xz/blobs/sha256:5ec389ac6a0b190914be00c62d2de0a18265c39d1243420d08841afea16ff7f9
################################################################################################################################################### 100.0%
==> Fetching boost
==> Downloading https://raw.githubusercontent.com/Homebrew/homebrew-core/1a64754dc4ab5c9ca81283be706527381b506a45/Formula/b/boost.rb
################################################################################################################################################### 100.0%
==> Downloading https://github.com/boostorg/boost/releases/download/boost-1.85.0/boost-1.85.0-b2-nodocs.tar.xz
Already downloaded: /Users/matvey_smirnov/Library/Caches/Homebrew/downloads/b89b260e2b1f089e89ffc282d37ada0e1e6b1e969eea343f304bb8631cb58a8d--boost-1.85.0-b2-nodocs.tar.xz
==> Installing dependencies for boost: xz
==> Installing boost dependency: xz
==> Downloading https://ghcr.io/v2/homebrew/core/xz/manifests/5.6.2
Already downloaded: /Users/matvey_smirnov/Library/Caches/Homebrew/downloads/0f02a3a463ce4e72f92871751d9ba7b872ca8090348074d46ffb523fd67e1c7b--xz-5.6.2.bottle_manifest.json
==> Pouring xz--5.6.2.arm64_sonoma.bottle.tar.gz
🍺  /opt/homebrew/Cellar/xz/5.6.2: 96 files, 1.9MB
==> Installing boost
==> ./bootstrap.sh --prefix=/opt/homebrew/Cellar/boost/1.85.0 --libdir=/opt/homebrew/Cellar/boost/1.85.0/lib --with-icu=/opt/homebrew/opt/icu4c --without-
==> ./b2 headers
Last 15 lines from /Users/matvey_smirnov/Library/Logs/Homebrew/boost/02.b2:
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

'
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/tools/clang-linux.jam:112: in get-full-version from module clang-linux
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/tools/clang-linux.jam:117: in clang-linux.get-short-version from module clang-linux
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/tools/clang-darwin.jam:71: in get-short-version from module clang-darwin
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/tools/clang-darwin.jam:48: in clang-darwin.init from module clang-darwin
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/build/toolset.jam:42: in toolset.using from module toolset
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/tools/clang.jam:28: in clang.init from module clang
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/build/toolset.jam:42: in toolset.using from module toolset
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/build/project.jam:1283: in using from module project-rules
project-config.jam:12: in modules.load from module project-config
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/build-system.jam:255: in load-config from module build-system
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/build-system.jam:486: in load-configuration-files from module build-system
/private/tmp/boost-20240722-96026-g8ccw6/boost-1.85.0/tools/build/src/build-system.jam:607: in module scope from module build-system

Do not report this issue to Homebrew/brew or Homebrew/homebrew-core!

Do not report this issue: you are running in an unsupported configuration.

By the way, have you checked whether the current tests work correctly on a MacOS (without large datasets, in debug/release modes)?

release

[==========] 472 tests from 62 test suites ran. (108625 ms total)
[  PASSED  ] 472 tests.

debug

[==========] 472 tests from 62 test suites ran. (1197619 ms total)
[  PASSED  ] 472 tests.

Vdaleke commented 3 months ago

It seems that we can build Boost with GCC using the following command: brew install boost --cc=gcc-X. I found this on Stack Overflow and in a project that uses Boost with GCC on MacOS.

@vs9h I was also asked to clarify that the solution on StackOverflow and in this project was presented 9 years ago, and the formula in homebrew has changed a lot during this time

vs9h commented 3 months ago

I was also asked to clarify that the solution on StackOverflow and in this project was presented 9 years ago, and the formula in homebrew has changed a lot during this time

Yes, thanks for the answer. Indeed, I came across outdated information.

It seems that there is really no way to build Boost with GCC using Homebrew. Moreover, there is probably a reason why this option was removed from Homebrew, i.e. it was probably done intentionally by the people contributing to Homebrew.

I looked through the Homebrew discussions on this and came across one on a similar topic. From this answer it is obvious that opening an issue or a PR in their repository is pointless. We also already know that the --cc flag will not help us. But more interesting is the beginning, which says that if you need a different version of the boost formula, you should create a custom tap and use it. If I understand correctly, we would be making something like a fork and using it, and in the guide it would look like brew install username/custom/boost. It sounds interesting, but since I don’t have enough experience, I can’t say whether this should be used. Also, it seems to me that I managed to find a more elegant solution.

The most interesting is this answer, which he refers to in the post above, namely the first sentence:

Homebrew boost is compiled with Clang, but I'm a bit surprised that that makes it useable only with Clang...

In fact, the problem is that code compiled by Clang uses the libc++ standard library, while code compiled by GCC uses libstdc++. As I understand it, if we build Desbordante using -stdlib=libc++ and simply install Boost on MacOS from Homebrew, then Desbordante will probably work correctly. This solution is my favorite at the moment. It would save us from complicating the installation guide this much.

To summarize, I would like to suggest changing the standard library only for MacOS. To do this, you need to add logic that checks the operating system to CMakeLists.txt (I don’t know how to do this correctly, but there should be enough examples on the Internet). You also need to verify that the tests still pass after changing the standard library.
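To make the libc++ / libstdc++ distinction concrete, here is a minimal C++ sketch (not part of this PR) that reports which standard library a translation unit is compiled against. It relies on the implementation-defined macros _LIBCPP_VERSION (libc++) and __GLIBCXX__ (libstdc++), which become visible after including any standard header. The function name StandardLibraryName is hypothetical.

```cpp
#include <string_view>  // any standard header exposes the library's config macros

// Hypothetical helper: names the C++ standard library this file is built against.
constexpr std::string_view StandardLibraryName() {
#if defined(_LIBCPP_VERSION)
    return "libc++";     // LLVM's library, the default for Clang on MacOS
#elif defined(__GLIBCXX__)
    return "libstdc++";  // GNU's library, the default for GCC
#else
    return "unknown";
#endif
}
```

Printing this value from a binary built with the project's compiler, and comparing it with the library the Homebrew Boost was built against, is one quick way to check whether the two agree. The two libraries are not ABI-compatible for types such as std::string, so a mismatch typically surfaces as unresolved symbols at link time.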

Vdaleke commented 3 months ago

In fact, the problem is that code compiled by Clang uses the libc++ standard library, while code compiled by GCC uses libstdc++. As I understand it, if we build Desbordante using -stdlib=libc++ and simply install Boost on MacOS from Homebrew, then Desbordante will probably work correctly. This solution is my favorite at the moment. It would save us from complicating the installation guide this much.

It would be nice if we could use 'libc++', but Desbordante does not follow the C++ standard at the moment. It has a lot of GCC-specific code and, as it seems to me, it's not easy to remove. Logs from compiling with Clang:

desbordante-core/src/core/algorithms/od/fastod/model/attribute_set.h:96:24: error: no member named '_Find_first' in 'std::bitset<64>'
        return bitset_._Find_first();
               ~~~~~~~ ^
/Users/matvey_smirnov/Yandex.Disk.localized/Studies/HSE/SRW-2nd-Year/desbordante-core/src/core/algorithms/od/fastod/model/attribute_set.h:100:24: error: no member named '_Find_next' in 'std::bitset<64>'
        return bitset_._Find_next(pos);
               ~~~~~~~ ^

desbordante-core/src/core/algorithms/od/fastod/util/timer.h:7:32: error: no member named '_V2' in namespace 'std::chrono'
using TimePoint = std::chrono::_V2::high_resolution_clock::time_point;
                  ~~~~~~~~~~~~~^
/Users/matvey_smirnov/Yandex.Disk.localized/Studies/HSE/SRW-2nd-Year/desbordante-core/src/core/algorithms/od/fastod/util/timer.h:12:5: error: unknown type name 'TimePoint'
    TimePoint start_time_;
    ^
/Users/matvey_smirnov/Yandex.Disk.localized/Studies/HSE/SRW-2nd-Year/desbordante-core/src/core/algorithms/od/fastod/util/timer.h:13:5: error: unknown type name 'TimePoint'
    TimePoint end_time_;
    ^
5 errors generated.

I don't know where we could find fast, standard-conforming replacements for these constructs. And these are probably only the first compilation errors; it is not known what else will turn up in the tests or other parts of the code. That's the reason why I used GCC and why this guide exists. If we could use 'libc++', we could also use Clang, and there would be no problems compiling on MacOS. I told @chernishev about it 1-2 weeks ago and we added it to future tasks.

But more interesting is the beginning, which says that if you need a different version of the boost formula, you should create a custom tap and use it. If I understand correctly, we would be making something like a fork and using it, and in the guide it would look like brew install username/custom/boost. It sounds interesting, but since I don’t have enough experience, I can’t say whether this should be used.

Wow, I had heard that it's possible to create custom formulae, but I thought this could only be done locally and would be no easier than building Boost from source. Now I think we could create a public fork and use it as you described, asking users to install Boost with this formula, and it would be a good solution.

Unfortunately, it seems to me that this could take a lot of time: the formulae are written in Ruby, we would need to add GCC as a dependency to be installed first or add flags specifying the compiler, and all of this requires knowing how Homebrew works internally. I'm not ready to spend time on this right now, because I need to finish other functionality and I'm starting an internship soon. I think we can stay with the current guide; in my opinion it is not that complicated, and we can simplify it later.

vs9h commented 3 months ago

Hmm, I thought that we could use different standard libraries with GCC, but it turns out that only Clang allows this (more precisely, with GCC you need workarounds). I also thought that we could simply swap the standard library without code changes. But we do use several GCC extensions that are implemented in its standard library. So yes, fair enough, let's leave this for later. The current version is quite good.

Regarding GCC extensions, their use in Desbordante is minimal:

1) std::chrono::_V2::high_resolution_clock::time_point: most likely this will need to be replaced with std::chrono::time_point<std::chrono::steady_clock>.

2) The _Find_first and _Find_next std::bitset<> extensions: here we need to think about how to iterate over a bitset correctly (in terms of performance). We will probably need to use __builtin_ctz (or std::countr_zero from C++20). In general, this seems to be the only thing that will require some attention.

And I don’t think that we use any of the other GCC extensions.
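As a rough illustration of the two replacements suggested above, here is a minimal sketch, not the code that ended up in the repository. It assumes bitsets of at most 64 bits (the Clang log shows std::bitset<64>), so the whole set fits into one unsigned long long. The names TimePoint, LowestSetBit, FindFirst and FindNext are hypothetical. The alias uses steady_clock's own time_point member, which in libstdc++ and libc++ is the same type as std::chrono::time_point<std::chrono::steady_clock>.

```cpp
#include <bit>
#include <bitset>
#include <chrono>
#include <cstddef>

// Portable stand-in for std::chrono::_V2::high_resolution_clock::time_point.
using TimePoint = std::chrono::steady_clock::time_point;

// Index of the lowest set bit; the caller guarantees word != 0.
inline std::size_t LowestSetBit(unsigned long long word) {
#if defined(__cpp_lib_bitops)
    return static_cast<std::size_t>(std::countr_zero(word));  // standard since C++20
#else
    return static_cast<std::size_t>(__builtin_ctzll(word));   // GCC/Clang builtin
#endif
}

// Mirrors bitset::_Find_first: lowest set index, or N if no bit is set.
template <std::size_t N>
std::size_t FindFirst(std::bitset<N> const& bits) {
    static_assert(N <= 64, "sketch relies on to_ullong()");
    unsigned long long const word = bits.to_ullong();
    return word == 0 ? N : LowestSetBit(word);
}

// Mirrors bitset::_Find_next: lowest set index strictly above pos, or N.
template <std::size_t N>
std::size_t FindNext(std::bitset<N> const& bits, std::size_t pos) {
    static_assert(N <= 64, "sketch relies on to_ullong()");
    if (pos >= N || pos + 1 >= N) return N;
    // Clear bits 0..pos, then look for the lowest remaining set bit.
    unsigned long long const word = (bits.to_ullong() >> (pos + 1)) << (pos + 1);
    return word == 0 ? N : LowestSetBit(word);
}
```

Iteration then looks like for (auto i = FindFirst(b); i != N; i = FindNext(b, i)), the same shape as a loop over _Find_first and _Find_next. Whether this matches the performance of the libstdc++ extensions would need measuring; wider bitsets would need a different approach, since to_ullong() throws when the value does not fit.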

Vdaleke commented 3 months ago

Regarding GCC extensions, their use in Desbordante is minimal:

Are you sure that this is all of them? I only gave examples of the errors that are reported first.

vs9h commented 3 months ago

Regarding GCC extensions, their use in Desbordante is minimal:

Are you sure that this is all of them? I only gave examples of the errors that are reported first.

I tried building Desbordante with Clang after removing the uses of the extensions I wrote about above, and I didn’t find any more compilation errors. The only error was at the linking stage (I think this is because my Boost uses a different standard library).

vs9h commented 3 months ago

@Vdaleke, can you squash the last two commits? We avoid commits that fix something from previous commits within the same pull request

Vdaleke commented 3 months ago

@vs9h I added an Ubuntu dependency installation guide too. Waiting for review.