GHA workflow R&D - Githubissues

nilsnolde commented 1 year ago

Having CI workflows running for PR & e.g. distribution for Linux, Mac, Win poses some challenges (esp for the latter two).. To test the code on every push or package it, we'd need to:

build all OSRM dependencies on Mac & Win (Linux is easier here)
build all OSRM binaries so we can bind to them

Ideally we only (re-)build dependencies when they actually need to be re-built (i.e. upstream needs new minimum versions). And ideally we only (re-)build OSRM binaries when we actually upgrade the OSRM version.

IMHO we should investigate if we can't make this more optimal and have workflows which build dependencies & OSRM when we need it and push them to some remote storage where our binding workflows can pull from. I didn't really look into this too much, but first thoughts:

Github has artifacts which can be pushed & pulled. Since a few months they're also exposed via Github's HTTP API, so pushing/pulling to/from Github artifacts could be scriptable. Main detractor might be that artifacts might have a pretty tight expiry.. Ideally it'd not expire for at least a year.
If nothing else works within the Github universe, we could do what OSRM's Windows workflow did until recently: push the compressed built binaries to some server (I'm happy to give one of ours), where the binding workflow can pull from.

This would be also interesting for other repos, e.g. https://github.com/VROOM-Project/pyvroom, where we'd like to be able to compile VROOM with libosrm support before binding VROOM in Python.

nilsnolde commented 1 year ago

Then I think we can move on to seeing OSX through @whytro @SiarheiFedartsou , WDYT? I'd suggest to come up with a GHA workflow that per button push (see on: workflow_dispatch) generates OSRM binaries and all needed dependencies, and either have that pushed to Github artifacts, or or some server, from where a future workflow can download them to build the bindings.

For testing, you can start with some random data and see which flow works (better).

If you need some clarification/discussion on this, please let me know and we can jump on a video call @whytro .

SiarheiFedartsou commented 1 year ago

Then I think we can move on to seeing OSX through @whytro @SiarheiFedartsou , WDYT? I'd suggest to come up with a GHA workflow that per button push (see on: workflow_dispatch) generates OSRM binaries and all needed dependencies, and either have that pushed to Github artifacts, or or some server, from where a future workflow can download them to build the bindings.

For testing, you can start with some random data and see which flow works (better).

If you need some clarification/discussion on this, please let me know and we can jump on a video call @whytro .

Sounds good.

nilsnolde commented 1 year ago

OSX support landed in #13

whytro commented 1 year ago

Dropping an update on the Windows side of things. Windows seems a bit more complex to manage in this regard, especially regarding the exporting of dependencies (ie. boost). Since osrm-backend doesn't use a formal conan profile, I can't use the conan export-pkg command, so I'll have to look into simply copying over the .conan folder to see if I can get that to work.

nilsnolde commented 1 year ago

We agreed offline to keep using conan. Likely it's just the binding's build command that's a bit off.

whytro commented 1 year ago

There's some symlinking going on as well, which seems to mess with the target-side, so I do need to investigate more regarding the packaging aspects of things (ie. deference flag on tar), and if it affects the outcome.

whytro commented 1 year ago

A debug build on osrm-backend with conan enabled seems to fail on the target machine (runs out of space apparently), while a vcpkg run on osrm-backend with the vcpkg toolchain enabled throws the error: Evaluation file to be written multiple times with different content., and fails to configure.

whytro commented 1 year ago

Currently, I'm not able to get the OSRM library to be detected upon install, resulting in the Windows build failing, even though it seems like it should (ie. it's a path listed in the find debug log). I've tried pointing cmake directly at the OSRM library and headers, but it still fails to configure, and complains about missing headers.

Placing osrm-backend as a submodule yields a different result where it'll run conan, and then immediately fail to find the conan packages. osrm-backend still seems to not build with vcpkg for me as well.

whytro commented 1 year ago

The test_index tests seem to pass (some of those failures were caused by not running osrm-datastore prior), not sure about the access violation though, which is weird.

nilsnolde commented 1 year ago

Hm, does any route/table/match test work? Do we know if it's the bindings or the OSRM code causing the segfault?

whytro commented 1 year ago

Hm, does any route/table/match test work? Do we know if it's the bindings or the OSRM code causing the segfault?

Some of the other tests work, but table, match and tile tests throw an access violation, and the other tests will throw a "could not find matching segment" error for many of the tests that run without throwing an access violation.

The cause doesn't seem to be in the bindings, since I traced it back to an osrm-backend function here where it seems to stop working. This seems to leave osrm-backend (ie. bad install or something), nanobind (somehow), or a bad test generation as the main suspects, and I'm hoping it's just badly generated test data.

I'll try to rule out improper generation of the test data, and then run the NodeJS test suite to get a better understanding of where the problem really lies.

nilsnolde commented 1 year ago

Ah right, I'm not sure linux generated graphs would work with Windows. Or did you do what OSRM's windows-build.bat is doing and generate the graph on Windows? I'm :crossed_fingers: that it's that.

whytro commented 1 year ago

Ah right, I'm not sure linux generated graphs would work with Windows. Or did you do what OSRM's windows-build.bat is doing and generate the graph on Windows? I'm 🤞 that it's that.

Yes, above errors were with windows-build.bat, since osrm-datastore will complain if using linux generated graphs and refuse to run.

whytro commented 1 year ago

I've confirmed that the nodejs tests seem to be running fine on the test system, while the Python bindings still seem to throw an access violation - so it seems to be something wrong with the bindings side.

I've noticed that it also occurs when use_shared_memory is set to False - while working fine if set to True.

whytro commented 1 year ago

I've figured out the issue regarding the Windows tests failing. I'll draft up a PR to implement Windows builds after I unravel some jank that was required to get it to build in the first place.

nilsnolde commented 1 year ago

Wow congrats @whytro 🥳. That’s the icing on your project cake:)

gis-ops / py-osrm

GHA workflow R&D #11