agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense

implement CI builds to release binaries #258

Closed agentm closed 3 years ago

agentm commented 4 years ago

I think that requiring users to compile Project:M36 on their own is a significant barrier to trying this project. A pre-built binary, especially for Windows, would likely draw in more users. To that end, it would be great if the CI builders running on the master branch generated binaries and created bundled releases.

This would be a high-impact project improvement but requires YAML and devops experience instead of Haskell knowledge.

The Tweag Fellowship may even be willing to sponsor such work, in case anyone wishes to apply.

More specifically, the first-time TutorialD user experience could be improved by:

hughjfchen commented 4 years ago

@agentm, I have a CI/CD framework which I built from scratch with shell scripts. It should be able to build M36, but unfortunately it doesn't support Windows (it does support Linux and Mac OS X). I'll try it and see what happens. Regarding the release, I guess a Docker image may be friendlier than a binary? With a Docker image, users can just pull it from a registry over the internet and boot it to play around.

agentm commented 4 years ago

@hughjfchen, that sounds great. It would be great for users to be able to try out a Docker image. Does your script work with Travis CI?

hughjfchen commented 4 years ago

@agentm, they are just shell scripts which integrate well into your repository and can be called from the travis.yml, so they should work with Travis CI.

hughjfchen commented 4 years ago

@agentm, I've already built all executables for project-m36 with my CI/CD framework, but running the test suite failed. The error message was:

running tests
Running 20 test suites...
Test suite test-relation-import-csv: RUNNING...
Test suite test-relation-import-csv: PASS
Test suite logged to: dist/test/project-m36-0.7-test-relation-import-csv.log
Test suite test-relation: RUNNING...
Test suite test-relation: PASS
Test suite logged to: dist/test/project-m36-0.7-test-relation.log
Test suite test-transactiongraph-persist: RUNNING...
Cases: 3 Tried: 2 Errors: 0 Failures: 0
Failed to load scripting engine- scripting disabled: ScriptSessionLoadError : cannot satisfy -trust project-m36 (use -v for more information)
Failure in: 2
test/TransactionGraph/Persist.hs:104
ScriptError ScriptCompilationDisabledError
Cases: 3 Tried: 3 Errors: 0 Failures: 1

agentm commented 4 years ago

@hughjfchen, the test suite relies on the project-m36 dynamic library, so you need to run cabal install first.

hughjfchen commented 4 years ago

@agentm, I use Nix to build the Docker image. Anyway, I modified the CI script to run cabal new-test within a nix-shell. The tests now continue running and the test-script test suite passes. However, one test suite still fails with the following error message:

Running 1 test suites...
Test suite test-server: RUNNING...
The filesystem does not support journaling so writes may not be crash-safe. Use --disable-fscheck to disable this fatal error.
test-server: thread blocked indefinitely in an MVar operation
Test suite test-server: FAIL
Test suite logged to: /home/chenjf/projects/project-m36/dist-newstyle/build/x86_64-linux/ghc-8.6.5/project-m36-0.7/t/test-server/test/project-m36-0.7-test-server.log
0 of 1 test suites (0 of 1 test cases) passed.

agentm commented 4 years ago

That's a test that requires a journaling filesystem. Hm- there doesn't seem to be a way to disable it from within the test.

I can add an environment variable to disable it- would you be able to pass it down through nix?

hughjfchen commented 4 years ago

@agentm, yes, I can set an environment variable within a nix-shell before running the tests, if that helps you decide whether the specific test case(s) should be disabled. Just let me know which environment variable(s). Another question: I still get the "Failed to load scripting engine" error when I run the tutd binary as follows:

/nix/store/bpynflzknw42mbkf7zjibx5piwdcvihn-project-m36-0.7/bin/tutd
Failed to load scripting engine- scripting disabled: ScriptSessionLoadError : cannot satisfy -trust project-m36 (use -v for more information)
Project:M36 TutorialD Interpreter 0.7
Type ":help" for more information.
A full tutorial is available at: https://github.com/agentm/project-m36/blob/master/docs/tutd_tutorial.markdown
TutorialD (master/main):

Does that mean the tutd binary needs a GHC/cabal installation to run? If so, packaging such an environment into a Docker image may lead to a very large image.

hughjfchen commented 4 years ago

Regarding disabling some test cases based on settings, would it be better to split those test cases into a separate test suite and decide whether to build and run it based on a cabal flag? It could be disabled by default and only built and run if the flag is set to true.

agentm commented 4 years ago

I think that nix tries to run tests by default when building, but there is a flag to disable them.

The last time I tried to build project-m36 with nix with @3noch, I think we hit the same roadblock. Specifically, the Project:M36 binaries want to see the Project:M36 dynamically-linked library in the haskell packages. (This is required for using Haskell as a backend scripting language, so it links against GHC as well.) However, the backend scripting is optional, so once you disable the tests, nothing should be requesting it.

hughjfchen commented 4 years ago

@agentm , I'm not sure I completely understand you but I use cabal2nix to generate nix drv expression for the haskell packages based on their cabal file and you can pass a parameter to cabal2nix to disable test in the generated nix file. You can also pass cabal flags to the cabal2nix command to generate a nix file with the flags.

agentm commented 4 years ago

Yea, I'm thinking that it would be a significant loss to our CI system if it can't run the tests. Building with nix is certainly worthwhile, but trading a test runner to get binaries out seems like a mistake. Sorry I didn't bring it up sooner- we did hit this barrier with nix before.

hughjfchen commented 4 years ago

@agentm, we don't have to lose any tests now. We just run cabal new-test within a nix-shell and all tests can run. Only one test suite, test-server, fails with the error message above, and we need to decide how to handle it. Once that is solved and all tests pass, we can use nix-build to build the Docker image and push it to a Docker registry for consumption.

As for the failing test-server, I'm wondering why it fails, because the file system I'm using is ext4, which should be a journaling file system. Do you have any tips on how to check whether a file system is journaling? Also, we build M36 on that machine but may not run M36 on it. Even if the build machine passes the test, that doesn't mean the machine running M36 will satisfy the requirement.

YuMingLiao commented 4 years ago

hi, @hughjfchen . I just found some command line about it. Hope this helps.

hughjfchen commented 4 years ago

I checked my ext4 file system and found that it should be journaling:

Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize

agentm commented 4 years ago

Regarding tests, does that include the Haskell scripting tests?

Regarding the journaling filesystem, that is odd indeed. Here's the code which runs statfs and checks if the filesystem is in a list of filesystems which support journaling. I can disable that check in the test, but it would be good to know if there is some better code to determine if a filesystem supports journaling under Linux (without requiring root access).
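For illustration, here is a minimal shell sketch of the same kind of unprivileged check: look up the filesystem type through statfs and compare it against a list. It assumes GNU coreutils `stat`. The function names and the list of type names are hypothetical examples, not the list Project:M36 checks against.

```shell
#!/bin/sh
# Sketch only: the function names and the type list are illustrative
# assumptions, not what Project:M36 actually uses.

# Print the filesystem type name for a path via statfs (no root needed).
# Assumes GNU coreutils stat; BSD/macOS stat takes different options.
fs_type_of() {
    stat -f -c %T "$1"
}

# Succeed if the given type name is one we assume is able to journal.
# GNU stat reports ext2, ext3 and ext4 all as "ext2/ext3" because they
# share one magic number, so a match cannot prove a journal is enabled.
is_journaling_fs_type() {
    case "$1" in
        ext2/ext3|xfs|jfs|reiserfs) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use is `is_journaling_fs_type "$(fs_type_of .)" && echo "probably journaling"`. The type lookup is only a heuristic: confirming that a journal is actually present (for example, the has_journal feature on ext4) still needs tools that typically require root.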

hughjfchen commented 4 years ago

@agentm, regarding tests: yes, they all ran successfully and passed within a nix-shell, including the Haskell scripting tests. Regarding the journaling filesystem test case: as @YuMingLiao pointed out, there are some commands to check, but they all need root or sudo. I googled but found no way to do it without root or sudo.

agentm commented 4 years ago

I disabled the fs check for tests in master, so, hopefully, the tests should now pass in your environment. Let me know if you hit another issue.

hughjfchen commented 4 years ago

@agentm, I've merged your update into my fork and all tests now run and pass within the nix-shell. I've also built a Docker image which includes project-m36-server, project-m36-websocket-server and tutd, but the two server binaries still print the following message:

Failed to load scripting engine- scripting disabled: ScriptSessionLoadError : cannot satisfy -trust project-m36 (use -v for more information)

I checked the binaries' command-line options and see there is an option --ghc-pkg-dir. Do I need to install GHC and its packages and include them in the Docker image to run the binaries?

agentm commented 4 years ago

That's great news!

The server-side Haskell scripting is optional, but it would be nice to have it enabled. However, I had trouble enabling it with nix before. I think your best bet is to dump the nix environment to figure out where the ghc-pkg directory is. It should be the destination for the project-m36 dynamic library as a haskell package; the pre-built project-m36 package is also required to get the Haskell scripting to work.

hughjfchen commented 4 years ago

@agentm, I guess we need to make a trade-off. I can think of three options:

1. Pack the project-m36 dynamic lib and its dependencies into the Docker image and set the ghc-pkg-dir command-line option to point to the appropriate directory within the image.
   Pros: users don't need to worry about the dependencies and the program works out of the box.
   Cons: the Docker image is TOO BIG. From some rough tests, the final image size may be over 4 GB, and even as a compressed tarball it is still around 800 MB.

2. Build the image without the dependencies.
   Pros: the image size is around 60 MB, perfect for a Docker image.
   Cons: end users have to deal with the dependencies themselves if they want to use the scripting engine.

3. Disable the scripting feature.
   Honestly, I don't know what the impact on the user experience would be without this feature.

Which option do you think we should take as the next step?

agentm commented 4 years ago

Thanks for the detailed summary!

The Haskell scripting enables users to write and load server-side functions at runtime. Otherwise, such functions must be pre-compiled into the server.

For a first pass, I think it's fine to disable that feature for now.

Have you already tried your patch with travis?

What's your recommendation for how to best integrate the docker generation with continuous delivery?

hughjfchen commented 4 years ago

@agentm ,

> Have you already tried your patch with travis?

Yes, I've already tested with travis-ci. It passed all the tests and built the Docker images successfully.

> What's your recommendation for how to best integrate the docker generation with continuous delivery?

I'm still thinking about this. My idea is to register an account with the docker image hub and, after the Docker images have been built successfully, push them to the hub under that account so that users can pull them and play around with them.

hughjfchen commented 4 years ago

@agentm, I noticed that even when statically linked against the Haskell dependencies, the final Docker image is still around 400 MB. I checked the cabal file and found that project-m36 depends on ghc-boot, ghci, etc., which makes nix compute a large set of runtime dependencies, including ghc and gcc. Do you think there's a way to reduce the size of the final Docker image?

agentm commented 4 years ago

Yes, if the Haskell scripting is disabled, then I can make those dependencies optional and hide them behind a cabal flag.

agentm commented 4 years ago

Ok, I made linking against GHC optional in 6c8e781840be157006127c94726fafc6343e5a8e and confirmed that tutd's size was trimmed in half. To disable the Haskell scripting feature, use cabal build -f-haskell-scripting. Note the additional "-" after the "-f".

agentm commented 4 years ago

@hughjfchen , if this is ready for review, please submit a pull request. This would be a great improvement.

hughjfchen commented 4 years ago

@agentm, sorry for the delay; I've been busy during COVID-19. I've submitted a PR for review. Currently, it works well on Linux/AMD64 and Linux/ARM64. On OSX, the build is terminated due to the sudo-asking-for-password issue on travis-ci, but it works well on my local machine and I'm still working on it. For Windows, I can't test it because I don't have a Windows environment.

hughjfchen commented 4 years ago

After several attempts, I ran the CI on OSX successfully by picking the latest version of the travis-ci OSX image. The CI now runs on both Linux and OSX.

hughjfchen commented 4 years ago

I pushed the built Docker image to my repository. If you have Docker installed, you can run it with the following command, no matter which platform you're running on (Linux, Windows, or OSX):

docker run -d --rm -p 54321:54321 hughjfchen/hughjfchen:project-m36-0.7

The above command starts project-m36-server on port 54321. If you want to start the websocket server instead, use the following command:

docker run -d --rm -p 54322:54322 hughjfchen/hughjfchen:project-m36-0.7 project-m36-websocket-server

The image also packages tutd and all the examples, so if you want to explore tutd, run:

docker run -it --rm hughjfchen/hughjfchen:project-m36-0.7 tutd

hughjfchen commented 4 years ago

OK. I've created a projectm36 organization (the org name cannot include a hyphen) and a project-m36 repository and uploaded the image to it, so now you can pull from this repository instead of from my personal repository:

docker run -d --rm -p 54321:54321 projectm36/project-m36:0.7

docker run -d --rm -p 54322:54322 projectm36/project-m36:0.7 project-m36-websocket-server

docker run -it --rm projectm36/project-m36:0.7 tutd

I plan to update the CI/CD script to upload the image to this repository automatically after a successful build.

agentm commented 4 years ago

That's great news! Thanks for working on this!

I'll try this out and get it merged as soon as possible.

hughjfchen commented 4 years ago

@agentm, OK. I've implemented the deployment feature. It deploys the built Docker image to the local travis-ci build box so that integration tests can be run against it. It also tags the image according to the Haskell cabal package version and pushes it to Docker Hub, ready for consumption. Since it logs in to Docker Hub and pushes the tagged image to the projectm36 organization, I created a Docker Hub account for this purpose. I'll send the username/password to you once the PR is integrated into master. You also need to set the following environment variables on the settings page of your project-m36 repository on the travis-ci site:

project_m36_DOCKER_HUB_USERNAME
project_m36_DOCKER_HUB_PASSWORD

Once set, the whole CI/CD process will be automatic.
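As a rough sketch of what such a tag-and-push step could look like (this is not the actual script from the PR), the functions below read the version from a cabal file, tag a locally built image as projectm36/project-m36:VERSION, and push it using the two environment variables named above. The function names, the local image name, and the cabal file path are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: function names and arguments are hypothetical, and this
# is not the deployment script from the pull request.

# Print the value of the "version:" field of a cabal file, e.g. "0.7".
# Assumes the field name and value sit on one line, separated by spaces.
cabal_version() {
    awk 'tolower($1) == "version:" { print $2; exit }' "$1"
}

# Tag a locally built image with the cabal version and push it.
#   $1: name of the locally built image (assumption)
#   $2: path to the package's cabal file (assumption)
# Expects project_m36_DOCKER_HUB_USERNAME and
# project_m36_DOCKER_HUB_PASSWORD to be set in the CI environment.
push_release_image() {
    local_image="$1"
    version="$(cabal_version "$2")"
    target="projectm36/project-m36:${version}"

    # --password-stdin keeps the password off the command line.
    printf '%s' "$project_m36_DOCKER_HUB_PASSWORD" |
        docker login -u "$project_m36_DOCKER_HUB_USERNAME" --password-stdin &&
        docker tag "$local_image" "$target" &&
        docker push "$target"
}
```

A CI job might call it as `push_release_image project-m36-local project-m36.cabal` after the tests pass, where both arguments are placeholder names.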

agentm commented 3 years ago

We now offer pre-built docker downloads which can run on all three common platforms.