agentm closed this issue 4 years ago
@agentm, I have a CI/CD framework that I built from scratch with shell scripts. It should be able to build M36, but unfortunately it doesn't support Windows (it does support Linux and Mac OS X). I'll try it and see what happens. Regarding the release, I guess a docker image may be friendlier than a binary? With a docker image, users can just pull it from a registry over the internet and boot it to play around.
@hughjfchen, that sounds great. It would be great for users to be able to try out a docker image. Does your script work with travis CI?
@agentm, they're just shell scripts that integrate well into your repository and can be used within the travis.yml, so they should work with travis CI.
@agentm, I've built all the executables for project-m36 with my CI/CD framework, but running the test suite failed. The error message was:
`running tests
Running 20 test suites...
Test suite test-relation-import-csv: RUNNING...
Test suite test-relation-import-csv: PASS
Test suite logged to: dist/test/project-m36-0.7-test-relation-import-csv.log
Test suite test-relation: RUNNING...
Test suite test-relation: PASS
Test suite logged to: dist/test/project-m36-0.7-test-relation.log
Test suite test-transactiongraph-persist: RUNNING...
Cases: 3 Tried: 2 Errors: 0 Failures: 0
Failed to load scripting engine- scripting disabled: ScriptSessionLoadError
test/TransactionGraph/Persist.hs:104 ScriptError ScriptCompilationDisabledError
Cases: 3 Tried: 3 Errors: 0 Failures: 1`
@hughjfchen, the test suite relies on the project-m36 dynamic library, so you need to run `cabal install` first.
@agentm, I use nix to build the docker image. Anyway, I modified the CI script to run `cabal new-test` within a nix-shell. The tests now keep running and the test-script test suite PASSED; however, ONE test suite still fails. The error message follows:
Running 1 test suites...
Test suite test-server: RUNNING...
The filesystem does not support journaling so writes may not be crash-safe. Use --disable-fscheck to disable this fatal error.
test-server: thread blocked indefinitely in an MVar operation
Test suite test-server: FAIL
Test suite logged to: /home/chenjf/projects/project-m36/dist-newstyle/build/x86_64-linux/ghc-8.6.5/project-m36-0.7/t/test-server/test/project-m36-0.7-test-server.log
0 of 1 test suites (0 of 1 test cases) passed.
That's a test that requires a journaling filesystem. Hm- there doesn't seem to be a way to disable it from within the test.
I can add an environment variable to disable it- would you be able to pass it down through nix?
@agentm, yes, I can set an environment variable within a nix-shell before running the tests, if that helps you decide whether the specific test case(s) should be disabled. Just let me know which environment variable(s).
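A minimal sketch of passing such a variable down through nix-shell. The variable name `PROJECT_M36_SKIP_FSCHECK` is a placeholder invented for illustration, since no name had been agreed at this point in the thread. A plain (non `--pure`) nix-shell inherits the caller's environment, so setting the variable for the one command should be enough.

```shell
# Hypothetical variable name; substitute whatever the test suite ends up reading.
run_tests_without_fscheck() {
  # A non-pure nix-shell inherits the calling environment, so the variable
  # set here is visible to the test executables that cabal starts.
  PROJECT_M36_SKIP_FSCHECK=1 nix-shell --run "cabal new-test"
}
```

With `nix-shell --pure` the variable would be dropped unless it is listed explicitly with `--keep PROJECT_M36_SKIP_FSCHECK`.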
Another question: I still get the "Failed to load scripting engine" error when I try to run the `tutd` binary, as follows:
/nix/store/bpynflzknw42mbkf7zjibx5piwdcvihn-project-m36-0.7/bin/tutd
Failed to load scripting engine- scripting disabled: ScriptSessionLoadError
: cannot satisfy -trust project-m36 (use -v for more information)
Project:M36 TutorialD Interpreter 0.7
Type ":help" for more information.
A full tutorial is available at: https://github.com/agentm/project-m36/blob/master/docs/tutd_tutorial.markdown
TutorialD (master/main):
Does that mean the `tutd` binary needs a GHC/cabal installation to run? If so, packaging such an environment into a docker image may lead to a very large image.
Regarding disabling some test cases based on settings: would it be better to split those test cases into a separate test suite and decide whether to build and run it based on a cabal flag? It could be disabled by default and built and run only when the flag is set to true.
I think that nix tries to run tests by default when building, but there is a flag to disable them.
The last time I tried to build project-m36 with nix with @3noch, I think we hit the same roadblock. Specifically, the Project:M36 binaries want to see the Project:M36 dynamically-linked library in the haskell packages. (This is required for using Haskell as a backend scripting language, so it links against GHC as well.) However, the backend scripting is optional, so once you disable the tests, nothing should be requesting it.
@agentm, I'm not sure I completely understand you, but I use `cabal2nix` to generate the nix derivation expressions for the haskell packages from their `cabal` files, and you can pass a parameter to cabal2nix to disable tests in the generated nix file. You can also pass cabal flags to the `cabal2nix` command to generate a nix file with those flags.
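As a rough sketch of that workflow, the function below regenerates the nix expression from the cabal file in the current directory with the test suites switched off. `--no-check` and `--flag` are cabal2nix options; the output file name `project-m36.nix` and the example flag name `some-flag` are assumptions made for illustration.

```shell
# Generate a nix derivation from the cabal file in the current directory.
# Each argument is forwarded as a cabal flag, e.g. "some-flag" (hypothetical).
generate_nix_expr() {
  flag_args=""
  for flag in "$@"; do
    flag_args="$flag_args --flag=$flag"
  done
  # --no-check tells the generated derivation not to run the test suites.
  # $flag_args is left unquoted on purpose so each --flag=... is its own word.
  cabal2nix --no-check $flag_args . > project-m36.nix
}
```

Calling `generate_nix_expr` with no arguments only disables the tests; `generate_nix_expr some-flag` additionally records that flag in the generated file.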
Yea, I'm thinking that it would be a significant loss to our CI system if it can't run the tests. Building with nix is certainly worthwhile, but trading a test runner to get binaries out seems like a mistake. Sorry I didn't bring it up sooner- we did hit this barrier with nix before.
@agentm, we don't have to lose any tests now. We just run `cabal new-test` within a `nix-shell` and all the tests can run. ONLY ONE test suite, `test-server`, fails with the error message above, and we need to decide how to handle it. Once that is solved and all tests pass, we can use `nix-build` to build the docker image and push it to a docker registry for consumption.
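Roughly, that last step could look like the sketch below. It assumes a nix file, here called `docker.nix` (a hypothetical name), that evaluates to a `dockerTools.buildImage` derivation. Such a build leaves an image tarball behind the `result` symlink, which `docker load` can import.

```shell
build_and_load_image() {
  # docker.nix is a placeholder for whichever nix expression builds the image.
  nix-build docker.nix || return 1
  # nix-build leaves a ./result symlink pointing at the image tarball.
  docker load < result
}
```

After loading, the image can be tagged and pushed with the usual `docker tag` and `docker push` commands.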
As for the failing test-server, I wonder why it fails, because the file system I'm using is `ext4`, which should be a journaling file system. Do you have any tips on how to check whether a file system is journaling? Also, we build M36 on that machine, but we may not run M36 on it. Even if the build machine passes the test, that doesn't mean the machine M36 runs on will satisfy the requirement.
Hi @hughjfchen, I just found some command lines for this. Hope this helps.
I did check my `ext4` file system and found that it should be journaling:
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Regarding tests, does that include the Haskell scripting tests?
Regarding the journaling filesystem, that is odd indeed. Here's the code which runs statfs and checks if the filesystem is in a list of filesystems which support journaling. I can disable that check in the test, but it would be good to know if there is some better code to determine if a filesystem supports journaling under Linux (without requiring root access).
@agentm ,
Regarding tests: yes, they all ran and passed within a `nix-shell`, including the Haskell scripting tests.
Regarding the journaling filesystem test case: as @YuMingLiao pointed out, there are some commands to check, but they all need `root` or `sudo`. I googled but found no way to do that without `root` or `sudo`.
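One root-free approximation, offered as a sketch rather than a full answer: util-linux's `findmnt` reads the kernel's mount table, which any user can read, and reports the type of the filesystem holding a path. This gives only the filesystem type (for example `ext4` or `tmpfs`), not whether a journal is actually enabled on that particular volume. The list of journaling types below is an illustrative assumption, not the list Project:M36 uses, and the path in the usage comment is hypothetical.

```shell
# Print the type of the filesystem that holds the given path (util-linux findmnt).
fs_type_of() {
  findmnt -n -o FSTYPE --target "$1"
}

# Succeed if the named filesystem type normally keeps a journal.
# The list is illustrative and deliberately short.
is_journaling_type() {
  case "$1" in
    ext3|ext4|xfs|jfs|reiserfs) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (hypothetical database directory):
#   is_journaling_type "$(fs_type_of /var/lib/project-m36)" || echo "no journaling"
```

Note that ext2, ext3 and ext4 share one superblock magic number, so a check based only on the `statfs` type field cannot tell a journaling ext4 volume from a non-journaling ext2 one, whereas the mount table names the type directly.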
I disabled the fs check for tests in master, so, hopefully, the tests should pass now in your environment. Let me know if you hit another issue.
@agentm, I've merged your update into my fork, and all tests now run and pass within the nix-shell. I've also built a docker image that includes `project-m36-server`, `project-m36-websocket-server` and `tutd`, but the two `server` binaries still print the following message:
Failed to load scripting engine- scripting disabled: ScriptSessionLoadError
: cannot satisfy -trust project-m36 (use -v for more information)
I checked the binaries' command-line options and saw that there is a `--ghc-pkg-dir` option. Do I need to install GHC and its packages and include them in the docker image to run the binaries?
That's great news!
The server-side Haskell scripting is optional, but it would be nice to have it enabled. However, I had trouble enabling it with nix before. I think your best bet is to dump the nix environment to figure out where the ghc-pkg directory is. It should be the destination for the project-m36 dynamic library as a haskell package; the pre-built project-m36 package is also required to get the Haskell scripting to work.
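A small sketch of how that lookup might go from inside the nix-shell. `ghc --print-libdir` and `ghc-pkg list` are standard GHC commands; treating the resulting `package.conf.d` directory as the right value for `--ghc-pkg-dir` is an assumption that would need checking against how the server parses that option.

```shell
# Print the global package database of whichever ghc is first on PATH.
ghc_global_pkg_db() {
  libdir=$(ghc --print-libdir) || return 1
  echo "$libdir/package.conf.d"
}

# Inside the nix-shell one might then run (not executed here):
#   ghc-pkg list project-m36     # shows which database holds the package
#   tutd --ghc-pkg-dir "$(ghc_global_pkg_db)"
```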
@agentm, I guess we need to make a trade-off. I can think of three options:
1. Build the image with the dependencies and use the `ghc-pkg-dir` command line option to point to the appropriate directory within the image
Pros: users don't need to worry about the dependencies and the program works out of the box.
Cons: the docker image is TOO BIG. From some rough tests, the final image may be over 4 GB, and even as a compressed tarball it's still around 800 MB.
2. Build the image without the dependencies
Pros: the image size is around 60 MB, perfect for a docker image.
Cons: end users have to deal with the dependencies themselves if they want to use the scripting engine.
3. Disable the scripting feature
Honestly, I don't know what the impact on the user experience will be without this feature.
Which option do you think we should take as the next step?
Thanks for the detailed summary!
The Haskell scripting enables users to write and load server-side functions at runtime. Otherwise, such functions must be pre-compiled into the server.
For a first pass, I think it's fine to disable that feature for now.
Have you already tried your patch with travis?
What's your recommendation for how to best integrate the docker generation with continuous delivery?
@agentm ,
> Have you already tried your patch with travis?
Yes, I've already tested with travis-ci; it passed all the tests and built the docker images successfully.
> What's your recommendation for how to best integrate the docker generation with continuous delivery?
I'm still thinking about this. My idea is to register an account with Docker Hub and, once the docker images have built successfully, push them to the hub under that account so that users can pull them and play around with them.
@agentm, I noticed that even when statically linked against its haskell dependencies, the final docker image is still around 400 MB. I checked the cabal file and found that project-m36 depends on `ghc-boot`, `ghci`, etc., which makes `nix` compute a large set of runtime dependencies, including `ghc` and `gcc`. Do you think there's a way to reduce the size of the final docker image?
Yes, if the Haskell scripting is disabled, then I can make those dependencies optional and hide them behind a cabal flag.
Ok, I made linking against GHC optional in 6c8e781840be157006127c94726fafc6343e5a8e and confirmed that `tutd`'s size was trimmed in half. To disable the Haskell scripting feature, use `cabal build -f-haskell-scripting`. Note the additional "-" after the "-f".
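In a CI script the flag can be toggled from the environment, roughly as sketched here. The `HASKELL_SCRIPTING` variable is made up for this example; only the `-f-haskell-scripting` flag comes from the commit above.

```shell
# Build Project:M36, dropping the GHC dependency when HASKELL_SCRIPTING=off.
build_m36() {
  if [ "${HASKELL_SCRIPTING:-on}" = "off" ]; then
    # "-f" introduces a cabal flag; the extra "-" before the name negates it.
    cabal build -f-haskell-scripting
  else
    cabal build
  fi
}
```

Leaving `HASKELL_SCRIPTING` unset keeps the default build with scripting enabled.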
@hughjfchen , if this is ready for review, please submit a pull request. This would be a great improvement.
@agentm, sorry for the delay; I've been busy during COVID-19. I've submitted a PR for review. Currently it works well on Linux/AMD64 and Linux/ARM64. On OSX, the travis-ci run is terminated because of the sudo-asking-for-password issue, but it works well on my local machine and I'm still working on it. I can't test on Windows because I don't have a Windows environment.
After several attempts, I got the CI running successfully on OSX by picking a more recent version of the travis-ci OSX image. CI now runs on Linux and OSX.
I pushed the built docker image to my repository. If you have docker installed, you can run it with the following command, whichever platform you're on (linux/windows/osx):
docker run -d --rm -p 54321:54321 hughjfchen/hughjfchen:project-m36-0.7
The above command starts `project-m36-server` on port `54321`. If you want to start the websocket server, use the following command:
docker run -d --rm -p 54322:54322 hughjfchen/hughjfchen:project-m36-0.7 project-m36-websocket-server
The image also packages `tutd` and all the examples, so if you want to explore `tutd`, run:
docker run -it --rm hughjfchen/hughjfchen:project-m36-0.7 tutd
OK. I've created a `projectm36` organization (the org name cannot include a hyphen) and a `project-m36` repository and uploaded the image to it, so you can now pull from this repository instead of from my personal one:
docker run -d --rm -p 54321:54321 projectm36/project-m36:0.7
docker run -d --rm -p 54322:54322 projectm36/project-m36:0.7 project-m36-websocket-server
docker run -it --rm projectm36/project-m36:0.7 tutd
I plan to update the `ci/cd` script to upload the image to this repository automatically after a successful build.
That's great news! Thanks for working on this!
I'll try this out and get it merged as soon as possible.
@agentm ,
OK. I've implemented the deployment feature. It deploys the built docker image to the local travis-ci build box so that integration tests can be run against it. It also tags the image according to the haskell cabal package version and pushes it to Docker Hub, ready for consumption. Since it logs in to Docker Hub and pushes the tagged image to the `projectm36` organization, I created a Docker Hub account for this purpose. I'll send the username/password to you once the PR is merged into master. You also need to set the following environment variables on the settings page of your `project-m36` repository on the `travis-ci` site:
project_m36_DOCKER_HUB_USERNAME
project_m36_DOCKER_HUB_PASSWORD
Once they are set, the whole CI/CD process will be automatic.
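The tagging and pushing steps described above might look roughly like this. It is a sketch, not the script from the pull request: the function names, the local image name argument, and the way the version is read from the cabal file are assumptions, while the two environment variable names and the `projectm36/project-m36` repository come from this thread.

```shell
# Read the "version:" field from a cabal file, e.g. 0.7.
cabal_version() {
  awk 'tolower($1) == "version:" { print $2; exit }' "$1"
}

# Tag a locally built image with the package version and push it to Docker Hub.
# $1: local image name (hypothetical), $2: path to the cabal file.
publish_image() {
  target="projectm36/project-m36:$(cabal_version "$2")"
  # --password-stdin keeps the password out of the process list.
  printf '%s' "$project_m36_DOCKER_HUB_PASSWORD" |
    docker login -u "$project_m36_DOCKER_HUB_USERNAME" --password-stdin || return 1
  docker tag "$1" "$target" || return 1
  docker push "$target"
}
```

On travis-ci the two variables would be defined as hidden settings, so that they are available to the build without being stored in the repository.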
We now offer pre-built docker downloads which can run on all three common platforms.
I think that requiring users to compile Project:M36 on their own is a significant barrier to entry to trying this project. A pre-built binary, especially for Windows, would likely draw in more users. To that end, it would be great if the CI builders running on the master branch generated binaries and created bundled releases.
This would be a high-impact project improvement but requires YAML and devops experience instead of Haskell knowledge.
The Tweag Fellowship may even be willing to sponsor such work, in case anyone wishes to apply.
More specifically, the first-time TutorialD user experience could be improved by: