Some really good thoughts in here and very well documented. I'm actually working on CR CLI updates all week (related to extending the format to handle multi-file projects properly), so I'll continue to unpack these thoughts while I iterate.
My initial knee-jerk reaction is that I would still like to keep the actual runner code that packs the code and runs the executable written in JS. While you make a great point that external contributions will be easier if knowing both the target language and JS isn't necessary, keeping as much of the logic within the JS realm will make it easier for the team to maintain the codebase. We (the Qualified/Codewars team) are already in a predicament where none of us knows Clojure, and we can't maintain the jvm-runner ourselves without outside help (or time to go learn a new language).
With that said, your thoughts about using tag-based images are great. Using `docker cp` should still allow us to not need a special base image at all (except for the fact that we need to provide database services, which requires orchestrating multiple linked containers to remove the dependency). The `docker cp` approach should also likely limit the amount of rework, as we could just replace the `runCode` callback so that it runs the command directly within a new container rather than the CLI container - making the change, in theory, transparent.
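In rough terms, the replacement could look something like this (just a sketch; the image name, the `/workspace` path, and the callback shape are placeholders rather than the actual CLI API):

```js
// Sketch only: run the prepared command inside a fresh, tagged language
// container instead of inside the CLI container itself.
const { execFile } = require('child_process');

function runCodeInContainer(image, command, args, cb) {
  // e.g. image = 'node:7', command = 'node', args = ['/workspace/spec.js']
  const dockerArgs = ['run', '--rm', image, command].concat(args);
  execFile('docker', dockerArgs, (err, stdout, stderr) => {
    cb(err, { stdout: stdout, stderr: stderr });
  });
}
```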
> My initial knee-jerk reaction is that I would still like to keep the actual runner code that packs the code and runs the executable written in JS. While you make a great point that external contributions will be easier if knowing both the target language and JS isn't necessary, keeping as much of the logic within the JS realm will make it easier for the team to maintain the codebase.
I completely overlooked this point. Maintainability by your team should come first. I've read most of the code, but not the "jvm-runner" directory :( I haven't done any Lisp for a long time.
> (except for the fact that we need to provide database services, which requires orchestrating multiple linked containers to remove the dependency)
I was thinking that deriving another image with the database service added would be enough? The kata author can select this "add-on" when authoring. If I'm not misunderstanding Docker, only the difference is added as a new layer and the original image is unaffected.
```dockerfile
# node/7/db/Dockerfile
FROM cwr/node:7
# install database
# do database setup
```
I played with this idea and now I have something like this:
```text
.
|-- clojure
| `-- lein2
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- java
| `-- 8
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- node
| |-- 6
| | |-- src
| | | |-- index.js
| | | |-- package.json
| | | `-- yarn.lock
| | |-- Dockerfile
| | `-- main.js
| `-- 7
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- python
| |-- 2.7
| | |-- src
| | | |-- index.js
| | | |-- package.json
| | | `-- yarn.lock
| | |-- Dockerfile
| | `-- main.js
| `-- 3.5
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- ruby
| `-- 2.3
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- util
| `-- get-json.js
|-- Makefile
|-- README.md
`-- docker-compose.yml
```
It looks a little complicated, but the `Makefile` and `docker-compose.yml` automate most of it. I haven't worked on `clojure` yet, but the other languages can run the given code. I tried reading "jvm-runner" and I couldn't get through it :|
I'm not really good at explaining my idea, so I might just make a prototype that you can play with.
I haven't been able to work on this because I've been sick for the past few days. I'm hoping to recover soon...
@jhoffner, can you show me how you'd use `docker cp`? I don't think I'm understanding your idea correctly.
I've come up with:

1. Create the files locally, then `docker cp` them to the language container, or
2. Create the files in the CLI container, then `docker cp` them to the language container

I've never used `cp`, but my hope was that the first option would work.
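To make the first option concrete, this is roughly what I had in mind (a sketch; the `/workspace` directory and the `node` entrypoint are just placeholders):

```js
// Sketch of option 1: write the files locally, docker cp them in, then start.
const { execSync } = require('child_process');

function runViaDockerCp(image, localDir, entryFile) {
  // Create (but don't start) a container so there is something to copy into.
  const id = execSync(`docker create ${image} node /workspace/${entryFile}`)
    .toString().trim();
  // Copy the locally prepared directory into the container as /workspace.
  execSync(`docker cp ${localDir} ${id}:/workspace`);
  // Start the container and stream its output back.
  return execSync(`docker start -a ${id}`).toString();
}
```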
The first option should work fine, but I'm not seeing the benefits of creating the files locally. Is there a reason for this?
I think it's also because we have different estimates of the amount of rework, and I interpreted "Paradigm Shift" too literally. I have a list of issues that I mentioned during the PR for Go, and I think some are hard to resolve without a proper rewrite and a clear paradigm shift.
Actually, now that I look back at your 2 ideas, I'm not sure what the difference is. I guess you mean to say that with #1 there is only one container (the language container) and with #2 there are two containers?

If that's the case, they are both the same in my opinion. Yes, I fully expect the CLI to run within its own container, but functionally it shouldn't matter whether it is in a container or not. You can mount `/var/run/docker.sock:/var/run/docker.sock` as a volume to give one container access to create other containers. As long as Docker is installed within the CLI image, it functions as you would expect. Obviously that would mean the CLI container needs to be better protected from user-submitted code.

Either way though, the CLI wouldn't really have any idea whether it is or is not in a container.
I think I understand your idea better now. I was simply trying to come up with all the possible ways you'd use `docker cp`, and I agree they're the same. And you are correct about the second one having 2 containers.
I think sibling containers by binding `docker.sock` or Docker-in-Docker are unnecessary, and like you wrote, they require extra care. My proposal is basically to remove the central CLI.
I see you did a lot of work, and I need to read those `lib/*.js` files. And thanks for the `.editorconfig`; I was thinking of opening a PR for the whitespace problems.
How would you resolve this issue for `node-runner` without a proper rewrite?
If the user has selected Node `v0.10.x`, the code is executed with `/usr/local/n/versions/node/0.10.33/bin/node`. But the packages are installed using `/usr/local/bin/npm` (npm v3.10.3) and not `/usr/local/n/versions/node/0.10.33/bin/npm` (npm v1.4.28).
`package.json` has an optional `engines` field to check whether the versions of `node`/`npm` are compatible, and the current setup is bypassing this.
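For reference, this is the kind of check the `engines` field enables and that the current setup skips (a sketch using the `semver` package, not code from the repo):

```js
const semver = require('semver');
const pkg = require('./package.json');

// e.g. pkg.engines = { node: '>=0.10.0 <0.11.0', npm: '~1.4.0' }
if (pkg.engines && pkg.engines.node &&
    !semver.satisfies(process.version, pkg.engines.node)) {
  console.error(`node ${process.version} does not satisfy engines.node "${pkg.engines.node}"`);
}
```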
Some other `node-runner` issues:

- `devDependencies` and `dependencies` need to be cleaned up and version locked
- Unclear which modules are intended for users to use
- `node_modules`: will moving `/tmp/node_modules` instead of copying cause any issue?
```js
const exec = require('child_process').exec;
// Both directories currently hold a full ~489M copy:
exec('du -hs /tmp/node_modules', (_, o) => console.log(o));
exec('du -hs /runner/node_modules', (_, o) => console.log(o));
// 489M /runner/node_modules
// 489M /tmp/node_modules
```
Maybe switching to Yarn might resolve the issue of having to install everything at once. Or create a separate image for this purpose and compose them.
Recent Changes
Yeah, I added a bunch of changes to (A) bring all of the different configuration options into one place, and (B) support projectMode: instead of using code/setup/fixture fields and mapping them in some arbitrary way (via concatenation or file importing - you don't really know, so you need to look it up), we just upload a set of files in `{"solution.js": "content", "helpers.js": "content", "spec.js": "content"}` format. You can define the entrypoint (`"spec.*"` by default). This is all being used by a new IDE that I've been working on, which will hopefully replace CW's existing editors at some point soon.
The project file format may or may not make it into CW soon, it's specifically being developed for advanced Qualified challenges.
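As a rough illustration of the shape of a projectMode submission (field names here are just for the example, not the final schema):

```js
// Illustrative only - the actual field names/defaults may differ.
const run = {
  language: 'javascript',
  files: {
    'solution.js': 'module.exports = (a, b) => a + b;',
    'helpers.js': 'exports.double = n => n * 2;',
    'spec.js': 'const add = require("./solution"); /* assertions here */'
  },
  entryPath: 'spec.js' // falls back to the "spec.*" file when omitted
};
```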
.editorconfig
Yeah, I also can't stand that 4-character indentation (and `{` falling to the next line) somehow became the default for most of the code. I want to change that but will probably do it as its own commit.
> How would you resolve this issue for `node-runner` without a proper rewrite?
Each node version would have its own Docker image, tagged similarly to how you had originally proposed. Each image would extend from its official base image (`FROM node:0.10.33`) and simply install any packages that we wish it to have. Ideally nothing else should be needed, and the image itself doesn't need to be specifically related to the CLI/CW/Q in any way.
Again, this means that a linking system needs to be developed so that if database services are needed a container can be properly created and linked, so that they don't need to exist within the base image. Whether the CLI handles this on its own (might have to in order to ensure tests can be integrated) or whether it's handled by our custom scheduling engine (might have to in order to properly manage resources) is TBD.
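As a sketch of what the CLI-managed variant could look like (purely illustrative, using the plain docker CLI and the legacy `--link` flag; a user-defined network would work just as well):

```js
const { execSync } = require('child_process');

function runWithService(languageImage, serviceImage, cmd) {
  // Start the database/service container first.
  const svc = execSync(`docker run -d ${serviceImage}`).toString().trim();
  try {
    // The language container reaches the service under the alias "db".
    return execSync(`docker run --rm --link ${svc}:db ${languageImage} ${cmd}`)
      .toString();
  } finally {
    execSync(`docker rm -f ${svc}`);
  }
}
```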
Yarn
Yeah I haven't used Yarn yet but that would be great if it fixed the issues. Having to use devDependencies to install node packages instead of just doing it directly in the image is not ideal. Also each time a new dependency is added the entire package list needs to be reinstalled which is another drawback of the current design.
> Yeah, I also can't stand that 4-character indentation (and `{` falling to the next line) somehow became the default for most of the code. I want to change that but will probably do it as its own commit.
Thanks, I couldn't stand them either; I've been making a separate copy and reformatting it as I read through the codebase.
> Again, this means that a linking system needs to be developed so that if database services are needed a container can be properly created and linked, so that they don't need to exist within the base image. Whether the CLI handles this on its own (might have to in order to ensure tests can be integrated) or whether it's handled by our custom scheduling engine (might have to in order to properly manage resources) is TBD.
Isn't this one of the most common use cases of `docker-compose`? Are there any specific requirements?
> Having to use devDependencies to install node packages instead of just doing it directly in the image is not ideal.
Do you mean using `package.json` to manage the dependencies is not ideal? I wrote "needs cleanup" in the above comment because I don't know why modules never used in the CLI are listed in the root `package.json` under `devDependencies`, or what makes something a `devDependency` rather than a `dependency`.
I haven't looked at the changes you've made, but your plans sound good and realistic. I took "paradigm shift" literally (`runner-cli` to `runner-pipe`) and had too much fun thinking :)
> Isn't this one of the most common use cases of `docker-compose`? Are there any specific requirements?
Yes, though we likely wouldn't use docker-compose in this case, but just link the containers directly (compose is just a wrapper that makes the docker CLI easier to work with in a repeatable manner). The bigger issue, though, is that the Codewars platform differs from most other Docker-based services in that each time you run your code, you get a fresh container. With services, that now means we need multiple freshly created containers. We also prewarm containers so that they are ready to be used, which adds to the complexity (do we now need to prewarm multiple database containers?).
I do have a plan in place to start supporting persistent containers, one per user/challenge session. This would allow services to be setup more easily, and also allow us to run install scripts so that challenges could be configured on session start - additional packages added, services installed, etc.
> Do you mean using `package.json` to manage the dependencies is not ideal?
Not really, just because we can install specific versions within the Dockerfile if we want to, and we can cache them, so that whenever someone adds a new package, the docker image only needs to be rebuilt from that point, instead of having to reinstall all of the packages all over again.
> had too much fun thinking :)
I appreciate you helping me think this through!
> With services, that now means we need multiple freshly created containers. We also prewarm containers so that they are ready to be used, which adds to the complexity (do we now need to prewarm multiple database containers?).
For the services, how about providing a derived image, e.g., `codewars/node-runner:7-db`? Wouldn't this allow you to use the same logic to "prewarm" the container? You could control this by adding a new checkbox (`[x] Use database`) in the kata editor and toggling the `-db` modifier. This could also extend to specific databases (`[x] Use MongoDB`) to avoid getting large images with multiple databases installed.
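The tag could then be derived mechanically from the kata's options, something like (illustrative only; the naming scheme is just my assumption):

```js
// e.g. imageTag('node', '7', ['mongodb']) => 'codewars/node-runner:7-mongodb'
function imageTag(language, version, services) {
  const modifiers = (services || []).map(s => '-' + s).join('');
  return `codewars/${language}-runner:${version}${modifiers}`;
}
```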
> I do have a plan in place to start supporting persistent containers, one per user/challenge session. This would allow services to be setup more easily, and also allow us to run install scripts so that challenges could be configured on session start - additional packages added, services installed, etc.
Sounds cool :) Maybe using `docker commit` after running the install scripts? This should be less expensive and allows an easy way to reset the environment if necessary.
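Something along these lines (a sketch; the `sleep infinity` keep-alive and the session tag format are just placeholders):

```js
const { execSync } = require('child_process');

function prepareSessionImage(baseImage, installScript, sessionTag) {
  // Keep a throwaway container alive long enough to run the install script.
  const id = execSync(`docker run -d ${baseImage} sleep infinity`).toString().trim();
  execSync(`docker exec ${id} sh -c "${installScript}"`);
  // Snapshot the result; later runs for this session start from sessionTag.
  execSync(`docker commit ${id} ${sessionTag}`);
  execSync(`docker rm -f ${id}`);
  return sessionTag;
}
```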
> For the services, how about providing a derived image
Interesting idea, but that would mean we would need to maintain an install script and probably need to build a deploy tool that would automatically generate the derived Dockerfile for each service combination, then upload that combination. Even if we just support 3 services (mongodb, redis, postgre) within different combinations, we currently support 15 different language images, which means we would need to maintain 15 * 9 = 135 different image combinations, and then want to have those images preloaded on machines so the download cost doesn't occur. With persistent images the preloading issue is mitigated (though we still ideally want that preloading process to be as fast as possible) - however that's still a lot of images to maintain (and would grow significantly as we add languages/services).
This is on top of the fact that we are reinventing the wheel by having to support our own install scripts for these database images, and not to mention that we have version issues with databases the same as we do with languages - so it seems scalability wise the true fix here has to be to support a multi-linked-container paradigm.
My dream result here is that you could actually just specify which docker image you want your language to run in (anything hosted in Docker Hub), as well as specify services (and be able to pick a database version) and the API would stitch it all together for you.
> Maybe using `docker commit`

Great idea. It would require some special file management to track all of these images and make sure they aren't taking up a specific machine's HD space when they aren't needed, but that would speed up install times to basically zero.
Thanks for explaining. That clearly doesn't scale.
> My dream result here is that you could actually just specify which docker image you want your language to run in (anything hosted in Docker Hub), as well as specify services (and be able to pick a database version) and the service would run it all for you.
I thought of something similar while thinking about the new paradigm and realized the similarities with some CI services.
> Drone is a Continuous Integration platform built on container technology. Every build is executed inside an ephemeral Docker container, giving developers complete control over their build environment with guaranteed isolation. (drone/drone)
```yaml
pipeline:
  build:
    image: golang
    commands:
      - go get
      - go build
  database:
    image: redis
  test:
    image: golang
    commands:
      - go test
```
Also, the experimental Docker bundles and stacks look interesting (I haven't read it all).
> A Dockerfile can be built into an image, and containers can be created from that image. Similarly, a docker-compose.yml can be built into a distributed application bundle, and stacks can be created from that bundle. In that sense, the bundle is a multi-services distributable image format. (https://docs.docker.com/compose/bundles)
Yep, very similar to Docker-based CI services (which most are these days) and also cloud IDEs. The biggest difference is a focus on simplicity, ease of setup, startup time, and of course our unified test output format.
Compose bundles are new to me, thanks for letting me know about them!
@kazk closing since this isn't actionable. Great thought process though. I've also been thinking about updating the images so that there is no base image, and only an install script that adds node and the rest of the dependencies. A few things that would mean:
@jhoffner I've been thinking about this since reading your comment. I came up with another idea that's more realistic allowing progressive enhancements.
I still need to put together my thoughts, but I think the new idea:
with a small amount of rework and improves extensibility. I might have missed something because this sounds too good :)
@jhoffner Here's my new proposal. I actually like this one over the original. Please let me know what you think.
A new paradigm to improve maintainability with extensibility. Takes advantage of NPM and improves flexibility by packaging components.
There is currently no way of handling language/package versioning well.
Previous proposal was fun to discuss, but wasn't realistic.
Modularize the codebase into smaller packages. Avoid the pain of maintaining multiple repositories by using Lerna :dragon:. This allows much more flexibility while keeping the convenience of having the code in one place.
Allows progressive enhancements. Each language can update when it's ready.
Image tags can be used in conjunction to keep track of compatibilities.
Decentralized design and flexibility presented in this proposal will lower the barrier of contribution.
This structure allows having a `Dockerfile` instead of `.docker`. Each image can have a `.dockerignore` to keep the build contexts small.
`package.json` has become unmanageable, mixing `dependencies` with `devDependencies`.
Create a `@codewars` org on npmjs.com (free for public packages) and restructure the codebase to something similar to:
# repository: codewars/codewars-runners
- `packages`
- `codewars-runner`
- `package.json`
- `name: @codewars/runner`
- `version: "1.0.0"`
- `runners`
- `codewars-node-runner`
- `Dockerfile`
- `package.json`
- `name: @codewars/node-runner`
- `version: "1.0.0"`
- `dependencies:`
- `@codewars/runner: "^1.0.0"`
- `codewars-ruby-runner`
- `Dockerfile`
- `package.json`
- `name: @codewars/ruby-runner`
- `version: "1.0.0"`
- `dependencies:`
- `@codewars/runner: "^1.0.0"`
- `lerna.json`
- `package.json`
When a breaking change is introduced, increment the major version and update runners progressively.
For example, `@codewars/runner` and `@codewars/ruby-runner` become `2.0.0`, but `@codewars/node-runner` can stay at `1.0.0`.
Can be separated further like:
- `packages`
- `codewars-runner`
- `package.json`
- `name: @codewars/runner`
- `version: "1.0.0"`
- `dependencies:`
- `@codewars/runner-util: "^1.2.1"`
- `@codewars/runner-service-manager: "^1.0.0"`
- `codewars-runner-util`
- `codewars-runner-service-manager`
- `runners`
- `codewars-node-runner`
- `codewars-ruby-runner`
- `codewars-python-runner`
- `codewars-go-runner`
- `lerna.json`
- `package.json`
and improve the codebase in smaller steps.
Abstract
A new paradigm to improve maintainability with extensibility. Keywords: simple, decentralized, extensible, compatible, flexible.
Background
Proposal
Modularize
Provide each environment as a dockerized micro application taking JSON from `STDIN`. Any language can be used. This should also allow existing code to be used as a micro application by providing adapters as necessary.
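A minimal Node flavor of such a micro application could look like this (the request/response fields are illustrative, not a fixed spec):

```js
// Read a JSON request from STDIN, run it, write a JSON result to STDOUT.
let input = '';
process.stdin.on('data', chunk => { input += chunk; });
process.stdin.on('end', () => {
  const request = JSON.parse(input); // e.g. { solution: '...', fixture: '...' }
  // ...write the files and run the test command baked into this image...
  const result = { stdout: '', stderr: '', exitCode: 0 };
  process.stdout.write(JSON.stringify(result));
});
```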
Image Tags
Use tags to identify language version, test framework, topic, etc. This allows additions without worrying about backward compatibility, and customizations based on need.
- `cwr/python:2.7`: Python 2.7.x
- `cwr/python:3.5-unittest`: Python 3.5.x with `unittest`
- `cwr/python:3.6-unittest-sci`: Python 3.6.x for scientific kata

Rationale
docker cp
This approach requires the CLI to know the specific paths for each language container.
No base image
Every image is independent and will remain independent. Dockerfiles will contain some common boilerplate, but this should be a small tradeoff for keeping things simple. One possible exception can be when providing an environment based on a topic. For example, `cwr/python:3.6-unittest-sci` should be derived from `cwr/python:3.6-unittest` because these should be coupled.

Backward compatibility
Currently, fixing a bug may break existing kata if the author made use of that bug. This proposal introduces a way to avoid this problem by using image tags to control the environment.
Contribution barrier
Currently, JavaScript knowledge is required to add a new language and/or fix problems in a certain language. The decentralized design and flexibility presented in this proposal will lower the barrier of contribution.

Implementation
I realized the difficulty of maintaining a project with multiple languages, so I'm discarding this part. I'll post a follow up comment about this soon.
Usage
(Using jo to produce JSON for brevity)
Running
Test Integration
Extensions
Some ideas for future extensions.
Build Sequence
Multiple Files
@jhoffner, @Dan-Nolan, @OverZealous, @nathandoctor