Some really good thoughts in here and very well documented. I'm actually working on CR CLI updates all week (related to extending the format to handle multi-file projects properly), so I'll continue to unpack these thoughts while I iterate.
My initial knee-jerk reaction is that I would still like to keep the actual runner code that packs the code and runs the executable written in JS. While you make a great point that external contributions will be easier if knowing both the target language and JS isn't necessary, keeping as much of the logic within the JS realm will make it easier for the team to maintain the codebase. We (the Qualified/Codewars team) are already in a predicament where none of us knows Clojure, and we can't maintain the jvm-runner ourselves without outside help (or time to go learn a new language).
With that said, your thoughts about using tag-based images are great. Using `docker cp` should still allow us to not need a special base image at all (except for the fact that we need to provide database services, which requires orchestrating multiple linked containers to remove the dependency). The `docker cp` approach should also likely limit the amount of rework, as we could just replace the `runCode` callback so that it runs the command directly within a new container rather than the CLI container - making the change, in theory, transparent.
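In rough terms, the replacement could look something like this (just a sketch; the image name, the `/workspace` path, and the callback shape are placeholders rather than the actual CLI API):

```js
// Sketch only: run the prepared command inside a fresh, tagged language
// container instead of inside the CLI container itself.
const { execFile } = require('child_process');

function runCodeInContainer(image, command, args, cb) {
  // e.g. image = 'node:7', command = 'node', args = ['/workspace/spec.js']
  const dockerArgs = ['run', '--rm', image, command].concat(args);
  execFile('docker', dockerArgs, (err, stdout, stderr) => {
    cb(err, { stdout: stdout, stderr: stderr });
  });
}
```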
> My initial knee-jerk reaction is that I would still like to keep the actual runner code that packs the code and runs the executable written in JS. While you make a great point that external contributions will be easier if knowing both the target language and JS isn't necessary, keeping as much of the logic within the JS realm will make it easier for the team to maintain the codebase.
I completely overlooked this point. Maintainability by your team should come first. I've read most of the code, but not the "jvm-runner" directory :( I haven't done any Lisp for a long time.
> (except for the fact that we need to provide database services, which requires orchestrating multiple linked containers to remove the dependency)
I was thinking that deriving another image with the database service added would be enough? The kata author can select this "add-on" when authoring. If I'm not misunderstanding Docker, only the difference is added as a new layer and the original image is unaffected.
```dockerfile
# node/7/db/Dockerfile
FROM cwr/node:7
# install database
# do database setup
```
I played with this idea and now I have something like this:
```text
.
|-- clojure
| `-- lein2
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- java
| `-- 8
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- node
| |-- 6
| | |-- src
| | | |-- index.js
| | | |-- package.json
| | | `-- yarn.lock
| | |-- Dockerfile
| | `-- main.js
| `-- 7
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- python
| |-- 2.7
| | |-- src
| | | |-- index.js
| | | |-- package.json
| | | `-- yarn.lock
| | |-- Dockerfile
| | `-- main.js
| `-- 3.5
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- ruby
| `-- 2.3
| |-- src
| | |-- index.js
| | |-- package.json
| | `-- yarn.lock
| |-- Dockerfile
| `-- main.js
|-- util
| `-- get-json.js
|-- Makefile
|-- README.md
`-- docker-compose.yml
```
It looks a little complicated, but the `Makefile` and `docker-compose.yml` automate most of it. I haven't worked on `clojure` yet, but the other languages can run the given code. I tried reading "jvm-runner" and I couldn't get through it :|
I'm not really good at explaining my idea, so I might just make a prototype that you can play with.
I haven't been able to work on this because I've been sick for the past few days. I'm hoping to recover soon...
@jhoffner, can you show me how you'd use `docker cp`? I don't think I'm understanding your idea correctly.
I've come up with:

1. Create the files locally, then `docker cp` them to the language container, or
2. Create the files in the CLI container, then `docker cp` them to the language container

I've never used `cp`, but my hope was that the first option would work.
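To make the first option concrete, this is roughly what I had in mind (a sketch; the `/workspace` directory and the `node` entrypoint are just placeholders):

```js
// Sketch of option 1: write the files locally, docker cp them in, then start.
const { execSync } = require('child_process');

function runViaDockerCp(image, localDir, entryFile) {
  // Create (but don't start) a container so there is something to copy into.
  const id = execSync(`docker create ${image} node /workspace/${entryFile}`)
    .toString().trim();
  // Copy the locally prepared directory into the container as /workspace.
  execSync(`docker cp ${localDir} ${id}:/workspace`);
  // Start the container and stream its output back.
  return execSync(`docker start -a ${id}`).toString();
}
```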
The first option should work fine, but I'm not seeing the benefits of creating the files locally. Is there a reason for this?
I think it's also because we have different estimates of the amount of rework, and I interpreted "Paradigm Shift" too literally. I have a list of issues that I mentioned during the PR for Go, and I think some are hard to resolve without a proper rewrite and a clear paradigm shift.
Actually, now that I look back at your 2 ideas, I'm not sure what the difference is. I guess you mean to say that with #1 there is only one container (the language container) and with #2 there are two containers?

If that's the case, they are both the same in my opinion. Yes, I fully expect the CLI to run within its own container, but functionally it shouldn't matter whether it is in a container or not. You can mount `/var/run/docker.sock:/var/run/docker.sock` as a volume to give one container access to create other containers. As long as Docker is installed within the CLI image, it functions as you would expect. Obviously that would mean the CLI container needs to be better protected from user-submitted code.

Either way though, the CLI wouldn't really have any idea whether it is or is not in a container.
I think I understand your idea better now. I was simply trying to come up with all the possible ways you'd use `docker cp`, and I agree they're the same. And you are correct about the second one having 2 containers.
I think sibling containers by binding `docker.sock` or Docker-in-Docker are unnecessary, and like you wrote, they require extra care. My proposal is basically to remove the central CLI.
I see you did a lot of work, and I need to read those `lib/*.js` files. And thanks for the `.editorconfig`; I was thinking of opening a PR for the whitespace problems.
How would you resolve this issue for `node-runner` without a proper rewrite?
If the user has selected Node `v0.10.x`, the code is executed with `/usr/local/n/versions/node/0.10.33/bin/node`. But the packages are installed using `/usr/local/bin/npm` (npm v3.10.3) and not `/usr/local/n/versions/node/0.10.33/bin/npm` (npm v1.4.28).
`package.json` has an optional `engines` field to check whether the versions of `node`/`npm` are compatible, and the current setup is bypassing this.
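For reference, this is the kind of check the `engines` field enables and that the current setup skips (a sketch using the `semver` package, not code from the repo):

```js
const semver = require('semver');
const pkg = require('./package.json');

// e.g. pkg.engines = { node: '>=0.10.0 <0.11.0', npm: '~1.4.0' }
if (pkg.engines && pkg.engines.node &&
    !semver.satisfies(process.version, pkg.engines.node)) {
  console.error(`node ${process.version} does not satisfy engines.node "${pkg.engines.node}"`);
}
```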
Some other `node-runner` issues:

- `devDependencies` and `dependencies` need to be cleaned up and version locked
- Unclear which modules are intended for users to use
- `node_modules`: will moving `/tmp/node_modules` instead of copying cause any issue?
```js
const exec = require('child_process').exec;
// Both directories currently hold a full ~489M copy:
exec('du -hs /tmp/node_modules', (_, o) => console.log(o));
exec('du -hs /runner/node_modules', (_, o) => console.log(o));
// 489M /runner/node_modules
// 489M /tmp/node_modules
```
Maybe switching to Yarn might resolve the issue of having to install everything at once. Or create a separate image for this purpose and compose them.
Recent Changes
Yeah, I added a bunch of changes to (A) bring all of the different configuration options into one place, and (B) support projectMode: instead of using code/setup/fixture fields and mapping them in some arbitrary way (via concatenation or file importing - you don't really know, so you need to look it up), we just upload a set of files in `{"solution.js": "content", "helpers.js": "content", "spec.js": "content"}` format. You can define the entrypoint (`"spec.*"` by default). This is all being used by a new IDE that I've been working on, which will hopefully replace CW's existing editors at some point soon.
The project file format may or may not make it into CW soon, it's specifically being developed for advanced Qualified challenges.
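As a rough illustration of the shape of a projectMode submission (field names here are just for the example, not the final schema):

```js
// Illustrative only - the actual field names/defaults may differ.
const run = {
  language: 'javascript',
  files: {
    'solution.js': 'module.exports = (a, b) => a + b;',
    'helpers.js': 'exports.double = n => n * 2;',
    'spec.js': 'const add = require("./solution"); /* assertions here */'
  },
  entryPath: 'spec.js' // falls back to the "spec.*" file when omitted
};
```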
.editorconfig
Yeah, I also can't stand that 4-character indentation (and `{` falling to the next line) somehow became the default for most of the code. I want to change that but will probably do it as its own commit.
> How would you resolve this issue for `node-runner` without a proper rewrite?
Each node version would have its own Docker image, tagged similarly to how you had originally proposed. Each image would extend from its official base image (`FROM node:0.10.33`) and simply install any packages that we wish it to have. Ideally nothing else should be needed, and the image itself doesn't need to be specifically related to the CLI/CW/Q in any way.
Again, this means that a linking system needs to be developed so that if database services are needed a container can be properly created and linked, so that they don't need to exist within the base image. Whether the CLI handles this on its own (might have to in order to ensure tests can be integrated) or whether it's handled by our custom scheduling engine (might have to in order to properly manage resources) is TBD.
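As a sketch of what the CLI-managed variant could look like (purely illustrative, using the plain docker CLI and the legacy `--link` flag; a user-defined network would work just as well):

```js
const { execSync } = require('child_process');

function runWithService(languageImage, serviceImage, cmd) {
  // Start the database/service container first.
  const svc = execSync(`docker run -d ${serviceImage}`).toString().trim();
  try {
    // The language container reaches the service under the alias "db".
    return execSync(`docker run --rm --link ${svc}:db ${languageImage} ${cmd}`)
      .toString();
  } finally {
    execSync(`docker rm -f ${svc}`);
  }
}
```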
Yarn
Yeah I haven't used Yarn yet but that would be great if it fixed the issues. Having to use devDependencies to install node packages instead of just doing it directly in the image is not ideal. Also each time a new dependency is added the entire package list needs to be reinstalled which is another drawback of the current design.
> Yeah, I also can't stand that 4-character indentation (and `{` falling to the next line) somehow became the default for most of the code. I want to change that but will probably do it as its own commit.
Thanks, I couldn't stand them either; I've been making a separate copy and reformatting it as I read through the codebase.
> Again, this means that a linking system needs to be developed so that if database services are needed a container can be properly created and linked, so that they don't need to exist within the base image. Whether the CLI handles this on its own (might have to in order to ensure tests can be integrated) or whether it's handled by our custom scheduling engine (might have to in order to properly manage resources) is TBD.
Isn't this one of the most common use cases of `docker-compose`? Are there any specific requirements?
> Having to use devDependencies to install node packages instead of just doing it directly in the image is not ideal.
Do you mean using `package.json` to manage the dependencies is not ideal? I wrote "needs cleanup" in the above comment because I don't know why modules never used in the CLI are listed in the root `package.json` under `devDependencies`, or what makes something a `devDependency` rather than a `dependency`.
I haven't looked at the changes you've made, but your plans sound good and realistic. I took "paradigm shift" literally (`runner-cli` to `runner-pipe`) and had too much fun thinking :)
> Isn't this one of the most common use cases of `docker-compose`? Are there any specific requirements?
Yes, though we likely wouldn't use docker-compose in this case, but just link the containers directly (compose is just a wrapper that makes the docker CLI easier to work with in a repeatable manner). The bigger issue, though, is that the Codewars platform differs from most other Docker-based services in that each time you run your code, you get a fresh container. With services, that now means we need multiple freshly created containers. We also prewarm containers so that they are ready to be used, which adds to the complexity (do we now need to prewarm multiple database containers?).
I do have a plan in place to start supporting persistent containers, one per user/challenge session. This would allow services to be setup more easily, and also allow us to run install scripts so that challenges could be configured on session start - additional packages added, services installed, etc.
> Do you mean using `package.json` to manage the dependencies is not ideal?
Not really, just because we can install specific versions within the Dockerfile if we want to, and we can cache them, so that whenever someone adds a new package, the docker image only needs to be rebuilt from that point, instead of having to reinstall all of the packages all over again.
> had too much fun thinking :)
I appreciate you helping me think this through!
> With services, that now means we need multiple freshly created containers. We also prewarm containers so that they are ready to be used, which adds to the complexity (do we now need to prewarm multiple database containers?).
For the services, how about providing a derived image, e.g., `codewars/node-runner:7-db`? Wouldn't this allow you to use the same logic to "prewarm" the container? You could control this by adding a new checkbox (`[x] Use database`) in the kata editor and toggling the `-db` modifier. This could also extend to specific databases (`[x] Use MongoDB`) to avoid getting large images with multiple databases installed.
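The tag could then be derived mechanically from the kata's options, something like (illustrative only; the naming scheme is just my assumption):

```js
// e.g. imageTag('node', '7', ['mongodb']) => 'codewars/node-runner:7-mongodb'
function imageTag(language, version, services) {
  const modifiers = (services || []).map(s => '-' + s).join('');
  return `codewars/${language}-runner:${version}${modifiers}`;
}
```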
> I do have a plan in place to start supporting persistent containers, one per user/challenge session. This would allow services to be setup more easily, and also allow us to run install scripts so that challenges could be configured on session start - additional packages added, services installed, etc.
Sounds cool :) Maybe using `docker commit` after running the install scripts? This should be less expensive and allows an easy way to reset the environment if necessary.
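Something along these lines (a sketch; the `sleep infinity` keep-alive and the session tag format are just placeholders):

```js
const { execSync } = require('child_process');

function prepareSessionImage(baseImage, installScript, sessionTag) {
  // Keep a throwaway container alive long enough to run the install script.
  const id = execSync(`docker run -d ${baseImage} sleep infinity`).toString().trim();
  execSync(`docker exec ${id} sh -c "${installScript}"`);
  // Snapshot the result; later runs for this session start from sessionTag.
  execSync(`docker commit ${id} ${sessionTag}`);
  execSync(`docker rm -f ${id}`);
  return sessionTag;
}
```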
> For the services, how about providing a derived image
Interesting idea, but that would mean we would need to maintain an install script and probably need to build a deploy tool that would automatically generate the derived Dockerfile for each service combination, then upload that combination. Even if we just support 3 services (mongodb, redis, postgre) within different combinations, we currently support 15 different language images, which means we would need to maintain 15 * 9 = 135 different image combinations, and then want to have those images preloaded on machines so the download cost doesn't occur. With persistent images the preloading issue is mitigated (though we still ideally want that preloading process to be as fast as possible) - however that's still a lot of images to maintain (and would grow significantly as we add languages/services).
This is on top of the fact that we are reinventing the wheel by having to support our own install scripts for these database images, and not to mention that we have version issues with databases the same as we do with languages - so it seems scalability wise the true fix here has to be to support a multi-linked-container paradigm.
My dream result here is that you could actually just specify which docker image you want your language to run in (anything hosted in Docker Hub), as well as specify services (and be able to pick a database version) and the API would stitch it all together for you.
> Maybe using `docker commit`

Great idea. It would require some special file management to track all of these images and make sure they aren't taking up a specific machine's HD space when they aren't needed, but that would speed up install times to basically zero.
Thanks for explaining. That clearly doesn't scale.
> My dream result here is that you could actually just specify which docker image you want your language to run in (anything hosted in Docker Hub), as well as specify services (and be able to pick a database version) and the service would run it all for you.
I thought of something similar while thinking about the new paradigm and realized the similarities with some CI services.
> Drone is a Continuous Integration platform built on container technology. Every build is executed inside an ephemeral Docker container, giving developers complete control over their build environment with guaranteed isolation. (drone/drone)
```yaml
pipeline:
  build:
    image: golang
    commands:
      - go get
      - go build
  database:
    image: redis
  test:
    image: golang
    commands:
      - go test
```
Also, the experimental Docker bundles and stacks look interesting (I haven't read it all).
> A Dockerfile can be built into an image, and containers can be created from that image. Similarly, a docker-compose.yml can be built into a distributed application bundle, and stacks can be created from that bundle. In that sense, the bundle is a multi-services distributable image format. (https://docs.docker.com/compose/bundles)
Yep, very similar to Docker-based CI services (which most are these days) and also cloud IDEs. The biggest difference is a focus on simplicity, ease of setup, startup time, and of course our unified test output format.
Compose bundles are new to me, thanks for letting me know about them!
@kazk closing since this isn't actionable. Great thought process though. I've also been thinking about updating the images so that there is no base image, and only an install script that adds node and the rest of the dependencies. A few things that would mean:
@jhoffner I've been thinking about this since reading your comment. I came up with another idea that's more realistic allowing progressive enhancements.
I still need to put together my thoughts, but I think the new idea:
with a small amount of rework and improves extensibility. I might have missed something because this sounds too good :)
@jhoffner Here's my new proposal. I actually like this one over the original. Please let me know what you think.
A new paradigm to improve maintainability with extensibility. Takes advantage of NPM and improves flexibility by packaging components.
There is currently no way of handling language/package versioning well.
Previous proposal was fun to discuss, but wasn't realistic.
Modularize the codebase into smaller packages. Avoid the pain of maintaining multiple repositories by using Lerna :dragon:. This allows much more flexibility while keeping the convenience of having the code in one place.
Allows progressive enhancements. Each language can update when it's ready.
Image tags can be used in conjunction to keep track of compatibilities.
Decentralized design and flexibility presented in this proposal will lower the barrier of contribution.
This structure allows having a `Dockerfile` instead of `.docker`. Each image can have a `.dockerignore` to keep the build contexts small.
`package.json` has become unmanageable, mixing `dependencies` with `devDependencies`.
Create a `@codewars` org on npmjs.com (free for public packages) and restructure the codebase to something similar to:
# repository: codewars/codewars-runners
- `packages`
- `codewars-runner`
- `package.json`
- `name: @codewars/runner`
- `version: "1.0.0"`
- `runners`
- `codewars-node-runner`
- `Dockerfile`
- `package.json`
- `name: @codewars/node-runner`
- `version: "1.0.0"`
- `dependencies:`
- `@codewars/runner: "^1.0.0"`
- `codewars-ruby-runner`
- `Dockerfile`
- `package.json`
- `name: @codewars/ruby-runner`
- `version: "1.0.0"`
- `dependencies:`
- `@codewars/runner: "^1.0.0"`
- `lerna.json`
- `package.json`
When a breaking change is introduced, increment the major version and update runners progressively.
For example, `@codewars/runner` and `@codewars/ruby-runner` become `2.0.0`, but `@codewars/node-runner` can stay at `1.0.0`.
Can be separated further like:
- `packages`
- `codewars-runner`
- `package.json`
- `name: @codewars/runner`
- `version: "1.0.0"`
- `dependencies:`
- `@codewars/runner-util: "^1.2.1"`
- `@codewars/runner-service-manager: "^1.0.0"`
- `codewars-runner-util`
- `codewars-runner-service-manager`
- `runners`
- `codewars-node-runner`
- `codewars-ruby-runner`
- `codewars-python-runner`
- `codewars-go-runner`
- `lerna.json`
- `package.json`
and improve the codebase in smaller steps.
Abstract
A new paradigm to improve maintainability with extensibility. Keywords: simple, decentralized, extensible, compatible, flexible.
Background
Proposal
Modularize
Provide each environment as a dockerized micro application taking JSON from `STDIN`. Any language can be used. This should also allow existing code to be used as a micro application by providing adapters as necessary.
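A minimal Node flavor of such a micro application could look like this (the request/response fields are illustrative, not a fixed spec):

```js
// Read a JSON request from STDIN, run it, write a JSON result to STDOUT.
let input = '';
process.stdin.on('data', chunk => { input += chunk; });
process.stdin.on('end', () => {
  const request = JSON.parse(input); // e.g. { solution: '...', fixture: '...' }
  // ...write the files and run the test command baked into this image...
  const result = { stdout: '', stderr: '', exitCode: 0 };
  process.stdout.write(JSON.stringify(result));
});
```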
Image Tags
Use tags to identify language version, test framework, topic, etc. This allows additions without worrying about backward compatibility, and customizations based on need.
- `cwr/python:2.7`: Python 2.7.x
- `cwr/python:3.5-unittest`: Python 3.5.x with `unittest`
- `cwr/python:3.6-unittest-sci`: Python 3.6.x for scientific kata

Rationale
docker cp
This approach requires the CLI to know the specific paths for each language container.
No base image
Every image is independent and will remain independent. Dockerfiles will contain some common boilerplate, but this should be a small tradeoff for keeping things simple. One possible exception can be when providing an environment based on a topic. For example, `cwr/python:3.6-unittest-sci` should be derived from `cwr/python:3.6-unittest` because these should be coupled.

Backward compatibility
Currently, fixing a bug may break existing kata if the author made use of that bug. This proposal introduces a way to avoid this problem by using image tags to control the environment.
Contribution barrier
Currently, JavaScript knowledge is required to add a new language and/or fix problems in a certain language. The decentralized design and flexibility presented in this proposal will lower the barrier of contribution.

Implementation
I realized the difficulty of maintaining a project with multiple languages, so I'm discarding this part. I'll post a follow up comment about this soon.
Usage
(Using jo to produce JSON for brevity)
Running
Test Integration
Extensions
Some ideas for future extensions.
Build Sequence
Multiple Files
@jhoffner, @Dan-Nolan, @OverZealous, @nathandoctor