SpencerPark / IJava

A Jupyter kernel for executing Java code.
MIT License

Reusing IJava kernels #87

Open mrcalvin opened 4 years ago

mrcalvin commented 4 years ago

Motivation: We aim to extend an eAssessment environment at our university to support programming exams (not notebook-based, but integrated into our LMS). We want question types that require students to write code from scratch or to modify given code snippets.

Context: We plan to host IJava (and other) kernels to run exams for a large number of students (120+ sessions/exams) and to have code snippets evaluated, both for immediate feedback and as a batch for autocorrection against a test suite. We are currently exploring Jupyter's kernel gateway to integrate with the Web APIs (HTTP, WebSocket).

Problem: How should we best manage the lifecycle of IJava kernels? We want to avoid spawning n IJava kernels for n exams. Rather, can a smaller number of m kernels serve n exams, with m < n? Likewise, for batch correction, we want to be able to reuse one kernel for each exam (without unwanted interactions). In essence, for a running exam or for a correction of collected exams, we want to:

  1. Acquire an IJava kernel and a corresponding WebSocket via HTTP.
  2. Submit some code via the WebSocket as part of an exam solution for evaluation, one-by-one or as a batch, and display or collect the results.
  3. Recycle the executing kernel to serve the subsequent request in a clean (pristine) state (e.g., via an HTTP request to the restart endpoint).
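
The three steps above can be sketched against the Jupyter kernel REST API (`POST /api/kernels`, the `/channels` WebSocket, and `POST /api/kernels/{id}/restart`). This is only a minimal sketch: the base URL, the class and method names, and the kernel id are made up for illustration, and the kernelspec name `java` is assumed to be the one IJava was installed under. The code only builds the requests; sending them requires a running gateway.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

public class KernelLifecycleSketch {
    // Assumption: a kernel gateway listening locally; replace with the real host.
    static final String BASE = "http://localhost:8888";

    // Step 1: ask the server to start a kernel. The JSON response carries its id.
    // "java" is assumed to be the kernelspec name IJava is installed under.
    public static HttpRequest startKernel() {
        return HttpRequest.newBuilder(URI.create(BASE + "/api/kernels"))
                .header("Content-Type", "application/json")
                .POST(BodyPublishers.ofString("{\"name\": \"java\"}"))
                .build();
    }

    // Step 2: code is submitted as execute_request messages over this WebSocket.
    public static URI channels(String kernelId) {
        String wsBase = BASE.replaceFirst("^http", "ws"); // http -> ws, https -> wss
        return URI.create(wsBase + "/api/kernels/" + kernelId + "/channels");
    }

    // Step 3: restart the kernel so the next exam starts from a clean state.
    public static HttpRequest restartKernel(String kernelId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/api/kernels/" + kernelId + "/restart"))
                .POST(BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        String id = "example-kernel-id"; // hypothetical id, normally taken from step 1's response
        HttpRequest start = startKernel();
        HttpRequest restart = restartKernel(id);
        System.out.println(start.method() + " " + start.uri());
        System.out.println("WS   " + channels(id));
        System.out.println(restart.method() + " " + restart.uri());
    }
}
```

Each request would be sent with `java.net.http.HttpClient`, and the WebSocket opened with `HttpClient.newWebSocketBuilder()`; both are left out here to keep the sketch independent of a live server.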

From what I understand, a kernel maintains its state unless restarted. Is it a cheap operation for IJava kernels to be restarted, or is it a dry shutdown/start? Is there a lightweight equivalent of /reset in the JShell API? Are there any other approaches to recycling kernels? Are there ways to serialise and persist a JShell's state and deserialise it (to have some sort of continuation when students/exams move between reused kernels)?

Thanks for your ideas!

SpencerPark commented 4 years ago

Hi @mrcalvin, I've been mulling this over. The kernel gateway is a really nice way to use kernels as a service. Although it may be possible to isolate the kernel state (and even run multiple kernels on the same JVM), we do run into some problems with sharing the classpath and, more importantly, sharing system resources like the filesystem. Letting people run arbitrary code really makes this tricky. Persisting state is another problem.

In the context of an exam, I would guess that being confident in isolating the executions from each other would be very important. What I think is a better bet is to write a more suitable Docker image for this use case and let the sandboxing happen at that level instead. Microservice architectures running Java exist, and that sounds like the appropriate approach here.

There are alternative JVM implementations like Graal or OpenJ9 that are pretty good for quick startup and a smaller memory footprint.

To answer some more of your questions:

> From what I understand, a kernel maintains the state unless restarted.

Yes.

> Is it a cheap operation for IJava kernels to be restarted, or is it a dry shutdown/start?

The process/JVM is shut down and started up again.

> Is there a lightweight equivalent of /reset in the JShell API?

One thing to consider is that, by default, JShell runs code in a second JVM (at least from my understanding the last time I took a look). IJava did the same in the beginning, but it now runs executed code in the same JVM as the kernel, which gives much better eval support. I would expect a reset to start up a new execution JVM.
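
As a rough illustration at the JShell API level (not of how IJava itself is implemented), the closest thing to a reset is to close one `JShell` instance and build another. The sketch below assumes the `"local"` execution engine, which runs snippets inside the current JVM instead of a second one; the class and method names are made up for the example.

```java
import jdk.jshell.JShell;
import jdk.jshell.SnippetEvent;
import java.util.List;

public class FreshShellSketch {
    // Assumption: "local" selects in-process execution rather than the default remote JVM.
    public static JShell freshShell() {
        return JShell.builder().executionEngine("local").build();
    }

    // Evaluate one snippet and return the value JShell reports for it (null if it did not run).
    public static String eval(JShell shell, String code) {
        List<SnippetEvent> events = shell.eval(code);
        return events.isEmpty() ? null : events.get(0).value();
    }

    // Define a variable in one shell, then look it up in a brand-new shell.
    public static String demo() {
        String before;
        String after;
        try (JShell first = freshShell()) {
            eval(first, "int answer = 42;");
            before = eval(first, "answer");   // the variable is visible in the same shell
        }
        try (JShell second = freshShell()) {
            after = eval(second, "answer");   // unknown here: the snippet is rejected, no value
        }
        return before + "," + after;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Discarding the instance only clears JShell's own snippet state. Anything the evaluated code did to the surrounding JVM or the filesystem (threads, system properties, written files) survives, which is the isolation concern raised above.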

> Are there ways to serialise and persist a JShell's state and deserialize it (to have some sort of continuation when students/exams move between reused kernels)?

JShell does similar operations by essentially replaying the snippets that change declarations. But I don't know that we can do this to "continue where we left off". There might be some subtle differences that could cause quite unexpected behavior; it is hard to capture the entire effect of running some code on the system and serialize that effectively.
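
To make the replay idea and its pitfall concrete, here is a minimal sketch that saves the source of the active declaration snippets from one `JShell` instance and re-evaluates them in a fresh one. It is an illustration built on the public `jdk.jshell` API, not JShell's own reload mechanism; the class and method names are hypothetical, and the `"local"` execution engine is assumed only so that everything runs in one JVM.

```java
import jdk.jshell.JShell;
import java.util.List;
import java.util.stream.Collectors;

public class ReplaySketch {
    public static JShell newShell() {
        return JShell.builder().executionEngine("local").build();
    }

    // Keep the source of snippets that are still active and are declarations
    // (imports, types, methods, variables), in the order they were entered.
    public static List<String> declarationSources(JShell shell) {
        return shell.snippets()
                .filter(s -> shell.status(s).isActive())
                .filter(s -> s.kind().isPersistent())
                .map(s -> s.source())
                .collect(Collectors.toList());
    }

    // "Deserialize" by re-evaluating the saved sources in a new shell.
    public static JShell replay(List<String> sources) {
        JShell fresh = newShell();
        sources.forEach(fresh::eval);
        return fresh;
    }

    public static String demo() {
        String original;
        String restored;
        List<String> saved;
        try (JShell shell = newShell()) {
            shell.eval("int counter = 41;");
            // A plain statement: it changes state but is not a declaration.
            shell.eval("for (int i = 0; i < 1; i++) counter++;");
            original = shell.eval("counter").get(0).value();
            saved = declarationSources(shell);
        }
        try (JShell shell = replay(saved)) {
            restored = shell.eval("counter").get(0).value();
        }
        return original + "," + restored;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this run the original shell ends with `counter` at 42, while the replayed shell reports 41, because the statement that incremented it was not a declaration and so was never replayed. Replaying every snippet would avoid that particular gap but would also re-run any side effects.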

So maybe a TL;DR: try a microservice architecture instead of doing this at the kernel level :) Please let me know what you end up trying, I'm always curious to see how people are using the kernel!