Brian-Burns-Bose opened this issue 8 years ago
There are some other questions as well - should a require("eclairjs") object be bound to only one kernel? Currently it is: we create the kernel connection when the user loads the eclairjs module.
I would say no - there should be a formal api (require("eclairjs").create() for example) that creates a kernel connection and the user should be able to retrieve the kernel id and use that during a create() call to connect to an existing context. Note that this would break stuff (for example our variable name creation code has no idea of existing variables and would start from 0 again, causing conflicts).
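The variable-name conflict mentioned above can be sketched concretely. This is a hypothetical illustration (the `makeNameGenerator` function and the `varN` naming scheme are assumptions, not eclairjs internals): a reconnecting client whose counter restarts at 0 would reuse names the kernel already holds, while seeding the counter from the kernel's existing names avoids the collision.

```javascript
// Hypothetical sketch of the variable-name conflict on reconnect.
// A fresh client's counter restarts at 0 and collides with names the
// kernel already holds; seeding from existing names avoids that.
function makeNameGenerator(existingNames) {
  // Start past the highest "varN" suffix already in use (0 when fresh).
  let counter = existingNames.reduce((max, n) => {
    const m = /^var(\d+)$/.exec(n);
    return m ? Math.max(max, Number(m[1]) + 1) : max;
  }, 0);
  return () => `var${counter++}`;
}

// Fresh kernel: names start from var0.
const freshGen = makeNameGenerator([]);
// Reconnecting client: seeded with names already bound in the kernel.
const reconnectGen = makeNameGenerator(["var0", "var1", "var2"]);
console.log(freshGen());     // "var0"
console.log(reconnectGen()); // "var3" - no collision with var0..var2
```

Seeding would require querying the kernel for its existing bindings at connect time, which is exactly the state the current code has no view of.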
Elaborating on the above discussion:
What a node application that uses EclairJS needs:
Regarding session management, the best solution seems to be:
This would require eclairjs-node to change: require("eclairjs") would now return an object with several methods on it (create() and connect/stop/restart/status(uuid)), and only during create/connect would it return an instance of the Spark api, bound to a specific kernel.
A couple of thoughts on the api. We are really talking about Kernel management and SparkContext management. We need to define the SparkContext-to-Kernel relationship; then we can make some decisions thereafter. Talking about the Kernel will get confusing for a NodeJS developer. In our case a SparkContext really means a remote kernel holding a SparkContext reference, so let's focus this discussion on SparkContext rather than Kernel. SparkContext can take an app name as a parameter; let's use that as the identifier.
We don't have control over kernel id creation, so how do we map a kernel id to a SparkContext application name?
Some information regarding the notebook server's session api. The session api is meant to be used for session management of a particular notebook, where a notebook is associated with a "path" / file name. If you send a POST to /api/sessions with a notebook path of "foo" and a kernel spec name, a kernel will be created for that session. We could use the notebook path as our SparkContext app name; essentially we would be treating a SparkContext as a notebook session. The issue is that the kernel gateway doesn't implement /api/sessions, but we could fix that.
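As a rough sketch of what that request could look like, the helper below builds a session-creation payload. The exact body shape has varied across notebook server versions, so treat the field names as an assumption; `sessionRequest`, the app name, and the kernel spec name "eclair" are all illustrative.

```javascript
// Sketch: treat the SparkContext app name as the notebook "path" in a
// POST to the notebook server's /api/sessions endpoint. Field names
// follow one version of the session api and may differ in others.
function sessionRequest(appName, kernelSpecName) {
  return {
    method: "POST",
    url: "/api/sessions",
    body: {
      path: appName,                   // SparkContext app name as the path
      kernel: { name: kernelSpecName } // which kernel spec to launch
    },
  };
}

const req = sessionRequest("my-spark-app", "eclair");
console.log(JSON.stringify(req.body));
```

The session created this way is then findable again by its path, which is what gives us a stable, user-chosen handle on the kernel.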
One possible improvement from an api standpoint would be to allow passing in our own identifier during Kernel creation that we could query for later (in connectToKernel calls, for example). This would eliminate the need for Toree users to store the kernel id somewhere or to query the running Kernels. In our case that identifier could now easily be the SparkContext name.
This would have the additional benefit of allowing one to figure out what program a Kernel is executing without having to look into the Kernel's executed code.
One thing I am seeing with the sessions api is that it always creates a Python kernel for me.
EclairJS node instances need the ability to control how many kernels are created, which kernel they connect to, and be able to determine if api calls were already executed in a kernel.
Scenarios
Questions
How do we determine if api calls have already been executed in a kernel? We could check whether a SparkContext has already been created.
How do we determine that a kernel id is associated with a particular node app? Kernel ids are generated by the kernel gateway, so we don't have control over them.
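The first question above could be answered by sending the kernel a small probe to evaluate. This is a local sketch only: the variable name "sc", the `probeSource` helper, and the `runInScope` stand-in for real kernel execution are all assumptions.

```javascript
// Sketch: answer "has this kernel already run our setup?" by executing
// a probe in the kernel that reports whether a SparkContext variable
// is already bound. The variable name "sc" is an assumption.
function probeSource(varName = "sc") {
  // Code string to send to the kernel for evaluation; it evaluates to
  // true iff the variable already exists in the kernel's scope.
  return `typeof ${varName} !== "undefined"`;
}

// Locally simulate the kernel evaluating the probe in two states.
function runInScope(source, scope) {
  return Function(...Object.keys(scope), `return (${source});`)(
    ...Object.values(scope)
  );
}

console.log(runInScope(probeSource(), {}));         // false: fresh kernel
console.log(runInScope(probeSource(), { sc: {} })); // true: sc exists
```

Because `typeof` is safe on undeclared names, the probe never throws in a fresh kernel, which makes it a cheap idempotency check before re-running setup code.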