Provide metadata service as separate server extension

elyra-ai / elyra

Elyra extends JupyterLab with an AI centric approach.

https://elyra.readthedocs.io/en/stable/

Apache License 2.0


Provide metadata service as separate server extension #701

Open Zsailer opened 4 years ago

Zsailer commented 4 years ago

I'm loving the metadata service inside of elyra!

I'd really like to tap into this extension without installing all of elyra's dependencies. Right now, installing the Python package leads to the installation of a few other server extensions, such as jupyterlab-git and nbdime, because they are required in elyra's setup.py. However, it would be nice to install this extension without these extra dependencies.

One option is to create a new repo and provide the metadata service as a separate PyPI package; the second option is to make these other dependencies optional (though this is likely a problem for the elyra distribution).

Let me know your thoughts. I'd be happy to help move this part of the code out into a separate repo if you think this makes sense. Thanks!

cc @kevin-bates

kevin-bates commented 4 years ago

Hi Zach. I'm curious how you envision using this outside of Elyra?

There are some behaviors with respect to the file hierarchy that are not intuitive. For example, the precedence for searching for metadata instances is:

1. User's home directory
2. System shared directories (/usr/local/...)
3. Python sys.prefix

while the default Jupyter hierarchy is: 1, 3, 2.

We (Elyra) treat the third location as "factory data", in which the installation of Elyra places some "system owned" instances. Updates via the REST API only occur against location 1 (user's home), as there's no way to indicate which location an update should target. As a result, instances in the other locations appear in retrievals only when there isn't a same-named instance in an earlier directory. Since there's been discussion of changing the Jupyter path hierarchy, I'm beginning to think "factory data" should be included in the package installation location and require one-off logic for retrievals [*].

For example, if I go to 'update' one of the factory items located in sys.prefix, the result of that update will be placed into the hierarchy located within my home directory. All subsequent retrievals will pull that instance, essentially masking out the instance in sys.prefix.
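
A minimal sketch of that masking behavior, assuming the namespace/name.json layout seen elsewhere in this thread. The helper names (find_instance, update_instance) and the temporary directories are hypothetical stand-ins, not Elyra's actual code:

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def find_instance(search_path: List[Path], namespace: str, name: str) -> Optional[Path]:
    """Return the first matching instance file, walking directories in precedence order."""
    for root in search_path:
        candidate = root / namespace / f"{name}.json"
        if candidate.exists():
            return candidate
    return None


def update_instance(user_dir: Path, namespace: str, name: str, data: dict) -> Path:
    """Always write to the user's directory, wherever the instance was read from."""
    target = user_dir / namespace / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(data))
    return target


# Temporary stand-ins for the three locations (assumption: real paths differ).
base = Path(tempfile.mkdtemp())
user_dir = base / "home"
shared_dir = base / "shared"
factory_dir = base / "sys_prefix"
search_path = [user_dir, shared_dir, factory_dir]  # precedence 1, 2, 3

# Installation places a "factory" instance under the sys.prefix stand-in.
factory_file = factory_dir / "code-snippets" / "foo.json"
factory_file.parent.mkdir(parents=True)
factory_file.write_text(json.dumps({"display_name": "Foo"}))

before = find_instance(search_path, "code-snippets", "foo")
update_instance(user_dir, "code-snippets", "foo", {"display_name": "Foo (edited)"})
after = find_instance(search_path, "code-snippets", "foo")
```

Here `before` resolves to the factory file and `after` resolves to the copy under the user's directory; the factory file itself is never modified, only hidden by the earlier entry in the search path.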

[*]: This model will also need to be revisited when adding support for other forms of persistence (namely NoSQL DBs). This way, the same one-off logic for retrievals of factory data could be utilized irrespective of persistence forms. Hmm, but if we split out metadata as its own thing, the notion of "factory data" will need to be revisited entirely. (Which isn't such a bad idea IMHO. :smile: )

Zsailer commented 4 years ago

> Hi Zach. I'm curious how you envision using this outside of Elyra?

My immediate reason is that our "code-snippets" team (@jupytercalpoly) is building a stand-alone code-snippets extension on top of Elyra's metadata service backend (i.e. using the metadata service for validating+storing snippets). Their plan is to work in a sandbox, outside of Elyra, for a little while and contribute upstream later when they make some progress.

The issue is that they can't install the metadata service without getting all of Elyra's dependencies. Further, Elyra doesn't include a code-snippets directory/namespace in the user's jupyter paths by default. Instead this directory has to be manually created by the user. To circumvent this, we had to create a middle-man python package that creates this directory when their extension is installed.

I see what you're saying about jupyter paths, though. It seems to me that you might want to separate the logic for where the extension's data files get placed from the extension's source. In this case, Elyra could become a mono-package (is that a thing? ... I'm thinking like a monorepo) with its own setup.py that depends on the metadata service package, but handles the placement of its data files, putting them in the custom "system owned" locations. Other Jupyter deployments/distributions/suites can then ship this metadata service separately from Elyra and decide where to place these data files in their context.

cc'ing @kpinnipa and @jahn96, our interns working on this project, to watch this thread.

kevin-bates commented 4 years ago

> Elyra doesn't include a code-snippets directory/namespace in the user's jupyter paths by default. Instead this directory has to be manually created by the user. To circumvent this, we had to create a middle-man python package that creates this directory when their extension is installed.

Hmm - No Elyra namespaces are created until first use. Namespaces are known to Elyra, but don't exist until metadata is installed via the CLI tool, or created via POST to /elyra/metadata/code-snippets. Here's the CLI tool example (directory /Users/kbates/Library/Jupyter/metadata did not exist prior to this call):

~/Library/Jupyter/metadata$ elyra-metadata install code-snippets --schema_name=code-snippet --name=foo --display_name=Foo --language=Python --code="['import os']"
Metadata instance 'foo' for schema 'code-snippet' has been written to: /Users/kbates/Library/Jupyter/metadata/code-snippets/foo.json

> Their plan is to work in a sandbox, outside of Elyra, for a little while and contribute upstream later when they make some progress.
>
> The issue is that they can't install the metadata service without getting all of Elyra's dependencies.

Why not just work from a fork? Seems like it would be easier to contribute and adds some time for splitting out the MD service.

One of the things I wanted to do is split the logic as you describe - the place holders are already there - but I just haven't been diligent about honoring that separation. Most of the logic should move into the base class and I had made a mental note to address that (but mental notes might as well be written on an etch-a-sketch these days :smile: ). I'm also unsure about the current class hierarchy - which I suspect is a source of that neglect.

lresende commented 4 years ago

Hi @Zsailer,

> I'd really like to tap into this extension without installing all of elyra's dependencies. Right now, installing the Python package leads to the installation of a few other server extensions, such as jupyterlab-git and nbdime, because they are required in elyra's setup.py. However, it would be nice to install this extension without these extra dependencies.

In development mode, make install-server on the Elyra repo should do what you need.

In a release, we want to simplify the user experience, so we drop these extensions into the JupyterLab folder. That makes pip install elyra && jupyter lab build the only thing the user needs to do, which is very useful.

Separating the modules is definitely something we might be open to, but as we are still having a lot of changes on the metadata services, having a separate module might increase the work necessary to make quick progress, as changes on the new module need to be released to be used with the main Elyra repo.

One thing I was thinking about, at least for now, is to customize the setup.py to allow something like ELYRA_DISABLE_LAB_EXTENSIONS=true pip install elyra which then would only install the backend.
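
Roughly what I have in mind, as a sketch only. The requirement lists below are placeholders rather than Elyra's real dependencies, and resolve_install_requires is a hypothetical helper whose result would be passed to setup(install_requires=...):

```python
import os
from typing import List, Mapping

# Placeholder requirement lists; the real ones live in Elyra's setup.py.
BACKEND_REQUIRES = ["jupyter_core"]
LAB_EXTENSION_REQUIRES = ["jupyterlab-git", "nbdime"]


def resolve_install_requires(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Pick the dependency list based on ELYRA_DISABLE_LAB_EXTENSIONS."""
    flag = environ.get("ELYRA_DISABLE_LAB_EXTENSIONS", "").strip().lower()
    if flag in ("1", "true", "yes"):
        # Backend only: skip the JupyterLab extension dependencies.
        return list(BACKEND_REQUIRES)
    return BACKEND_REQUIRES + LAB_EXTENSION_REQUIRES


backend_only = resolve_install_requires({"ELYRA_DISABLE_LAB_EXTENSIONS": "true"})
full_install = resolve_install_requires({})
```

One caveat worth checking: pip only runs setup.py when building from source, so a flag like this would have no effect on a prebuilt wheel.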

> My immediate reason is that our "code-snippets" team (@jupytercalpoly) is building a stand-alone code-snippets extension on top of Elyra's metadata service backend (i.e. using the metadata service for validating+storing snippets). Their plan is to work in a sandbox, outside of Elyra, for a little while and contribute upstream later when they make some progress.

Don't be shy :) we can create a branch for the interns so they can work in isolation and in a more stable environment, which will give them some flexibility to experiment but also provide the real open source experience working with the community.

Zsailer commented 4 years ago

> don't exist until metadata is installed via the CLI tool

Gotcha. I totally missed the CLI tool 🤦 😄.

Our interns were running into this basic issue. They installed Elyra (from source), clicked on the code-snippets icon, and received an error: the code-snippets namespace does not exist—i.e. making a GET /elyra/metadata/code-snippets request before you create the code-snippets namespace throws an error.

It makes sense why this is happening, but it also complicates the setup/install instructions for extensions like code-snippets. I was thinking that each frontend extension should exist as separate Python + JS packages that register their schemas with the metadata service and initialize their own namespace.
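
As a rough sketch of that idea (MetadataRegistry and its methods are hypothetical, not the metadata service's current API): registering a schema also initializes the namespace, so listing an empty namespace returns an empty list instead of raising an error.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


class MetadataRegistry:
    """Hypothetical registry that extensions call at install or load time."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.schemas: Dict[str, Dict[str, dict]] = {}

    def register_schema(self, namespace: str, schema_name: str, schema: dict) -> None:
        """Record the schema and make sure the namespace directory exists."""
        self.schemas.setdefault(namespace, {})[schema_name] = schema
        (self.root / namespace).mkdir(parents=True, exist_ok=True)

    def list_instances(self, namespace: str) -> List[str]:
        """List instance names; only unregistered namespaces are an error."""
        if namespace not in self.schemas:
            raise KeyError(f"Unknown namespace: {namespace}")
        return sorted(p.stem for p in (self.root / namespace).glob("*.json"))


# A code-snippets extension would register itself like this.
registry = MetadataRegistry(Path(tempfile.mkdtemp()))
registry.register_schema("code-snippets", "code-snippet", {"type": "object"})
snippets = registry.list_instances("code-snippets")
```

With this shape, the frontend's first GET on a freshly installed extension would see an empty list rather than a missing-namespace error.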

Zsailer commented 4 years ago

> Why not just work from a fork? Seems like it would be easier to contribute and adds some time for splitting out the MD service.

> Don't be shy :) we can create a branch for the interns so they can work in isolation and in a more stable environment, which will give them some flexibility to experiment but also provide the real open source experience working with the community.

😄 absolutely agree. That's the goal. Right now, they're just doing some initial exploration, learning to write jupyterlab extensions from tutorials, and trying out Elyra at the same time. Once they get more familiar with jupyterlab and begin to work on the extension for real, we'll switch to a fork of Elyra 👍

lresende commented 4 years ago

> Our interns were running into this basic issue. They installed Elyra (from source), clicked on the code-snippets icon, and received an error: the code-snippets namespace does not exist, i.e. making a GET /elyra/metadata/code-snippets request before you create the code-snippets namespace throws an error.

It would be really great to have Elyra issues created for these things, particularly if they are running from master. This is probably a side effect of us having a sort of already configured environment and just updating the binaries.

Even better if they help with fixes, but not required.

lresende commented 4 years ago

There isn't a complete solution for this yet, but note that we now publish an elyra-server package to PyPI that has only the Elyra backend components. This allows you to deploy individual Elyra extensions to JupyterLab.

kevin-bates commented 4 years ago

I think there are a number of additional functional items for making the metadata service a separate extension as well:

Namespace objects that hold schemas and define storage characteristics
Ability to bring your own namespaces and schemas
Proper definitions for specifying "scopes" or "storage areas" of instances (e.g., user, shared, system) or simplification of the current hierarchy
Addition of some meta-behaviors (e.g., "system-owned", "default-instance")
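
To make those items a bit more concrete, here is one possible shape for the data model. All class and field names are assumptions for discussion, not existing Elyra types:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Scope(Enum):
    """Storage areas an instance can live in."""
    USER = "user"
    SHARED = "shared"
    SYSTEM = "system"


@dataclass
class Instance:
    name: str
    scope: Scope
    system_owned: bool = False      # meta-behavior: not editable by users
    default_instance: bool = False  # meta-behavior: used when none is chosen


@dataclass
class Namespace:
    """Holds schemas and defines storage characteristics for its instances."""
    name: str
    schemas: Dict[str, dict] = field(default_factory=dict)
    # Scopes in retrieval precedence order.
    scopes: List[Scope] = field(
        default_factory=lambda: [Scope.USER, Scope.SHARED, Scope.SYSTEM]
    )

    def is_writable(self, instance: Instance) -> bool:
        """System-owned instances are read-only in this sketch."""
        return not instance.system_owned


snippets_ns = Namespace("code-snippets", schemas={"code-snippet": {"type": "object"}})
factory_item = Instance("foo", Scope.SYSTEM, system_owned=True)
user_item = Instance("bar", Scope.USER)
```

Bringing your own namespace would then amount to constructing a Namespace with its schemas and handing it to the service.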