Open altruistcoder opened 3 years ago
Hello,
I just wanted to check if there's any update on this issue yet?
@akchinSTC @ptitzler @lresende
Hi @altruistcoder - first, I apologize for the lack of response.
I'd like to get some more information that I'm hoping you can provide:
--debug option is used to start jupyter lab?
HybridContentsManager, but also, anything related to the notebook_dir or root_dir configurable traits.
Hello @kevin-bates, please pardon me for replying late.
Here are the answers to your queries:
/Elyra-Pipelines/examples/pipelines/introduction-to-generic-pipelines/load_data.ipynb, which ideally should have been /opt/app-root/src/Elyra-Pipelines/examples/pipelines/introduction-to-generic-pipelines/load_data.ipynb.
So, basically, it always tries to search for the files in the / directory instead of the actual path. The same issue was happening with the jupyterlab-git extension as well, and they needed to make some modifications in their code base to add support for HybridContentsManager, which can be found here.
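To make the mismatch concrete, here is a minimal sketch of the path arithmetic, assuming the server root is /opt/app-root/src (implied by the expected path above). The helper name to_filesystem_path is hypothetical and not part of Elyra or Jupyter. This only illustrates one way a root-relative path can end up resolved against /, and I have not confirmed that this is what Elyra does internally.

```python
import posixpath


def to_filesystem_path(root_dir: str, api_path: str) -> str:
    """Join a root-relative path onto the server root directory.

    Hypothetical helper, for illustration only.
    """
    # Strip the leading slash first: posixpath.join discards everything
    # before an absolute component, which would drop root_dir entirely.
    return posixpath.join(root_dir, api_path.lstrip("/"))


root_dir = "/opt/app-root/src"  # assumed root, implied by the expected path
api_path = "/Elyra-Pipelines/examples/pipelines/introduction-to-generic-pipelines/load_data.ipynb"

naive = posixpath.join(root_dir, api_path)  # absolute second argument wins
fixed = to_filesystem_path(root_dir, api_path)

print(naive)
print(fixed)
```

The naive join reproduces the path from the error, while the stripped join gives the path I expected.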
Hi @altruistcoder - can you tell me whether the path value displayed in the validation error message is the correct path, except that it should be prefixed by the root_dir you reference in your ContentsManager's configuration?
Unfortunately, the solution you link to points at a 45-file refactoring of the repo, and I have no idea what portion of that applies to this root-dir issue.
One thought is that we can't rely on os.path for node resources and need to go through the configured ContentsManager. If that's the case, which I suspect it is since s3-hosted files won't be found using os.path, then this is a widespread change that even affects BYO runtimes. Based on this information, I suspect the path displayed in the error message is perfectly correct but just doesn't exist in the local filesystem.
cc: @akchinSTC @lresende
Hi @altruistcoder - can you tell me whether the path value displayed in the validation error message is the correct path, except that it should be prefixed by the root_dir you reference in your ContentsManager's configuration?
Without actually validating this, I believe the issue is that the backend accesses files directly instead of going through a content service. While this works on a local filesystem, it won't work with a hybrid contents manager when the file is only available remotely.
There are probably two issues that we would have to handle here: one is support for pipeline portability with relative paths, and the other is patching the backend to use the contents manager to access files.
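As a rough sketch of what "use the contents manager" could mean for an existence check: the code below compares a direct filesystem lookup with a lookup delegated to a contents-manager-like object. InMemoryContents is a hypothetical stand-in for a remote-backed manager, and the file_exists method is modeled on the method of the same name in Jupyter's ContentsManager API. The function names and the example path are illustrative, not Elyra's actual code.

```python
import os
from typing import Dict, Protocol


class ContentsLike(Protocol):
    """The one method this sketch needs from a contents manager."""

    def file_exists(self, path: str) -> bool: ...


class InMemoryContents:
    """Hypothetical stand-in for a remote-backed contents manager."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = files

    def file_exists(self, path: str) -> bool:
        # Paths are relative to the manager's root, so ignore outer slashes.
        return path.strip("/") in self._files


def exists_on_local_disk(path: str) -> bool:
    # Direct access: only sees files on the server's own filesystem.
    return os.path.isfile(path)


def exists_via_contents(cm: ContentsLike, path: str) -> bool:
    # Delegated access: works wherever the manager stores its files.
    return cm.file_exists(path)


cm = InMemoryContents({"elyra/my_pipeline/node3.ipynb": "{}"})
path = "elyra/my_pipeline/node3.ipynb"

print(exists_on_local_disk(path))     # depends on the local disk
print(exists_via_contents(cm, path))  # answered by the manager
```

The second check succeeds regardless of what is on the local disk, which is the behavior validation would need for remotely hosted files.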
NOT A CONTRIBUTION
I have confirmed this to be an issue by configuring the s3contents Content Manager. I confirmed general operation by pointing to an existing bucket and adding the following to my Jupyter configuration (from the referenced repo's README):
from s3contents import S3ContentsManager
# Tell Jupyter to use S3ContentsManager for all storage.
c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.access_key_id = "my-cos-user"
c.S3ContentsManager.secret_access_key = "my-cos-password"
c.S3ContentsManager.endpoint_url = "http://<my-cos-server>:<cos-port>"
c.S3ContentsManager.bucket = "kbates"
c.S3ContentsManager.prefix = "notebooks/test"
Upon startup of jupyter lab from my home notebooks directory (typically my root-dir), I see the launcher but get a "Directory not found: ''" dialog box that I dismiss. The server log also contains:
[W 2021-11-18 15:33:29.905 ServerApp] 404 GET /api/contents/Users/kbates/notebooks?content=1&1637278409508 (::1): No such entity: [kbates/notebooks/test/Users/kbates/notebooks]
Since my cwd is /Users/kbates/notebooks, s3contents is looking for the root-dir relative to the configured S3ContentsManager.prefix within bucket kbates (e.g., kbates/notebooks/test/Users/kbates/notebooks), which doesn't exist. This issue can be resolved by starting jupyter with a notebook-dir of / (e.g., jupyter lab --notebook-dir=/), which then treats kbates/notebooks/test within s3 as the root-dir.
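The lookup described above can be sketched as simple string composition. This is not s3contents' actual implementation. The s3_entity function is a hypothetical helper that only reproduces the entity names seen in the log, assuming the bucket, prefix, and requested path are joined with slashes.

```python
def s3_entity(bucket: str, prefix: str, api_path: str) -> str:
    """Compose bucket/prefix/path, skipping empty segments.

    Hypothetical helper, for illustration only.
    """
    parts = [bucket, prefix, api_path]
    return "/".join(p.strip("/") for p in parts if p.strip("/"))


# Started from /Users/kbates/notebooks: the cwd leaks into the lookup.
print(s3_entity("kbates", "notebooks/test", "Users/kbates/notebooks"))

# Started with --notebook-dir=/: the prefix itself becomes the root.
print(s3_entity("kbates", "notebooks/test", ""))
```

The first call yields the entity from the 404 message, and the second yields the prefix that acts as the root-dir.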
I can create notebooks and execute cells as expected.
I also uploaded a working pipeline, its corresponding notebooks, and dependencies into s3 - mirroring my filesystem structure.
We have several issues in which submission is immediately blocked by validation - so I suspect many more to follow. Here's an enumeration so far using a generic pipeline (I'm using a numbered list for easier reference and priorities should not be inferred):
[W 2021-11-18 15:45:47.989 ServerApp] 404 GET /elyra/contents/properties/elyra/my_pipeline/node3.ipynb?1637279147983 (::1): No such file or directory: elyra/my_pipeline/node3.ipynb
To get past the last issue, we'll need to either address the issue or disable that level of validation to determine what other issues exist. I'm also concerned there may be issues with Papermill since we pass the filepath of the notebook file. I'm not sure if there are other alternatives (like passing the JSON directly). That same kind of issue exists with script execution as well. There are also probable issues related to dependency management.
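One possible workaround for path-based tools, sketched here purely as an assumption: fetch the notebook content through the contents manager, stage it in a local temporary file, and hand that path to the tool. The stage_notebook helper is hypothetical, and the model dict is only loosely shaped like a contents API model (a dict with a content key). Papermill itself is not invoked here.

```python
import json
import os
import tempfile
from typing import Any, Dict


def stage_notebook(model: Dict[str, Any], name: str) -> str:
    """Write a notebook model's content to a temp dir and return the local path.

    Hypothetical helper, for illustration only.
    """
    staging_dir = tempfile.mkdtemp(prefix="staged-")
    local_path = os.path.join(staging_dir, os.path.basename(name))
    with open(local_path, "w", encoding="utf-8") as fh:
        json.dump(model["content"], fh)
    return local_path


# Assumed shape of a model fetched from a remote-backed contents manager.
model = {
    "type": "notebook",
    "content": {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5},
}

local_path = stage_notebook(model, "elyra/my_pipeline/node3.ipynb")
print(local_path)  # a real local file that a path-based tool could open
```

Whether this is acceptable depends on how outputs would get written back to the remote store, which this sketch does not address.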
I suspect platform-specific components may work better.
Needless to say, this is probably a fairly long road, but one we should definitely explore further.
I am working on a use case where I am running Jupyter as a pod in OpenShift, and I want to leverage several JupyterLab extensions such as Elyra.
Currently, I am trying to execute a sample pipeline provided in the Elyra examples repo. However, I get the error below when I try to run it in JupyterLab notebook images that have S3 buckets mounted in them (using the same HybridContentsManager):
However, I am able to run the pipeline successfully on JupyterLab notebook images that do not have S3 buckets mounted in them. As far as I can tell, when I run the pipeline in an environment where HybridContentsManager is used, the extension tries to find files in the /examples/pipelines/ folder, which gives the error shown in the above screenshot, while it tries to find files in the /opt/app-root/src/examples/pipelines/ folder in the other case.
I also faced a similar issue with the jupyterlab-git extension, and I got a solution for it which can be referenced here.
So, can you please help me resolve this issue, as I am not able to use Elyra's pipeline extension anymore?