aws / graph-notebook

Library extending Jupyter notebooks to integrate with Apache TinkerPop, openCypher, and RDF SPARQL.
https://github.com/aws/graph-notebook
Apache License 2.0

[BUG] Docker run on mac cannot access localhost:8889 #491

Closed expe-elenigen closed 11 months ago

expe-elenigen commented 1 year ago

Describe the bug

If we build this project with Docker using docker build -t aws/graph-notebook-conf . there is a minor issue at runtime:

[W 2023-05-18 01:10:43.536 ServerApp] No web browser found: Error('could not locate runnable browser').

This can be addressed easily in /docker/service.sh:

nohup jupyter notebook --ip='*' --port ${NOTEBOOK_PORT} "${WORKING_DIR}/notebooks" --allow-root > jupyterserver.log &
nohup jupyter lab --ip='*' --port ${LAB_PORT} "${WORKING_DIR}/notebooks" --allow-root > jupyterlab.log &

We can fix it by adding the --no-browser flag:

nohup jupyter notebook --ip='*' --port ${NOTEBOOK_PORT} "${WORKING_DIR}/notebooks" --no-browser --allow-root > jupyterserver.log &
nohup jupyter lab --ip='*' --port ${LAB_PORT} "${WORKING_DIR}/notebooks" --no-browser --allow-root > jupyterlab.log &

When the Docker container is running, I cannot connect to it with my browser. I noticed that both lines mentioned above in /docker/service.sh use --ip='*', which I believe affects these runtime logs:

[C 01:10:43.401 NotebookApp] You must use Jupyter Server v1 to load JupyterLab as notebook extension. You have v2.5.0 installed. You can fix this by executing: pip install -U "jupyter-server<2.0.0"
[I 01:10:43.405 NotebookApp] Serving notebooks from local directory: /root/notebooks
[I 01:10:43.405 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 01:10:43.405 NotebookApp] http://docker-desktop:8888/

and

[I 2023-05-18 01:10:43.519 ServerApp] jupyterlab | extension was successfully loaded.
[I 2023-05-18 01:10:43.526 ServerApp] nbclassic | extension was successfully loaded.
[I 2023-05-18 01:10:43.528 ServerApp] Serving notebooks from local directory: /root/notebooks
[I 2023-05-18 01:10:43.528 ServerApp] Jupyter Server 2.5.0 is running at:
[I 2023-05-18 01:10:43.528 ServerApp] http://localhost:8889/lab
[I 2023-05-18 01:10:43.528 ServerApp] http://127.0.0.1:8889/lab

I presume the issue here is the hostname in http://docker-desktop:8888/, whereas the lab part correctly uses the hostname in http://localhost:8889/lab. From what I understand, when we package Jupyter in Docker, it's a common practice to have:

I tried replacing --ip='*' with --ip 0.0.0.0, but it didn't work. Then I tried modifying /docker/service.sh by adding: echo "c.ServerApp.ip = \"0.0.0.0\"" >> ~/.jupyter/jupyter_notebook_config.py as is done here: https://github.com/jupyter/docker-stacks/blob/b378681adad0506b3613cde9d8a35c3e246dfe71/base-notebook/jupyter_server_config.py#L11-L12 This change didn't fix the issue either, so I'm considering using jupyter/minimal-notebook as a workaround.

To Reproduce

Steps to reproduce the behavior:

  1. Clone the current repository
  2. docker build -t aws/graph-notebook-conf .
  3. docker run --network="host" -p 8889:8889 -p 8888:8888 -v ~/dev/graph-notebook/out:/working -e GRAPH_NOTEBOOK_HOST=xyz.us-west-2.neptune.amazonaws.com -e AWS_REGION=us-west-2 -e GRAPH_NOTEBOOK_SSL=False aws/graph-notebook
  4. Open Chrome at http://localhost:8889/lab

Expected behavior

JupyterLab should be accessible.

michaelnchin commented 1 year ago

Thank you for the bug report and analysis, @expe-elenigen ! We are looking into a fix for this.

cubeddu commented 1 year ago

@expe-elenigen While trying to reproduce this issue, I noticed a warning on the first line of the log output: WARNING: Published ports are discarded when using host network mode

Remove --network="host"; host network mode only works on Linux.

docker run -p 8889:8889 -p 8888:8888 -v ~/dev/graph-notebook/out:/working -e GRAPH_NOTEBOOK_HOST=xyz.us-west-2.neptune.amazonaws.com -e AWS_REGION=us-west-2 -e GRAPH_NOTEBOOK_SSL=False aws/graph-notebook-conf

This should resolve your issue. Also run docker ps and make sure you can see the published ports for aws/graph-notebook-conf.
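As a complement to docker ps, a quick TCP probe from the host can confirm that the published ports actually accept connections. The following is a minimal sketch, not part of graph-notebook; the helper name port_is_open is hypothetical, and it assumes the container was started with -p 8888:8888 -p 8889:8889 as in the command above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # 8888 (classic Notebook) and 8889 (JupyterLab) are the ports used in this thread.
    for port in (8888, 8889):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {state}")
```

If a port shows as not reachable while the container is running, the port mapping (or a leftover --network="host") is the first thing to check.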

expe-elenigen commented 1 year ago

In the end, in my case, I had to switch GRAPH_NOTEBOOK_SSL to true and, as you mentioned, remove the --network="host" part. I wonder if something changed in the Docker image, but now it's working:

docker run  -p 8889:8889 -p 8888:8888 \
  -v ~/dev/graph-notebook/out:/working \
  -e GRAPH_NOTEBOOK_HOST=..xyz.us-west-2.neptune.amazonaws.com \
  -e AWS_REGION=us-west-2 \
  -e GRAPH_NOTEBOOK_SSL=true \
  aws/graph-notebook-conf

michaelnchin commented 11 months ago

Closing as the issue appears to be resolved - please feel free to re-open if there are any further questions or concerns.

akoskm commented 2 months ago

When I spin up the image given the advice here, I get to a login screen, but I'm not sure how to continue from here?

Screenshot 2024-08-26 at 13 53 20

triggan commented 2 months ago

I believe you may have the incorrect port. Notebook server is running on 8888. Lab is running on 8889.

akoskm commented 2 months ago

Indeed, I attached the wrong screenshot. This is what I see on 8889. To add some more context here's how I'm starting the image:

docker run -p 8889:8889 -p 8888:8888 \
  -v ~/dev/graph-notebook/out:/working \
  -e GRAPH_NOTEBOOK_HOST=zzz.cluster-ro-yyy.us-east-1.neptune.amazonaws.com \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -e AWS_SESSION_TOKEN \
  -e AWS_REGION="us-east-1" \
  -e GRAPH_NOTEBOOK_SSL=False \
  graph-notebook

And here's what I get on each port when I navigate to localhost:8888/lab or localhost:8889/lab

Screenshot 2024-08-26 at 16 58 49

triggan commented 2 months ago

If you're going to access on port 8888, just leave off the /lab. Port 8888 is where Jupyter Notebook (classic) is running.

In both cases (using classic on 8888 or lab on 8889), if you don't set NOTEBOOK_PASSWORD as part of the environment variables in the docker run command, then the default password is used: https://github.com/aws/graph-notebook/blob/2e3cefd30db0fa22a4bb702ee04167c8fad4e2fa/Dockerfile#L26
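For illustration, here is a minimal sketch of assembling the docker run command with an optional NOTEBOOK_PASSWORD. The helper build_docker_run and the example values are hypothetical; only the environment variable names and port mappings come from this thread.

```python
import shlex
from typing import List, Optional


def build_docker_run(image: str, neptune_host: str, region: str,
                     notebook_password: Optional[str] = None) -> List[str]:
    """Build a docker run argument list like the ones used in this thread."""
    cmd = [
        "docker", "run",
        "-p", "8888:8888",  # classic Jupyter Notebook
        "-p", "8889:8889",  # JupyterLab
        "-e", f"GRAPH_NOTEBOOK_HOST={neptune_host}",
        "-e", f"AWS_REGION={region}",
    ]
    if notebook_password is not None:
        # Without this variable the image falls back to its default password.
        cmd += ["-e", f"NOTEBOOK_PASSWORD={notebook_password}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Placeholder values; substitute your own endpoint, region, and password.
    args = build_docker_run("graph-notebook", "xyz.us-east-1.neptune.amazonaws.com",
                            "us-east-1", notebook_password="change-me")
    print(shlex.join(args))
```

Building the command as a list and joining it with shlex.join keeps the password correctly quoted if it contains spaces or shell metacharacters.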

akoskm commented 2 months ago

Thanks a lot @triggan! Can you confirm that I can pass a Neptune Analytics Graph endpoint (g-xyz.us-east-1.neptune-graph.amazonaws.com) to the notebook and work with that as well?

triggan commented 2 months ago

Yes, just ensure you're setting the analytics graph endpoint using the GRAPH_NOTEBOOK_HOST environment variable or by modifying the %graph_notebook_config from a notebook after the container launches.

akoskm commented 2 months ago

Thanks @triggan! I started the notebook inside an EC2 instance with access to a Neptune Analytics graph using the private endpoint. I can run queries from the EC2 instance with:

time aws neptune-graph execute-query --graph-identifier g-ggbxhnqxxxx --region us-east-1 --query-string ...

and I'm getting some results.

However, when I set the private endpoint in the notebook, I'm getting this error:

{'error': ConnectTimeout(MaxRetryError("HTTPConnectionPool(host='g-ggbxhnqxxxx.us-east-1.neptune-graph.amazonaws.com', port=8182): Max retries exceeded with url: /status (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f25b102e4a0>, 'Connection to g-ggbxhnqxxxx.us-east-1.neptune-graph.amazonaws.com timed out. (connect timeout=None)'))"))}

Here's my notebook config:

{
    "host": "g-ggbxhnqxxxx.us-east-1.neptune-graph.amazonaws.com",
    "neptune_service": "neptune-db",
    "port": 8182,
    "proxy_host": "g-ggbxhnqxxxx.us-east-1.neptune-graph.amazonaws.com",
    "proxy_port": 8182,
    "auth_mode": "IAM",
    "load_from_s3_arn": "",
    "ssl": true, // I tried turning this
    "ssl_verify": true, // and this on and off
    "aws_region": "us-east-1",
    "sparql": {
        "path": "sparql"
    },
    "gremlin": {
        "traversal_source": "g",
        "username": "",
        "password": "",
        "message_serializer": "graphsonv3",
        "connection_protocol": "http"
    },
    "neo4j": {
        "username": "neo4j",
        "password": "password",
        "auth": true,
        "database": null
    }
}

triggan commented 2 months ago

neptune_service needs to be changed to neptune-graph instead of neptune-db.

Also, the CLI command connects to your graph on port 443, whereas the notebook connects on port 8182. Both ports will work (8182 was the default port for Neptune DB, so we brought that over to Neptune Analytics), but you'll need to ensure that the security group attached to your Private Graph Endpoint allows traffic on 8182, or change the port to 443 instead.

You also don't need to specify the proxy host. That is used for connecting from outside of a VPC to a Neptune Database cluster (Neptune DB clusters do not support public connectivity, whereas Neptune Analytics graphs do). So leave that field as "".
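The two config changes described above can be sketched as a small transformation over the config dictionary. This is an illustrative helper (apply_analytics_fixes is a hypothetical name, not a graph-notebook function); it leaves the port untouched, since either 8182 or 443 can work depending on the security group.

```python
import copy
import json


def apply_analytics_fixes(config: dict) -> dict:
    """Return a copy of a notebook config adjusted for a Neptune Analytics graph."""
    fixed = copy.deepcopy(config)
    # Neptune Analytics uses the neptune-graph service name, not neptune-db.
    fixed["neptune_service"] = "neptune-graph"
    # The proxy host is only for reaching Neptune DB clusters from outside a VPC.
    fixed["proxy_host"] = ""
    return fixed


if __name__ == "__main__":
    # Abbreviated version of the config posted earlier in the thread.
    original = {
        "host": "g-ggbxhnqxxxx.us-east-1.neptune-graph.amazonaws.com",
        "neptune_service": "neptune-db",
        "port": 8182,
        "proxy_host": "g-ggbxhnqxxxx.us-east-1.neptune-graph.amazonaws.com",
        "proxy_port": 8182,
        "auth_mode": "IAM",
        "ssl": True,
        "aws_region": "us-east-1",
    }
    print(json.dumps(apply_analytics_fixes(original), indent=4))
```

The printed JSON can then be used as the body of the notebook config.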

akoskm commented 2 months ago

Thanks @triggan, now I can run %status (so I assume Jupyter can connect to Neptune Analytics), but %opencypher_status, for example, times out:

image

The security group attached to my Private Graph Endpoint allows traffic on both 8182 and 443.

triggan commented 2 months ago

Perhaps check the DNS settings within the container to ensure it's using the same DNS settings as the EC2 instance?

If you run !nslookup <graph_endpoint> in a cell, it should resolve to the IP address used by your Private Graph Endpoint within your VPC.
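If nslookup is not installed in the container, a similar check can be done from a notebook cell in Python. This is a rough sketch with hypothetical helper names; it assumes the Private Graph Endpoint resolves to private (for example RFC 1918) addresses inside your VPC, which may not match every setup.

```python
import ipaddress
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve hostname to a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: List[str]) -> bool:
    """Return True if the list is non-empty and every address is private.

    Note: ipaddress treats loopback and link-local ranges as private too.
    """
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

In a cell, calling resolve_ipv4 with your graph endpoint and passing the result to all_private gives a quick indication of whether the container sees the VPC-internal addresses; a socket.gaierror from resolve_ipv4 means the name did not resolve at all.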

Seems odd that an execute-query call would succeed from your EC2 instance but is not connecting from the notebook.

FWIW, the get-graph API call made by the %status magic accesses a control plane API (via the https://neptune-graph.<region>.amazonaws.com/graphs/<graph_identifier> endpoint), whereas the execute-query API (or the list-queries API used by the %opencypher_status magic) accesses the graph-specific endpoint (data plane). The former goes across the Internet (IGW), whereas the latter needs to go through the VPC Endpoint (Private Graph Endpoint).
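To make the distinction concrete, the sketch below derives both hostnames from a graph identifier and region and offers a simple TCP probe for each. The function names are hypothetical, and the hostname patterns are taken only from the examples in this thread, so treat them as assumptions rather than a documented contract.

```python
import socket
from typing import Dict


def graph_endpoints(graph_id: str, region: str) -> Dict[str, str]:
    """Return the control plane and data plane hostnames for a graph."""
    return {
        # Used by get-graph (the %status magic); regional service endpoint.
        "control_plane": f"neptune-graph.{region}.amazonaws.com",
        # Used by execute-query and list-queries; graph-specific endpoint.
        "data_plane": f"{graph_id}.{region}.neptune-graph.amazonaws.com",
    }


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Probing the control plane host on 443 and the data plane host on 443 and 8182 with can_connect from inside the container would show which path is blocked: a reachable control plane with an unreachable data plane matches the symptom of %status working while %opencypher_status times out.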