aws / studio-lab-examples

Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!
https://studiolab.sagemaker.aws
Apache License 2.0

Unable to connect to MongoDB #51

Open benstrum opened 2 years ago

benstrum commented 2 years ago

I am trying to connect to MongoDB Atlas. Are certain ports blocked?

Steps to reproduce the behavior:

from pymongo import MongoClient
MONGO_URL = "xxx"
client = MongoClient(MONGO_URL)
client.db.find({})

Results in ServerSelectionTimeoutError on port 27017

EmilyWebber commented 2 years ago

Hi, we're taking a look at this right now.

swagulkarni commented 2 years ago

@benstrum - Couple of questions to help us replicate the problem:

  1. Are you trying to connect to a MongoDB installed elsewhere from the notebook?
  2. Which notebook are you trying to test?

benstrum commented 2 years ago

Hi, I had created my own notebook where I was going to pull a dataset out of one of my existing MongoDB servers. To reproduce the issue, you can easily create a free MongoDB database at their site (https://www.mongodb.com/cloud/atlas/register) and then try to connect. My guess is that port 27017 is blocked on AWS's end.

-Ben

EmilyWebber commented 2 years ago

How are you managing your connection to your Mongo instance? Can you validate that the connection works with the same code outside of SMSL, for example from your laptop?

benstrum commented 2 years ago

Yes, I'm able to connect locally and from Google Colab. Have you tried making a connection? As I mentioned, you can create a free account in MongoDB Atlas. Also, can you verify that the port is open?

darrenkraker commented 2 years ago

This issue also exists for outbound traffic on port 3306. We have a notebook that connects to a MySQL RDS instance that is open to the world (academic use with fake data). It works great from EC2 or other DB tools, but the same code fails to connect when run from a Jupyter Lab notebook. Running curl -v telnet://.cluster-xxxxxx.us-west-2.rds.amazonaws.com:3306 times out when run from a terminal session on the Jupyter Lab notebook but not from my local instance. Are there any outbound network restrictions? Port 443 seems to work, but not 22 or 3306.
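For anyone who wants to repeat this check from a notebook cell instead of a terminal, here is a minimal sketch using only the Python standard library. The function names `can_connect` and `probe_ports` are made up for this example, and a `False` result only says the TCP connection did not succeed from where the code ran; it cannot tell a blocked port apart from a host that is down or a name that does not resolve.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        return False


def probe_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Try each port in turn and return a {port: reachable} mapping."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling something like `probe_ports("your-db-host", [80, 443, 22, 3306, 27017])` (with your own hostname in place of the placeholder) from both Studio Lab and a local machine should make any difference in outbound reachability visible.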

darrenkraker commented 2 years ago

So I found a workaround, if you're willing to use a reverse TCP proxy using NGINX. It looks like the notebooks support outbound traffic on port 80. I set up a reverse proxy with the following config in nginx.conf:

stream {
    server {
        listen     80;
        #TCP traffic will be forwarded to the specified server
        proxy_pass .cluster-xxxxx.us-west-2.rds.amazonaws.com:3306;
    }
}

Then updated my notebook to connect as follows:

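import mysql.connector

try:
    mydb = mysql.connector.connect(
        host="ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com",  # EC2 host running the NGINX proxy (Oregon)
        user="dbuser",
        password="<PASSWORD>",
        port=80,  # the proxy listens on 80 and forwards to 3306
        database="XXX",
    )
except mysql.connector.Error as err:
    # Report the failure instead of leaving the try block dangling
    print(f"Connection failed: {err}")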
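To make it clearer what the proxy in this workaround is doing, here is a rough Python sketch of the same idea: accept a TCP connection on one port and relay the bytes to another host and port. This is not the NGINX setup described above, just an illustration with asyncio; `start_forwarder` and `_pipe` are hypothetical names, and the sketch closes both directions as soon as either side finishes, which is a simplification.

```python
import asyncio


async def _pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the peer went away; fall through and close our side
    finally:
        writer.close()


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Start a server that relays every incoming connection to the target."""

    async def handle(client_reader, client_writer):
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                target_host, target_port
            )
        except OSError:
            client_writer.close()  # target unreachable: drop the client
            return
        # Relay both directions concurrently until either side closes.
        await asyncio.gather(
            _pipe(client_reader, upstream_writer),
            _pipe(upstream_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

On a host that can reach the database, you would await `start_forwarder("0.0.0.0", 80, "<db-host>", 3306)` and then `serve_forever()` on the returned server. Binding to port 80 normally needs elevated privileges, and like the NGINX version this only forwards to a single endpoint.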

It would be great if the Jupyter Studio Lab environment would allow outbound 3306, 27017, etc. Maybe if we are nice to @EmilyWebber she can help us fix it ;)

EmilyWebber commented 2 years ago

@darrenkraker It never hurts ... haha just kidding! We'll try to get some clarity on which ports are open / closed for SMSL straight away.

benofben commented 2 years ago

We're having the same issue with it blocking Neo4j ports. Apparently 80, 443 and 53 are open. Keep in mind this isn't incoming traffic it's blocking but outgoing. The reason you'd block outgoing traffic is not to prevent attack but because you don't trust your user. While that might make sense in a central IT context, for a notebook environment, it's hard to see why you wouldn't let a user torch their environment if they wanted to. These things are all ephemeral after all...

This issue is going to come up with any database. I would think Amazon would want to make it possible to connect to databases from Studio Lab, otherwise it isn't going to be terribly useful. Some examples:

The default VPC for EC2 also allows outgoing *, so allowing outgoing traffic isn't just industry standard. It's also normal for AWS products. As others have noted, the most obvious competitor to Studio Lab, Colab, allows all outgoing traffic.

I've tried configuring our database to listen on those other ports (80, 443 and 53), but because they're below 1024 they require elevated privileges to bind. That would mean modifying the service for our database to run with privilege, which opens more cans of worms. For SaaS products like Neo4j Aura, MongoDB Atlas and Confluent Cloud, modifying the port isn't even going to be an option. So connectivity to many of Amazon's most important partner products from Studio Lab would be impossible.

I saw an NGINX workaround suggested above. I don't really want to pull a brand new component into this system just to avoid a firewall. It's not going to work with clustered deployments of our database either, so that's out.

There was another port forwarding suggestion that is super brittle and going to cause nasty issues in any clustered configuration. It will probably work for single node though.

The solution we're currently going with is to:

  1. Run in Colab to connect to the database. Export the data we need and upload it to S3 using boto3.
  2. Then run in SageMaker Studio Lab, pulling from S3 (which boto3 reaches over HTTPS on port 443 by default).

Code is here.
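The linked code isn't reproduced in this thread, so as a rough sketch of the two steps (not the actual code referred to above): the first function would run wherever the database is reachable, the second inside Studio Lab. It assumes `s3_client` is a boto3 S3 client such as `boto3.client("s3")` with credentials already configured; the function names, the bucket and key, and the choice of CSV as the hand-off format are all assumptions for illustration.

```python
import csv
import io


def export_rows_to_s3(s3_client, rows, bucket, key):
    """Serialize a non-empty list of dicts to CSV and upload it as one S3 object."""
    if not rows:
        raise ValueError("nothing to export")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    s3_client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8"))


def load_rows_from_s3(s3_client, bucket, key):
    """Download the CSV object and parse it back into a list of dicts."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    # CSV carries no type information, so every value comes back as a string.
    return list(csv.DictReader(io.StringIO(text)))
```

Only `put_object` and `get_object` are used, so the database driver never has to run inside Studio Lab. For large exports a columnar format or multipart upload would probably be a better fit than a single CSV object.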

I can't imagine Amazon wants to encourage users to go to Colab, Visual Studio Code, etc. Can we please get this fixed? It looks like it's been pending since January.

I also note that normal SageMaker notebooks do not have this restriction. So the "security" argument I've heard for this restriction clearly isn't valid, since the full-featured enterprise product does not suffer from this limitation.

jasonbub commented 1 year ago

Hi @EmilyWebber ,

I just wanted to chime in here. It is now close to 1 year since @benofben wrote the excellent note above that describes the issue in detail.

I just spent about 1 hour trying to figure out why a connection that works from other AWS systems and Google Colab doesn't work in Studio Lab. I then luckily stumbled on this thread which saved me from another several hours of searching for a solution. I now know that SageMaker Studio Lab only supports a limited set of ports externally, but what is being done about this?

Any chance we will get access to other ports, or is Studio Lab not a viable solution for those of us who need to access ports other than 80, 443 and 53?

Thanks in advance for your time!

MicheleMonclova commented 1 year ago

Hi, these ports were purposely closed for security reasons. My best recommendation if you need these ports now is to migrate to SageMaker Studio. It requires an AWS account. There is a free tier that you can take advantage of.

benofben commented 1 year ago

As I analyzed above, the security argument is specious.

Since I wrote the above analysis, we've stopped using SageMaker Studio Lab entirely. With this limitation it's largely useless. It's a shame to see AWS breaking what could be a valuable product. It also shows a lack of customer obsession to ignore feedback like this.