Configure Neptune instance

baskaufs commented 2 years ago

In Neptune, created a database. Chose a version that was not the latest release.
Set the instance size to large.
Kept the defaults on almost everything; left the port at 8182.
Changed the notebook instance size from the default to medium.
For the role, appended triplestore1 to the name.
Tagged the owner as DISC.
Unchecked "enable deletion protection".
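
For anyone who wants to script the console choices above rather than click through them, here is a minimal sketch using boto3. The cluster identifier, the `db.r5.large` instance class, and the helper names are assumptions for illustration (the notes only say "large"), and the engine version is left as a required argument because the notes don't record which one was chosen.

```python
def build_neptune_requests(engine_version, cluster_id="neptune-triplestore1",
                           instance_class="db.r5.large"):
    """Return (cluster_kwargs, instance_kwargs) mirroring the console choices.

    cluster_id and instance_class are hypothetical placeholders.
    """
    cluster = {
        "DBClusterIdentifier": cluster_id,
        "Engine": "neptune",
        "EngineVersion": engine_version,   # not recorded in the notes
        "Port": 8182,                      # left at the default
        "DeletionProtection": False,       # unchecked in the console
        "Tags": [{"Key": "owner", "Value": "DISC"}],
    }
    instance = {
        "DBInstanceIdentifier": cluster_id + "-instance-1",
        "DBInstanceClass": instance_class,
        "Engine": "neptune",
        "DBClusterIdentifier": cluster_id,
    }
    return cluster, instance


def create_neptune(engine_version, region):
    """Send both requests; needs boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder works without it

    client = boto3.client("neptune", region_name=region)
    cluster, instance = build_neptune_requests(engine_version)
    client.create_db_cluster(**cluster)
    client.create_db_instance(**instance)
```

Splitting the request building from the API calls makes it possible to inspect the settings before anything is created.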

Then went into VPC to create an endpoint for S3 bucket access.

Service name is com.amazonaws.us-east-1.s3
Chose "gateway" type.
Used default VPC, which is for our account and region (?).
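
The same gateway endpoint could probably be created from code. This is a sketch only: the VPC ID and route table IDs are hypothetical placeholders, and the helper names are mine. The service name is built from the region so that it matches the one used above.

```python
def gateway_endpoint_kwargs(vpc_id, route_table_ids, region="us-east-1"):
    """Build the arguments for EC2 create_vpc_endpoint (gateway type, S3)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(vpc_id, route_table_ids, region="us-east-1"):
    """Create the endpoint; needs boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(
        **gateway_endpoint_kwargs(vpc_id, route_table_ids, region)
    )
```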

~The notebook we created didn't work. The error message was "Failure reason The Notebook Instance type 'ml.t3.xlarge' is not available in the availability zone 'us-east-1e'. We apologize for the inconvenience. Please try again using subnet in a different availability zone, or try a different instance type." and we got it with every size type up to xlarge (medium, large, xlarge).~

Problem fixed by putting the notebook in the correct availability zone.
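
One way to avoid the zone problem is to filter the candidate subnets before creating the notebook. The sketch below assumes subnet records shaped like the entries returned by EC2 `describe_subnets` (with `SubnetId` and `AvailabilityZone` keys); the subnet IDs are made up.

```python
def subnets_outside_zone(subnets, unavailable_zone):
    """Return the IDs of subnets that are not in the zone that failed."""
    return [
        s["SubnetId"]
        for s in subnets
        if s["AvailabilityZone"] != unavailable_zone
    ]


# Hypothetical subnets for illustration only.
example_subnets = [
    {"SubnetId": "subnet-aaa", "AvailabilityZone": "us-east-1a"},
    {"SubnetId": "subnet-eee", "AvailabilityZone": "us-east-1e"},
]
usable = subnets_outside_zone(example_subnets, "us-east-1e")
```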

baskaufs commented 2 years ago

@CliffordAnderson @awesolek2 It appears that it's not possible for clients outside the VPC to connect to Neptune. See this page about connecting via a Load Balancer and this page about accessing via a Lambda function. Of the two options, the load balancer seems like it would be the simpler one, since I think you'd have to write your own Lambda function.

CliffordAnderson commented 2 years ago

Yes, I agree that using a load balancer seems like the right approach. Thanks for researching these alternatives.

baskaufs commented 2 years ago

Test query for loading data into the triplestore from a SageMaker notebook:

%%sparql

LOAD <https://iiif-library-manifests.s3.amazonaws.com/format.nq> INTO GRAPH <http://format>
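
Outside the notebook magic, the same update can be sent to Neptune's SPARQL endpoint (`/sparql` on port 8182) as a form-encoded POST with an `update` parameter. The sketch below only builds the request. The endpoint host is a placeholder, the helper name is mine, and it assumes IAM database authentication is not enabled (otherwise the request would need to be signed). Sending it only works from inside the VPC.

```python
import urllib.parse
import urllib.request


def build_load_request(endpoint_host, source_url, graph_uri, port=8182):
    """Build a SPARQL Update POST request for a LOAD ... INTO GRAPH statement."""
    update = f"LOAD <{source_url}> INTO GRAPH <{graph_uri}>"
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        f"https://{endpoint_host}:{port}/sparql",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_load_request(
    "your-neptune-endpoint.example",  # placeholder, not a real endpoint
    "https://iiif-library-manifests.s3.amazonaws.com/format.nq",
    "http://format",
)
# urllib.request.urlopen(req) would send it (from a host inside the VPC).
```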

baskaufs commented 2 years ago

When we moved the Neptune instance to us-east-1, the loading test in the Jupyter notebook didn't work at first. We then took several actions:

adding a route table when creating the endpoint.
updating the access policy using the JSON from the Ohio endpoint.
changing the access policy from custom to full access.
restarting the notebook server.

I'm not sure which of these actions were necessary, but after we did them, I could issue the load command for S3 from the Jupyter notebook successfully, as we did with the us-east-2 instance of Neptune.
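
For reference, the route table and policy changes could be applied to an existing endpoint in one call. This is a sketch under assumptions: the policy below is a generic allow-all document standing in for "full access", not the JSON copied from the Ohio endpoint, and the endpoint and route table IDs are placeholders.

```python
import json

# Assumed allow-all policy; the actual console "full access" JSON may differ.
FULL_ACCESS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": "*", "Action": "*", "Resource": "*"}
    ],
}


def modify_endpoint_kwargs(endpoint_id, route_table_ids):
    """Build the arguments for EC2 modify_vpc_endpoint."""
    return {
        "VpcEndpointId": endpoint_id,
        "PolicyDocument": json.dumps(FULL_ACCESS_POLICY),
        "AddRouteTableIds": list(route_table_ids),
    }


def apply_endpoint_changes(endpoint_id, route_table_ids, region):
    """Apply the changes; needs boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.modify_vpc_endpoint(
        **modify_endpoint_kwargs(endpoint_id, route_table_ids)
    )
```

Applying the changes one at a time and rerunning the load test after each would show which of them is actually required.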

HeardLibrary / vandycite

Configure Neptune instance #57