aws / graph-notebook

Library extending Jupyter notebooks to integrate with Apache TinkerPop, openCypher, and RDF SPARQL.
https://github.com/aws/graph-notebook
Apache License 2.0

[BUG] Neptune DB loader fails when run in an incremental manner #710

Open ImBharathwaj opened 1 week ago

ImBharathwaj commented 1 week ago


Describe the bug
When I run the Neptune DB loader in an incremental manner, I get errors. When queue_request = True, I get this error:

{'errorMessage': '500: {"code":"InternalFailureException","requestId":"5cc93385-4d90-e1e8-6b75-f325da88190d","detailedMessage":"Failed to start load from the source s3://dev-adp-data-lake/graph/tenant=fancapital-alexalbon/auth0/","message":"Failed to start load from the source s3://dev-adp-data-lake/graph/tenant=dhanasekar/auth0/"}', 'stackTrace': 'NoneType: None\n'}

When queue_request = False, I get this error:

{'errorMessage': '400: {"code":"BadRequestException","requestId":"4cc9334e-e888-38e8-84da-7f993388e5e1","detailedMessage":"Failed to start new load for the source s3://dev-adp-data-lake/graph/tenant=dhanasekar/shopify/. Max concurrent load limit breached. Limit is 1","message":"Failed to start new load for the source s3://dev-adp-data-lake/graph/tenant=dhanasekar/shopify/. Max concurrent load limit breached. Limit is 1"}', 'stackTrace': 'NoneType: None\n'}

Can anyone tell me if there is any way to run the Neptune loader in an incremental manner?

Expected behavior
The Neptune loader should run in an incremental manner without any issues.

triggan commented 1 week ago

Hi @ImBharathwaj - it appears that your issues may not be fully related to using the notebooks, so I would suggest moving any future questions that are not specifically related to Graph Notebook over to https://repost.aws.

The Neptune Bulk Loader runs one load request at a time; it cannot run bulk load jobs in parallel. For each load job, data is fetched from S3, batched, and written in parallel (based on the parallelism parameter) to Neptune. If you want to submit more than one job at a time, you can set the queue_request parameter to True, and up to 64 jobs can be submitted and queued for execution.
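[Editor's note: a minimal sketch of the queued approach described above, not code from this thread. It POSTs successive load requests to the cluster's loader HTTP endpoint with queueRequest enabled; the endpoint, IAM role ARN, region, format, and S3 prefixes are placeholders to adapt.]

```python
import requests

# Placeholders -- substitute your own cluster endpoint, role, and S3 prefixes.
NEPTUNE_ENDPOINT = "https://your-neptune-cluster:8182"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/NeptuneLoadFromS3"

sources = [
    "s3://your-bucket/graph/tenant=a/",
    "s3://your-bucket/graph/tenant=b/",
]

for source in sources:
    resp = requests.post(
        f"{NEPTUNE_ENDPOINT}/loader",
        json={
            "source": source,
            "format": "csv",          # or opencypher, ntriples, etc.
            "iamRoleArn": IAM_ROLE_ARN,
            "region": "us-east-1",
            "failOnError": "FALSE",
            "parallelism": "MEDIUM",
            "queueRequest": "TRUE",   # queue this job instead of failing if a load is already running
        },
    )
    # A successful submission returns a loadId that can be polled later.
    print(resp.status_code, resp.json())
```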

Have you also configured the required VPC Endpoints and added the proper role to your Neptune cluster to perform the bulk load? Details here: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
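[Editor's note: a hedged sketch of that one-time setup using boto3; all identifiers below are placeholders, and the exact VPC/route-table values depend on your environment.]

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# Attach the S3-read role to the Neptune cluster so the bulk loader can fetch data.
neptune.add_role_to_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",
    RoleArn="arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
)

# Create a gateway VPC endpoint for S3 in the cluster's VPC.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
    VpcEndpointType="Gateway",
)
```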

ImBharathwaj commented 1 week ago

Hey @triggan - our code works as expected in full-refresh mode. When we try to run in an incremental manner, this problem arises. We have tried a lot, but we still can't move forward.

triggan commented 1 week ago

Hi @ImBharathwaj - There's not enough information here to provide prescriptive guidance on what you can do to handle the errors you're seeing above. A 500 error code with an InternalFailureException typically indicates you're trying to do something that the engine is not expecting. I would have to inspect your code to see what logic you're using to kick off the successive bulk load jobs. Either that, or I would need an AWS Support Case opened so that we could inspect your cluster to see what is causing the 500 error. Whatever additional information you can provide would be helpful.
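[Editor's note: one way to gather that additional information is the loader's Get-Status API. A minimal sketch (the endpoint is a placeholder) that lists recent load ids and pulls details and per-record errors for the most recent one:]

```python
import requests

NEPTUNE_ENDPOINT = "https://your-neptune-cluster:8182"  # placeholder

# GET /loader returns the ids of recent load jobs, most recent first.
load_ids = requests.get(f"{NEPTUNE_ENDPOINT}/loader").json()["payload"]["loadIds"]

# Fetch full details and error records for the latest load job.
status = requests.get(
    f"{NEPTUNE_ENDPOINT}/loader/{load_ids[0]}",
    params={"details": "true", "errors": "true"},
).json()
print(status["payload"]["overallStatus"]["status"])
```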