aws / amazon-neptune-for-graphql

Amazon Neptune utility for GraphQL™ schemas and resolvers
Apache License 2.0

Getting Neptune schema fails #14

Open hpiili opened 5 months ago

hpiili commented 5 months ago

neptune-for-graphql --input-graphdb-schema-neptune-endpoint db-neptune-my-instance-1-read-replica.xxxx.eu-west-1.neptune.amazonaws.com:8182 --output-aws-pipeline-cdk --output-aws-pipeline-cdk-name MY --output-resolver-query-sdk --output-aws-pipeline-cdk-region eu-west-1

Fetching the schema drives the CPU load on the reader or writer up to 98%, and the schema fetch then starts failing with the error "Http query request failed: Request failed with status code 500 Trying with the AWS SDK"

and finally with "SDK query request failed: Request rejected because there are already too many concurrent requests being processed."
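The second error says the instance is rejecting requests because too many are already in flight. A common client-side mitigation is to cap the number of outstanding queries and retry rejected ones with exponential backoff. The sketch below shows that pattern in Python with asyncio. It is not the utility's code: `run_bounded`, `ConcurrencyRejected` and the `run_query` callback are hypothetical names standing in for whatever actually sends a query and detects the rejection.

```python
import asyncio
import random


class ConcurrencyRejected(Exception):
    """Hypothetical stand-in for a 'too many concurrent requests' rejection."""


async def run_bounded(queries, run_query, max_in_flight=4, max_retries=5, base_delay=0.05):
    """Run every query with at most max_in_flight outstanding at once.

    Rejected queries are retried with exponential backoff plus jitter;
    after max_retries retries the rejection is re-raised.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run_one(query):
        for attempt in range(max_retries + 1):
            async with semaphore:
                try:
                    return await run_query(query)
                except ConcurrencyRejected:
                    if attempt == max_retries:
                        raise
            # Sleep outside the semaphore so other queries can use the slot.
            await asyncio.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))

    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*(run_one(q) for q in queries))


async def demo():
    """Run six fake queries with a limit of 2 and report the peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_query(query):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # pretend the query takes some time
        in_flight -= 1
        return query.upper()

    results = await run_bounded(list("abcdef"), fake_query, max_in_flight=2)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The right limit depends on the instance size and on how heavy each query is; a limit of 2 in the demo is only there to make the cap observable.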

Attachment: log.txt

My schema has 23 node types and 36 edge types.

How can I get the schema extraction to pass?

triggan commented 5 months ago

Hi @hpiili, what version of Neptune are you running on your cluster? This utility uses the Statistics Summary to infer the schema on engine version 1.2.1.0 or newer. On older versions, it runs a set of queries, which can be performance-intensive, to fetch the full list of nodes, edges, and property keys in your graph.
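The version check described above amounts to a comparison of engine version numbers. A minimal sketch, assuming four-part version strings such as "1.3.0.0" and taking the 1.2.1.0 threshold from the comment; `parse_engine_version` and `choose_schema_strategy` are hypothetical helpers, not functions from the utility.

```python
from typing import Tuple


def parse_engine_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric parts of a version such as '1.3.0.0' or '1.2.1.0.R7'."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at a non-numeric suffix such as 'R7'
        parts.append(int(piece))
    return tuple(parts)


# Threshold from the comment above: the Statistics Summary is used from 1.2.1.0 on.
STATISTICS_SUMMARY_MIN_VERSION = (1, 2, 1, 0)


def choose_schema_strategy(engine_version: str) -> str:
    """Hypothetical helper: pick how the schema would be inferred for an engine version."""
    if parse_engine_version(engine_version) >= STATISTICS_SUMMARY_MIN_VERSION:
        return "statistics-summary"
    return "exploratory-queries"


print(choose_schema_strategy("1.3.0.0"))
```

Comparing integer tuples avoids the classic string-comparison trap where "1.10" would sort before "1.2".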

hpiili commented 5 months ago

I am using Neptune engine version 1.3.0.0. In the audit logs I can see something like what you described: a set of queries that look for combinations of relations between nodes, etc. (attached: auditlogs.zip)

Cole-Greer commented 5 months ago

Hi @hpiili I have identified the portion of the code responsible for overwhelming your instance with queries and am working on a solution. Would you mind sharing the instance size you are currently using?

hpiili commented 5 months ago

I am using db.x2g.xlarge instances in a cluster with one writer and one reader.

hpiili commented 5 months ago

Hi @Cole-Greer, I tested your PR version from the "updateGetEdgesDirections" branch. It is not a complete success yet: it still fails with the error "Http query request failed: Request failed with status code 500 Trying with the AWS SDK"

I have attached the console execution log and the Neptune audit logs (auditlogs_2024_04_04.zip).

Cole-Greer commented 5 months ago

Hi @hpiili, that's interesting. That PR changes how edge directions are queried: it runs one larger query per edge type to find all source and destination node types for that edge.
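To make "one larger query per edge type" concrete, here is a guess at the shape such an openCypher query could take. It is not the query in the PR, and the edge type names are made up. Note that each query still visits every edge of its type, so on a very large edge type it can still run into the query timeout.

```python
def edge_direction_query(edge_type: str) -> str:
    """Build one openCypher query returning every distinct
    (source labels, target labels) pair seen for a single edge type."""
    escaped = edge_type.replace("`", "``")  # backticks inside a quoted name are doubled
    return (
        f"MATCH (src)-[:`{escaped}`]->(dst) "
        "RETURN DISTINCT labels(src) AS fromLabels, labels(dst) AS toLabels"
    )


# One query per edge type (hypothetical names), instead of probing
# individual (source, edge, target) combinations one at a time.
queries = [edge_direction_query(t) for t in ["hasPart", "installedIn"]]
for q in queries:
    print(q)
```

The variables are named `src` and `dst` rather than `from`/`to` because FROM is a reserved word in Cypher.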

I set up a test graph with a similar number of edge and node types to yours, and in my tests the updated queries performed much better than the old ones. My test graph likely has far fewer edges than yours. In the slow query log I see that many of these queries run for almost exactly 2 minutes. Notably, the default Neptune query timeout is 2 minutes.

Could you tell me roughly how many edges some of your largest edge types have, so I can better replicate your issue? Also, if you are willing to try again, I suspect that raising the Neptune query timeout (the neptune_query_timeout parameter in the parameter group) will allow these queries to complete.
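The timeout is set in milliseconds, so a tenfold increase over the 2-minute default is 1,200,000 ms (20 minutes). A minimal sketch that builds the parameter entry in the shape the Neptune ModifyDBClusterParameterGroup API expects; `query_timeout_parameter` is a hypothetical helper, and the choice of ApplyMethod is an assumption to check against the Neptune parameter documentation.

```python
DEFAULT_QUERY_TIMEOUT_MS = 120_000  # neptune_query_timeout default: 2 minutes


def query_timeout_parameter(multiplier: int) -> dict:
    """Build a parameter entry raising neptune_query_timeout by the given factor."""
    return {
        "ParameterName": "neptune_query_timeout",
        # The value is in milliseconds and is passed as a string.
        "ParameterValue": str(DEFAULT_QUERY_TIMEOUT_MS * multiplier),
        # Assumption: 'pending-reboot' is accepted for static and dynamic
        # parameters alike; 'immediate' may also work for this parameter.
        "ApplyMethod": "pending-reboot",
    }


param = query_timeout_parameter(10)  # tenfold: 1,200,000 ms = 20 minutes
print(param)
```

With boto3, this entry would be passed as `boto3.client("neptune").modify_db_cluster_parameter_group(DBClusterParameterGroupName="my-cluster-params", Parameters=[param])`, where the group name is a placeholder. Default parameter groups cannot be modified, so the cluster needs a custom parameter group.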

hpiili commented 5 months ago

Two of the largest node types are DeliveredPart (22,035,281 vertices) and AssetAssembly (3,809,142 vertices). AssetAssembly nodes have edges to DeliveredPart nodes (22,035,281 edges), and each DeliveredPart has roughly 5 outgoing edges.
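A quick back-of-the-envelope estimate from the figures above shows the scale involved. The counts are copied from the comment; the fan-out of 5 edges per DeliveredPart is the commenter's approximation, so everything derived from it is approximate too.

```python
delivered_parts = 22_035_281
asset_assemblies = 3_809_142
assembly_to_part_edges = 22_035_281
approx_edges_out_per_part = 5  # "~5", so the totals below are rough

part_out_edges = delivered_parts * approx_edges_out_per_part
total_edges_estimate = assembly_to_part_edges + part_out_edges
parts_per_assembly = assembly_to_part_edges / asset_assemblies

print(f"{part_out_edges:,} edges out of DeliveredPart")
print(f"{total_edges_estimate:,} edges in these two groups")
print(f"{parts_per_assembly:.2f} DeliveredPart edges per AssetAssembly on average")
```

That is on the order of 130 million edges in these two groups alone, which may help explain why a full pass over a single edge type can exceed a 2-minute timeout.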

After increasing the query timeout tenfold, the schema extraction gets further, but there are still two 500 errors.

Attachment: log_2024_04_05.txt

At the end, generating the CDK pipeline also fails.

The command I executed was:

neptune-for-graphql --input-graphdb-schema-neptune-endpoint db-neptune-pelm-lcd-dev-instance-1-read-replica.c2hkwv1gpquj.eu-west-1.neptune.amazonaws.com:8182 --output-aws-pipeline-cdk --output-aws-pipeline-cdk-name LCD --output-resolver-query-sdk --output-aws-pipeline-cdk-region eu-west-1 2>&1 | tee log_2024_04_05.txt

hpiili commented 5 months ago

I did not find a good way of listing the edges. I tried this query:

MATCH (o:DeliveredPart)-[r]->(n) with distinct type(r) as cr,r return count(r), collect(distinct type(r))

It first fails with a query timeout and then with an out-of-memory error. Even a db.r5.12xlarge reader could not finish this query.

Is there a better way to find the edge counts you asked for?
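One possibly lighter option is to group and count by edge type directly. The `WITH DISTINCT type(r) AS cr, r` step in the query above de-duplicates individual edges, which means remembering every edge seen; a plain grouped count only needs one counter per edge type. It is still a full scan, so on this graph it may still need the raised timeout. The sketch builds such a query and a POST request to Neptune's openCypher HTTPS endpoint without sending it; it assumes IAM database authentication is disabled (otherwise the request also needs SigV4 signing), and the endpoint host is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request


def edge_count_query(node_label: str) -> str:
    """Count outgoing edges per edge type for one node label."""
    return (
        f"MATCH (:`{node_label}`)-[r]->() "
        "RETURN type(r) AS edgeType, count(r) AS edgeCount"
    )


def opencypher_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a form-encoded POST to the /openCypher endpoint.

    Assumes IAM database authentication is disabled on the cluster.
    """
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        f"https://{endpoint}/openCypher",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = opencypher_request("my-reader.example.com:8182", edge_count_query("DeliveredPart"))
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req, timeout=...) would actually send it.
```

If even this is too slow, restricting the pattern to a single edge type, for example `[r:`SomeType`]`, splits the work into smaller queries.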

hpiili commented 5 months ago

I have now deleted most of the data from the database so that I could run this command, and I am now able to create the schema for a limited scope.

A second issue came from my running this against a read replica.

Based on my limited coding knowledge, the code does not seem to have proper try/catch error handling. For example, DescribeDBClustersCommand failed because I was running against a read replica, but the script reported nothing beyond the failure itself. When I added a catch block that printed the error, the cause of my mistake became more obvious.
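The utility itself is JavaScript (DescribeDBClustersCommand is a command class from the AWS SDK for JavaScript v3), so the following only illustrates the same idea in Python: wrap each setup step and, when it fails, re-raise with the step name and the underlying error instead of a bare failure. `run_step`, `SchemaStepError` and the stub are hypothetical, and the stub's error text is invented for illustration rather than the message AWS actually returns.

```python
class SchemaStepError(RuntimeError):
    """Raised when a named setup step fails; the original error is kept as __cause__."""


def run_step(step_name, func, *args, **kwargs):
    """Run one step and, on failure, re-raise with the step name and the underlying message."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        raise SchemaStepError(f"{step_name} failed: {type(exc).__name__}: {exc}") from exc


def describe_cluster_stub(cluster_id):
    """Hypothetical stand-in for a DescribeDBClusters call that cannot find the cluster."""
    raise LookupError(f"cluster {cluster_id!r} not found")


try:
    run_step("DescribeDBClusters", describe_cluster_stub, "my-instance-1-read-replica")
except SchemaStepError as err:
    message = str(err)
    print(message)
```

Chaining with `from exc` keeps the original exception and traceback attached, so nothing is lost by adding the step name.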

Cole-Greer commented 5 months ago

I'm glad you were able to complete the setup on a smaller dataset. I will work on additional query modifications to improve schema fetching for your original size of graph. I will additionally review the error handling to ensure error messages are being surfaced effectively.

Cole-Greer commented 4 months ago

Hi @hpiili, I'm sorry that I have been unable to return to this issue for the last few weeks. I expect to have time to continue investigating this in mid-May.