aws / graph-explorer

React-based web application that enables users to visualize both property graph and RDF data and explore connections between data without having to write graph queries.
https://github.com/aws/graph-explorer
Apache License 2.0
300 stars 46 forks

[Bug] 413 Payload Too Large #410

Open guyelia opened 1 month ago

guyelia commented 1 month ago

Description

When syncing an AWS Neptune instance with a fairly small dataset, I get the following error in the developer console: POST https://PROXY_PUB_IP/gremlin 413 (Payload Too Large). The X-Powered-By: Express response header suggests it is coming from the Express proxy server.

The request payload is only 369 KB, and I'm pretty sure Express's default limit is 1 MB.
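For context, the JSON body limit actually comes from body-parser, whose documented default is 100 KB rather than 1 MB, which would explain the 413 here. A minimal stdlib-only sketch of the size check (not graph-explorer code; the helper name is hypothetical):

```javascript
// Sketch: why a ~369 KB request body trips the proxy. body-parser's
// documented default JSON limit is '100kb', so any body larger than
// 102400 bytes is rejected with HTTP 413.
const DEFAULT_LIMIT_BYTES = 100 * 1024; // body-parser default '100kb'

// Hypothetical helper: would a body of this size be rejected?
function exceedsLimit(body, limitBytes = DEFAULT_LIMIT_BYTES) {
  return Buffer.byteLength(body, 'utf8') > limitBytes;
}

// A ~369 KB payload, like the failing /gremlin request in this issue.
const bigQuery = 'g.V()'.padEnd(369 * 1024, 'x');
console.log(exceedsLimit(bigQuery));              // true  -> would get a 413
console.log(exceedsLimit(bigQuery, 1024 * 1024)); // false -> fits a 1 MB limit
```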

Environment

[!IMPORTANT] If you are interested in working on this issue or have submitted a pull request, please leave a comment.

[!TIP] Please use a 👍 reaction to provide a +1/vote.

This helps the community and maintainers prioritize this request.

kmcginnes commented 1 month ago

Thanks for the submission @guyelia.

I'll dig around a bit to see if this is a known issue.

I'd like to get a bit more context from you about the issue.

  • How many node types do you have (i.e. labels)?
  • Do you have node or edge types with a lot of attributes (i.e. 10 or more)?

Also, can you post the Gremlin query that caused the error? You can get that by:

  1. Open the developer console in the web browser
  2. Go to the network tab
  3. Click the "synchronize database" button in the app UI
  4. Wait for the request to fail
  5. Select the failed request
  6. Select the "Payload" tab
  7. Right click on the query and select "copy value"
  8. Paste the value in a code block here on the GitHub issue

IMPORTANT: Don't forget to scrub any private info out of the query before posting it.

[Screenshot: 2024-05-20 at 10:14:33 AM]
guyelia commented 1 month ago

Hey @kmcginnes, thanks for the reply, and sorry for the delay.

I've used the Gremlin console to answer your questions; please let me know if I missed anything.

Regarding the Gremlin query itself, posting it would be problematic because it mostly contains sensitive data, but it is a huge command built from "g.V().project(ALL_NODES).by(V().hasLabel(NODE).limit(1)).by(V().hasLabel(ANOTHER_NODE).limit(1))...
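A query of that shape grows with the number of labels in the schema, which explains the body size. The sketch below is a hypothetical reconstruction from the snippet quoted above, not graph-explorer's actual query builder:

```javascript
// Hypothetical reconstruction of the query shape: one project() key and
// one .by(V().hasLabel(...).limit(1)) step per node label, so the query
// string grows with every label in the schema.
function buildSampleQuery(labels) {
  const keys = labels.map((l) => `'${l}'`).join(', ');
  const bys = labels
    .map((l) => `.by(V().hasLabel('${l}').limit(1))`)
    .join('');
  return `g.V().project(${keys})${bys}`;
}

console.log(buildSampleQuery(['airport', 'country']));
// (output wrapped for readability)
// g.V().project('airport', 'country')
//   .by(V().hasLabel('airport').limit(1))
//   .by(V().hasLabel('country').limit(1))
```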

It looks like a Node.js/Express limitation of 100 KB per request. Any objection to increasing it to something bigger, like 1 MB?

kmcginnes commented 1 month ago

@guyelia Thank you. That is perfect!

If you need a fix quickly, then definitely fork this project and increase the limit.

We are focusing on db query performance now and I'm going to consider the increase as a potential solution. But this is a bandaid fix and will just kick the can down the road to some future user who needs the request size to be even bigger.

I would really love to find a better way to construct the query so that it isn't so large. Or perform some batching for larger databases. So I don't want to lean on the request size increase if I can help it.
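The batching idea could look roughly like this sketch: split the schema's labels into chunks and issue one small query per chunk, so no single request body approaches the proxy's limit. The helper name is generic, not an existing graph-explorer function:

```javascript
// Sketch of the batching idea: split the label list into fixed-size
// chunks and issue one small request per chunk, instead of one giant
// query covering every label. chunk() is a generic helper.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const labels = ['a', 'b', 'c', 'd', 'e'];
const batches = chunk(labels, 2);
console.log(batches.length); // 3 small requests instead of one large one
```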

guyelia commented 1 month ago

Sure thing, thanks! Just FYI, in case anyone else hits the same issue: I built graph-explorer locally with two additional lines in packages/graph-explorer-proxy-server/node-server.js:

```javascript
app.use(bodyParser.json({ limit: '50mb' })); // Increase the payload size limit
app.use(bodyParser.urlencoded({ limit: '50mb', extended: true })); // Increase the payload size limit
```

and problem solved :)