arangodb / arangodb-java-driver

The official ArangoDB Java driver.
Apache License 2.0

Slow performance for large datasets on 7.5.1 (via Spring Data 4.1.0) #541

Closed: mdmm13 closed this issue 4 months ago

mdmm13 commented 4 months ago

Situation: we've got a dev server (4 cores, 16 GB memory) running a single-instance ArangoDB (3.11.7) and a Spring Boot application that queries ArangoDB via Spring Data 4.1.0. We're posting here rather than in Spring Data because we use the Spring template, which passes through to the Java driver directly (see below).

Complication: we've got a query that returns ~6 MB of JSON. That exact query returns in 0.5s in the admin interface. The same query via the Java driver (below) takes 14s. Whether run from the admin interface or from Spring, CPU and memory usage barely touch 15% each, and the Arango dashboard/metrics show the same, so it's not a hardware-spec issue.

watch.start("aql");
ArangoCursor<FindAll> cursor = ops.query(query, bindVars, options, FindAll.class);
watch.stop();

(Note: FindAll is a POJO that includes Arango-annotated classes)

Questions:

  1. How can we debug this properly?
  2. How can we increase parsing performance? 550 rows should not take 13 seconds, regardless of size.

UPDATE:

  1. Setting batchSize to 1 and to 1000 - with batchSize=1 the query itself returns in the same time as in the admin console (0.6s), which shifts the 13-second delay to the cursor.asListRemaining() call.
  2. Setting 'RawJson' as the return type - 0.8s total (of which 0.6s is the query), so the performance hit is in the conversion, though it's hard to imagine 550 rows (~6 MB) taking 14 seconds.
  3. Created duplicates of all Arango-annotated classes without annotations - 14s down to 2s.

==> does anyone have input on how to improve the deserialization to the original annotated classes?
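For anyone wanting to reproduce the split between query time and conversion time described in the update above, here is a minimal sketch that times the two phases separately. It is an illustration only: `PhaseTimer`, `Timed` and `time` are hypothetical names, and a plain `java.util.Iterator` stands in for the cursor, so nothing here is the driver's own API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class PhaseTimer {

    /** Rows plus the elapsed time of each phase, in nanoseconds. */
    public static final class Timed<T> {
        public final List<T> rows;
        public final long openNanos;   // time until the cursor is returned
        public final long drainNanos;  // time spent iterating the cursor to the end

        Timed(List<T> rows, long openNanos, long drainNanos) {
            this.rows = rows;
            this.openNanos = openNanos;
            this.drainNanos = drainNanos;
        }
    }

    /**
     * Times opening the cursor and draining it as two separate phases.
     * The supplier is a placeholder for whatever call returns the cursor.
     */
    public static <T> Timed<T> time(Supplier<Iterator<T>> openCursor) {
        long t0 = System.nanoTime();
        Iterator<T> cursor = openCursor.get();   // phase 1: run the query
        long t1 = System.nanoTime();

        List<T> rows = new ArrayList<>();
        while (cursor.hasNext()) {               // phase 2: fetch and convert rows
            rows.add(cursor.next());
        }
        long t2 = System.nanoTime();

        return new Timed<>(rows, t1 - t0, t2 - t1);
    }

    public static void main(String[] args) {
        // Stand-in data; a real run would pass the actual query call instead.
        Timed<String> t = time(() -> Arrays.asList("a", "b", "c").iterator());
        System.out.printf("rows=%d open=%.3f ms drain=%.3f ms%n",
                t.rows.size(), t.openNanos / 1e6, t.drainNanos / 1e6);
    }
}
```

If the cursor type can be treated as an `Iterator` (an assumption to check against the driver version in use), the supplier could wrap the `ops.query(...)` call from the original post. A large `drainNanos` with a small `openNanos` would point at fetching or deserialization rather than at the query itself.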

rashtao commented 4 months ago

This can happen if the FindAll entity has fields linking to other documents (or edges), i.e. fields annotated with @Ref, @From, @To or @Relations. In that case the linked objects are fetched eagerly. If so, setting the annotation parameter lazy = true makes them load lazily.
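To make the difference concrete, here is a small standalone sketch of eager versus lazy resolution of linked objects. It does not use Spring Data or the driver: `FakeStore`, `Lazy`, `loadEager` and `loadLazy` are hypothetical names, and the figure of one linked reference per row across 550 rows is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class EagerVsLazy {

    /** Hypothetical stand-in for fetching one linked document; counts lookups. */
    public static final class FakeStore {
        public final AtomicInteger fetches = new AtomicInteger();

        public String fetch(String id) {
            fetches.incrementAndGet();
            return "doc:" + id;
        }
    }

    /** Defers the lookup until the first get() and caches the result. */
    public static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        public Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public synchronized T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    /** Eager: every link is resolved while the result rows are built. */
    public static List<String> loadEager(FakeStore store, List<String> ids) {
        List<String> out = new ArrayList<>();
        for (String id : ids) {
            out.add(store.fetch(id));
        }
        return out;
    }

    /** Lazy: only placeholders are built; nothing is fetched until get(). */
    public static List<Supplier<String>> loadLazy(FakeStore store, List<String> ids) {
        List<Supplier<String>> out = new ArrayList<>();
        for (String id : ids) {
            out.add(new Lazy<>(() -> store.fetch(id)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 550; i++) {
            ids.add("k" + i);
        }

        FakeStore eagerStore = new FakeStore();
        loadEager(eagerStore, ids);
        System.out.println("eager fetches: " + eagerStore.fetches.get());

        FakeStore lazyStore = new FakeStore();
        List<Supplier<String>> refs = loadLazy(lazyStore, ids);
        System.out.println("lazy fetches before use: " + lazyStore.fetches.get());
        refs.get(0).get();
        refs.get(0).get(); // second call hits the cached value
        System.out.println("lazy fetches after one use: " + lazyStore.fetches.get());
    }
}
```

In this toy model the eager path performs 550 lookups up front, while the lazy path performs none until a reference is actually read. If each lookup were a separate database round trip, that alone could plausibly account for a delay of the size reported here, even with low CPU and memory usage.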

mdmm13 commented 4 months ago

Thank you @rashtao - interesting that it'd eagerly load on deserialization instead of first actual use. Is there a way we can set everything as lazy by default?

rashtao commented 4 months ago

Currently the default is eager and there is no way to change it globally, so you need to set lazy = true for each usage of the annotations above.

mdmm13 commented 4 months ago

Understood, thank you.

This would make a good feature request for a general driver option going forward, because it affects read performance heavily.