Adobe-Consulting-Services / acs-aem-commons

http://adobe-consulting-services.github.io/acs-aem-commons/
Apache License 2.0
454 stars 600 forks source link

MCP Report performance slow #3257

Open ghenzler opened 9 months ago

ghenzler commented 9 months ago

Required Information

Expected Behavior

storeReport() for large reports containing ~80,000 rows should execute in a reasonable timeframe (seconds or at most minutes)

Actual Behavior

Saving a large report takes 90min locally

Steps to Reproduce

Create an MCP instance with ~80,000 rows and wait ;-)


The reason for the slow performance is that https://github.com/Adobe-Consulting-Services/acs-aem-commons/blob/2daabf3d98c9c6c7bcc6b6ffbca5df22ae4d6407/bundle/src/main/java/com/adobe/acs/commons/mcp/model/GenericReport.java#L88 has true as last argument (autoCommit).

So what I did to solve the problem: I created a local version of com.adobe.acs.commons.mcp.model.GenericReport.persist(ResourceResolver, String) and call it from com.adobe.acs.commons.mcp.ProcessDefinition.storeReport(ProcessInstance, ResourceResolver).

Als creating the Xls on the fly can sometimes time out in the cloud service, so I'm currently creating it at /var/acs-commons/mcp/instances/<id>/jcr:content/mcp-report.xlsx and I have a filter in place to just redirect to that location instead of generating it on the fly.

So at the very minimum ResourceUtil.getOrCreateResource() should not be called with true (brings down the execution time from 90min to 1min). I'm happy to create a PR with this change and the XLS-generation done at the end of the report if desired.

kwin commented 9 months ago

PRs are always appreciated in open source!

ghenzler commented 9 months ago

changing from true to false is easy - the question is more if everybody is happy with generating the xls file directly at the end so it does not have to be regenerated on every xlsx-download? (maybe there was reasons it was done the way it is today, before I create a PR for that I want to hear that the "MCP maintainers" are fine with it...)

YegorKozlov commented 9 months ago

MCP uses Apache POI to build xlsx which is a memory hog. Generating a report with 80K rows can take quite a bit of time and memory, and even potentially result in OOM). Maybe we should make the output format configurable and allow user to choose between xlsx and csv. So there are two potential improvements:

true => false will reduce the execution time xlsx => csv will reduce the download time

ghenzler commented 9 months ago

I think it's ok to use xlsx, it's just more convenient to open & distribute (as the reports are often sent to business people from my experience). I think there is little reason to not change from true to false to speed up the creation of the JCR nodes significantly.

My question is mostly if somebody objects to create the xlsx report immediately along with the JCR nodes, so on download the existing binary can just be quickly spooled out.

YegorKozlov commented 9 months ago

My question is mostly if somebody objects to create the xlsx report immediately along with the JCR nodes, so on download the existing binary can just be quickly spooled out.

it should be perfectly fine to create the report as a step in the MCP job. Ideally it would be backwards compatible, i.e. if the report is pre-built (new mode) then download it, otherwise create it on the fly (current behavior).