IQSS / dataverse

Open source research data repository software
http://dataverse.org

Dataset Edit Performance Improvements #10890

Open qqmyers opened 9 hours ago

qqmyers commented 9 hours ago

What this PR does / why we need it: This PR includes multiple changes to the UpdateDatasetVersionCommand to improve performance and scalability when editing datasets with large numbers of files. Key changes include:

Which issue(s) this PR closes:

Closes #10138

Special notes for your reviewer: In my testing on a dataset with 10K files, the UpdateDatasetVersionCommand called in the DatasetPage.save() method averaged ~30 seconds to complete (as measured by logging in the save method) when a one-character change was made to the description. With all the changes in this PR, it now takes ~12-13 seconds. In general, verifying the impact of individual changes is hard:

That said, I would estimate that the first two changes each contribute a reduction of ~4 seconds (the feature flag would save 12 seconds, but the differencing PR saves ~8 seconds there). The

Suggestions on how to test this: All the automated tests should pass, and any/all variants of making changes to a dataset should work as before. There should be no changes w.r.t. the db-level updates except for the change to not update datafile lastmodified dates. Performance and scaling should both be improved overall. The simplest way to test that might be to turn on fine logging for the DatasetPage, where I've added logging of the time to run the update command. (Note that the overall time seen in the UI includes both the time to save the changes and the time to reload the page. The latter, with 10K files, is still many seconds and hasn't been improved in this PR.)
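For reviewers unfamiliar with this kind of measurement, here is a minimal sketch of timing a step and reporting it at FINE level with java.util.logging. It is not the code added in this PR: the class name `TimedSaveSketch`, the helper `timeAndLog`, and the placeholder action are all hypothetical, and the real logging lives in DatasetPage.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; not taken from the Dataverse code base.
class TimedSaveSketch {

    private static final Logger logger =
            Logger.getLogger(TimedSaveSketch.class.getCanonicalName());

    /**
     * Runs the given action, logs the elapsed wall-clock time at FINE level,
     * and returns the elapsed time in milliseconds.
     */
    static long timeAndLog(String label, Runnable action) {
        long start = System.nanoTime();
        action.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        // Only emitted when the logger (and a handler) are set to FINE or lower.
        logger.log(Level.FINE, "{0} took {1} ms", new Object[]{label, elapsedMs});
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Enable FINE output for this logger only, for the purpose of the demo.
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        logger.addHandler(handler);
        logger.setLevel(Level.FINE);

        // Stand-in for the real update command; does nothing here.
        timeAndLog("placeholder update command", () -> { });
    }
}
```

Note that a measurement like this covers only the wrapped call, so it excludes the page reload time that the UI-level timing includes.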

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: Probably one for any/all performance updates going into 6.5 along with announcing the feature flag and change to file last modified behavior.

Additional documentation: to be added

coveralls commented 8 hours ago

coverage: 21.012% (+0.1%) from 20.872% when pulling e0cfcfc30fb4bfde6327e74c8a2ddf9d47baee3e on GlobalDataverseCommunityConsortium:DANS_Performance2 into 068607793b70d6fdd0b0ee1b1a3d2a5bfc2c2574 on IQSS:develop.