cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License
91 stars 23 forks

Validation(), specifically setDataAtCell(), can be made much more efficient by eliminating re-render on each column. #408

Closed ddooley closed 9 months ago

ddooley commented 10 months ago

DataHarmonizer freezes up or times out when validating many wide rows of data. For the GRDI template, validating even 50 rows times out, even if they are almost empty, and the Chrome performance report below suggests that this is because of unnecessary rendering in the setDataAtCell() call.

[Image: Chrome performance report]

We might be able to dramatically improve performance by changing the cell value update call in the validate() -> getInvalidCells() function of the /lib/DataHarmonizer.js script, which is currently called for each data table column (and the GRDI template has 200+ columns). According to the Handsontable docs section “Performance issue with instance.setDataAtCell()”, switching from setDataAtCell() to updateData() looks promising for solving the performance issue. loadData() is not a good substitute, since it seems to mess with cell/row states. Note that the populateFromArray() method appears to render just like setDataAtCell(), so that solution should be avoided.
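A minimal sketch of the updateData() idea, not the code from the pull request: edits are applied to a copy of the table data and handed back to Handsontable in one call, so the grid is not re-rendered per cell. The helper name `applyChangesWithUpdateData` and the `changes` format are hypothetical. It assumes `hot` exposes Handsontable's getData() and updateData(data, source), and that no sorting, filtering or row trimming is active, so visual and source indexes coincide.

```javascript
// Hypothetical helper (not part of DataHarmonizer or Handsontable).
// `hot`     : assumed to be a Handsontable instance, or any object exposing
//             getData() and updateData(data, source).
// `changes` : array of [row, col, value] triples gathered during validation.
function applyChangesWithUpdateData(hot, changes, source = 'thisChange') {
  if (changes.length === 0) {
    return 0; // nothing to write, so skip the update entirely
  }
  // Copy each row defensively so the edits never touch the array we were given.
  const data = hot.getData().map((row) => row.slice());
  for (const [row, col, value] of changes) {
    data[row][col] = value;
  }
  // One call for all edits instead of one setDataAtCell() call per cell.
  hot.updateData(data, source);
  return changes.length;
}
```

The validation loop would then only collect `[row, col, value]` triples and call the helper once at the end.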

This involves replacing "this.hot.setDataAtCell(row, col, update, 'thisChange');" at lines 2338 and 2346, and the doUniqueValidation() call at line 2145, "this.hot.setDataAtCell(provenanceChanges);".
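As an alternative illustration, the per-cell calls could be gathered into a single array and written with one setDataAtCell(changes, source) call, the same array form the provenanceChanges call already uses. This is a hedged sketch rather than the change made in the pull request; `collectChanges`, `flushChanges` and `fixCell` are hypothetical names, and `hot` is assumed to be a Handsontable instance.

```javascript
// Hypothetical sketch: collect [row, col, value] triples while scanning the
// table, then write them all with one setDataAtCell() call.
// `fixCell(value, row, col)` is an assumed callback that returns the corrected
// value, or undefined when the cell should be left alone.
function collectChanges(data, fixCell) {
  const changes = [];
  for (let row = 0; row < data.length; row++) {
    for (let col = 0; col < data[row].length; col++) {
      const update = fixCell(data[row][col], row, col);
      if (update !== undefined && update !== data[row][col]) {
        changes.push([row, col, update]);
      }
    }
  }
  return changes;
}

// `hot` is assumed to expose Handsontable's setDataAtCell(changes, source).
function flushChanges(hot, changes, source = 'thisChange') {
  if (changes.length > 0) {
    hot.setDataAtCell(changes, source); // one call for the whole batch
  }
  return changes.length;
}
```

For example, `flushChanges(this.hot, collectChanges(this.hot.getData(), (v) => (typeof v === 'string' ? v.trim() : undefined)))` would trim every string cell with a single write.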

ddooley commented 9 months ago

Related pull request: https://github.com/cidgoh/DataHarmonizer/pull/409

ddooley commented 9 months ago

The new pull request solves this. Any testing errors can be reported there.