MauroDataMapper / mdm-ui

Web front-end for the Mauro Data Mapper
Apache License 2.0

UI cannot handle Summary Metadata at the DataClass level #477

Closed olliefreeman closed 2 years ago

olliefreeman commented 2 years ago

When trying to look at the "data" section of a DataClass, the UI stalls: the "working" icon spins but stutters. Reloading the page has no effect and the entire page is unresponsive; the only solution is to close the tab/window.

To reproduce:

The issue may well be the size of the API response, which is 6MB for http://localhost:8080/api/dataClasses/581f80af-9282-4b50-9e94-8f7113d2a148/summaryMetadata/335e57d6-ad27-4527-b949-a3f7b9d2292c/summaryMetadataReports

modules.json.zip

olliefreeman commented 2 years ago

@jamesrwelch @pjmonks the attached JSON file is the SM and ET detection on 1 schema from the modules db (one of the small ones). You can view the SM data at the DE (DataElement) level but not at the DC (DataClass) level.

abwilson23 commented 2 years ago

Summary of investigation so far:

  1. When attempting to import the attached file, I received several errors reporting that the reportDates in the JSON file were null, even though they were not. The fix was to search and replace so that every reportDate had the same value (see the sketch after this list); the import then worked fine.
  2. When viewing the summary metadata elsewhere, the charts load just fine.
  3. Limiting the number of API requests fired leads to a successful partial load of the summary metadata for ethnicity_orbit. Additionally, each of the individual requests loads successfully when fired one after another in Postman.
  4. This suggests the failure is caused by the volume of data coming back when all of the requests fire together, as @olliefreeman noted above.
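
For reference, a minimal sketch of the workaround from point 1, assuming a Node-style script run against the exported file; the file name and the placeholder date are arbitrary, and only the reportDate key is taken from the error messages above:

```typescript
import { readFileSync, writeFileSync } from 'fs';

// Recursively walk the parsed JSON and overwrite every "reportDate"
// value with a single fixed date, mirroring the search-and-replace fix.
function normaliseReportDates(node: unknown, date: string): void {
  if (Array.isArray(node)) {
    node.forEach(child => normaliseReportDates(child, date));
  } else if (node !== null && typeof node === 'object') {
    const record = node as Record<string, unknown>;
    if ('reportDate' in record) {
      record['reportDate'] = date;
    }
    Object.values(record).forEach(child => normaliseReportDates(child, date));
  }
}

const doc: unknown = JSON.parse(readFileSync('modules.json', 'utf8'));
normaliseReportDates(doc, '2021-01-01T00:00:00.000Z'); // arbitrary placeholder date
writeFileSync('modules.json', JSON.stringify(doc, null, 2));
```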

Next steps: refactor the API call in the summary-metadata-table component to use RxJS plus some form of request throttling.
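
A rough sketch of how that throttling could look, assuming the component currently issues one HTTP request per report; `fetchReport` is a hypothetical stand-in for the component's existing per-report API call, and the concurrency cap is an arbitrary example value:

```typescript
import { from, Observable } from 'rxjs';
import { mergeMap, toArray } from 'rxjs/operators';

const MAX_CONCURRENT_REQUESTS = 4; // arbitrary cap on in-flight requests

interface SummaryMetadataReport {
  id: string;
  reportDate: string;
  reportValue: string;
}

// Hypothetical stand-in for the component's existing per-report API call.
declare function fetchReport(reportId: string): Observable<SummaryMetadataReport>;

// Fire at most MAX_CONCURRENT_REQUESTS report requests at a time instead
// of all at once, then collect the results into a single array.
function loadReports(reportIds: string[]): Observable<SummaryMetadataReport[]> {
  return from(reportIds).pipe(
    mergeMap(id => fetchReport(id), MAX_CONCURRENT_REQUESTS),
    toArray()
  );
}
```

With `mergeMap`'s second argument set, a new request only starts when an in-flight one completes, which bounds the amount of response data the page handles at any one time.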

jon-bowen commented 2 years ago

Before I went down with Covid I did some investigation on this one. By putting in some breakpoints I was able to establish that the charting package was attempting to render several (5? 6?) charts with upwards of 150,000 points in each.

The page wasn't broken: when I put breakpoints in judicious places and turned them on and off, 99% of the time the code would break in the same place, where the "scale" of the chart was being calculated and applied to the data points. When left for long enough the page would eventually render, after a couple of hours.

Unfortunately it's been 2 weeks since I was looking at this, but I'm pretty sure the time was being spent in "computeMinSampleSize" in node_modules\chart.js\dist\Chart.js.

jon-bowen commented 2 years ago

According to the label on the chart, the data represents a sample of 1% of the original data, but this is clearly still an overwhelming amount of data in this instance. I would have thought a maximum of 1,500 data points would be more realistic, but would such a tiny sample (0.01%) of the original data actually be meaningful in any way?
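
For illustration only, here is one way a hard cap like that could be applied, evenly sampling a series down to a fixed maximum before it reaches the charting layer; the function name and cap are hypothetical, and this is not what the chart code currently does:

```typescript
// Evenly sample a large series down to at most maxPoints entries.
function downsample<T>(points: T[], maxPoints = 1500): T[] {
  if (points.length <= maxPoints) {
    return points;
  }
  const step = points.length / maxPoints;
  const sampled: T[] = [];
  for (let i = 0; i < maxPoints; i++) {
    sampled.push(points[Math.floor(i * step)]);
  }
  return sampled;
}
```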

pjmonks commented 2 years ago

I've tried this myself and @jon-bowen is correct. I imported the data model attached to this issue and viewed "demographics > ethnicity_orbit"; it contains summary metadata reports such as:

  1. cdw_patient_id
  2. source_episode_id
  3. source_spell_id

When I request the map data for these, each returns something like 150,000+ data points. The map data looks like every single unique identifier is counted once (!). This is simply too much for the chart to render, which blocks all other rendering/processing.

If I put an artificial limit check on the chart component so it does not attempt to render above a reasonable number of points (I tested with 30 as an experiment), then the chart doesn't render for that large sample and the page can continue on without issue. But then there needs to be a decision about what to do when one of these large charts is encountered: do we just say we cannot display it because of its size?
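
Roughly what that limit check looks like, as a sketch only; the function and threshold names are stand-ins, and 30 was just the experimental value mentioned above:

```typescript
const MAX_RENDERABLE_POINTS = 30; // experimental value; a real limit would likely be higher

interface ChartPoint {
  label: string;
  value: number;
}

// Guard the render path: skip oversized reports so a single huge
// dataset cannot block the rest of the page.
function tryRenderChart(points: ChartPoint[], render: (p: ChartPoint[]) => void): boolean {
  if (points.length > MAX_RENDERABLE_POINTS) {
    console.warn(`Skipping chart: ${points.length} points exceeds limit of ${MAX_RENDERABLE_POINTS}`);
    return false; // caller can show a "too large to display" message instead
  }
  render(points);
  return true;
}
```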

I suppose the original question would be why is there so much data present in the first place? In this case what does sampling the unique identifiers of records achieve?

cc @olliefreeman @jamesrwelch

pjmonks commented 2 years ago

After discussion with everyone, we have agreed that the backend should not have been returning such a large amount of report data in the first place, which makes this not a UI issue. When a reasonable number of report points is returned, the charts render without issue. Closing this issue.