Recidiviz / justice-counts

Technical infrastructure for the Justice Counts initiative
GNU General Public License v3.0
2 stars 0 forks source link

[Publisher][Bug] Explore Data: Breakdowns show missing data #1498

Closed nasaownsky closed 2 months ago

nasaownsky commented 2 months ago

Description of the change

Description here

Type of change

All pull requests must have at least one of the following labels applied (otherwise the PR will fail):

Label Description
Type: Bug non-breaking change that fixes an issue
Type: Feature non-breaking change that adds functionality
Type: Breaking Change fix or feature that would cause existing functionality to not work as expected
Type: Non-breaking refactor change addresses some tech debt item or prepares for a later change, but does not change functionality
Type: Configuration Change adjusts configuration to achieve some end related to functionality, development, performance, or security
Type: Dependency Upgrade upgrades a project dependency - these changes are not included in release notes

Related issues

Closes #XXXX

Checklists

Development

This box MUST be checked by the submitter prior to merging:

These boxes should be checked by the submitter prior to merging:

Code review

These boxes should be checked by reviewers prior to merging:

nasaownsky commented 2 months ago

Hello @mxosman! After digging this deeper I found, that the reason of this bug is the order of datapoints. I don't know if this order problem discussed at the time the code was written, but it seems like in datapointsByMetric method of BaseDatapointsStore the resulting datapoint receive frequency of the last datapoint from incoming dataset. In other words for your reference data: if ANNUAL datapoints of given date are positioned after MONTHLY ones of this given date they will be calculated as annual datapoint.

If you look at the debugging log on line 131 of BaseDatapointsStore.ts file (without my order fix), you'll see that for Jan 2021 it returns

BaseDatapointsStore.ts:131 undefined 'Misdemeanor Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 false 'Other Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 false 'Unknown Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 false 'Felony Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Felony Cases Referred' 52 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Misdemeanor Cases Referred' 137 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Other Cases Referred' null 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Unknown Cases Referred' null 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Misdemeanor Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Other Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Unknown Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Felony Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Felony Cases Referred' 52 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Misdemeanor Cases Referred' 137 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Other Cases Referred' null 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Unknown Cases Referred' null 'MONTHLY'

and for Jan 2020

BaseDatapointsStore.ts:131 undefined 'Unknown Cases Referred' null 'MONTHLY'
BaseDatapointsStore.ts:131 false 'Other Cases Referred' null 'MONTHLY'
BaseDatapointsStore.ts:131 false 'Misdemeanor Cases Referred' 114 'MONTHLY'
BaseDatapointsStore.ts:131 false 'Felony Cases Referred' 50 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Unknown Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Other Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Misdemeanor Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Felony Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Unknown Cases Referred' null 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Other Cases Referred' null 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Misdemeanor Cases Referred' 114 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Felony Cases Referred' 50 'MONTHLY'
BaseDatapointsStore.ts:131 true 'Unknown Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Other Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Misdemeanor Cases Referred' null 'ANNUAL'
BaseDatapointsStore.ts:131 true 'Felony Cases Referred' null 'ANNUAL'

and this way for some reason resulting datapoint in disaggregated object receive ANNUAL frequency. Frankly after many debugging sessions I still don't know why this happening. Maybe you have some ideas?

mxosman commented 2 months ago

Thank you for all the sleuthing @nasaownsky! I have no idea why that is happening. I think ultimately the UI shouldn't care about the ordering of things and display the accurate views. Can you dig into what we need to do in order for the UI to stay agnostic to the ordering?

nasaownsky commented 2 months ago

I'll try! The problem not in the UI itself, but in datapoints preparing method, which gives them wrong frequency depending on the order of the data.

mxosman commented 2 months ago

Sorry by UI - I was meaning the frontend in general including the datapoints prep the stores are doing. If there's a way the FE can handle these cases, that would be preferred b/c the ordering shouldn't matter on the FE - it's so strange that it's creating this kind of issue.

nasaownsky commented 2 months ago

@mxosman please check out my attempt to fix this problem! I think the cause of the problem is that datapoints with different frequencies could overlap in datapointsByMetric method, but by checking if datapoint actually has value we are getting rid of overlapping. Besides, only datapoints with values ​​will reach this part of that method in any case.

mxosman commented 2 months ago

Wait... my mind is absolutely blown! WOW! Taking a closer look shortly.

mxosman commented 2 months ago

Incredible sleuthing, @nasaownsky! It works as expected now. One small thing I noticed is that the dropdown for Calendar Year still shows even though there's no data there.

by checking if datapoint actually has value we are getting rid of overlapping. Besides, only datapoints with values ​​will reach this part of that method in any case.

Can you elaborate on this part? I'm having trouble understanding why checking for the presence of a value and frequency prevents the overwriting.

Oh, I think I'm understanding a little bit. Hmm, I'm wondering if we only need to check for the value and not the frequency as well b/c the value is overwritten based on the datapoint order since the logic is grouping by start_date and this change essentially skips all null datapoints (if I'm understanding right) - so, I'm thinking the frequency check has no effect here?

Do you mind double checking that the charts for other agencies work the same with this change?

nasaownsky commented 2 months ago

@mxosman Yes, you're right! Frequency check is unnecessary there. Just checked it twice, comparing this change's charts with staging charts, and it seems all of them are the same. But you can also check this, so that I don't sound unfounded!

nasaownsky commented 2 months ago

Thank you @mxosman! How should we merge this one?

mxosman commented 2 months ago

Thank you @mxosman! How should we merge this one?

Ah, you can rebase this on top of main and undo the injected datapoints array in the DatapointsStore if the rebase doesn't clear it back to this: https://github.com/Recidiviz/justice-counts/blob/8307a2d20f16a06c81ae20f75cf760cb1da530fe/publisher/src/stores/DatapointsStore.ts#L84