Data Quality checking should be adding labels on import

RDmitchell commented 1 year ago

From Dan Eden, SEED user

We are using the data quality checks, combined with labels, to help us with QA/QC.

Do the labels get applied automatically as part of each data upload? I know that the data quality checker runs during the data pairing process, but I don't think that labels get applied to the properties... I feel like they used to, but maybe not (since sometimes only a few columns are uploaded and this would trigger many labels for missing fields...).

Either way, it seems like I now have to manually select all properties and run the data quality checker for the labels to be applied. Ideally, the data quality checker would run automatically and update the labels on the properties. Is that something that we can discuss?

RDmitchell commented 1 year ago

I will do some testing, but if the labels are not being applied from the Data Quality checks on import, we need to fix that.

RDmitchell commented 1 year ago

It is true that labels are not being added from the Data Quality review on import.

See this doc for details. https://docs.google.com/document/d/1i2qE9bYfb_VDUS3Ul6AyAt9CA6ur0e2DQoVTfysfOGY/edit?usp=sharing

RDmitchell commented 1 year ago

@axelstudios -- this seems fairly important -- can we add it as a relatively high priority to the Q3 list?

dreneden1 commented 1 year ago

@RDmitchell just wanted to check where this is at!

Currently we have to run data quality checks manually for all of our clients and all of their cycles since we don't know when they are adding data and to which cycles. We have to do this at regular intervals to ensure that labels are being updated. It isn't really sustainable for us to do that.

RDmitchell commented 1 year ago

@nllong / @isalanglois -- can we add this to a patch release, maybe 2.18.0 (?), in the relatively near future?

See @dreneden1's comment above: they have to run the data quality checks by hand because this isn't working.

RDmitchell commented 1 year ago

@dreneden1 -- reply from @axelstudios

I spent some time looking into that issue, and even though it seems concerning I think it's working as expected

We have two main endpoints for data quality checks - against the imported records (properties/taxlots), and against raw data in import files before they're loaded into SEED

I think the issue with applying labels during the import is that it's before matching/merging, and the newly-imported records could be merged into existing records that have already been fixed

For instance, you have a rule for Year Built missing, and it applies a label to flag it. If you've imported data and fixed the missing fields, and then import a new file without Year Built that completely merges into existing records then data quality will flag the import issues, but there would be no actual issues after merging. I think that's why you have to manually run the rules after import
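
To make the Year Built example concrete, here is a minimal sketch in plain Python. It is not SEED's code: `missing_field_labels`, `merge_records`, the field names, and the "incoming non-empty values overwrite existing ones" merge rule are all assumptions made for illustration. It only shows why a check on the raw import can flag something that no longer applies once the record is merged.

```python
# Hypothetical, simplified model -- not SEED's actual data model or API.

def missing_field_labels(record, required_fields):
    """Return a 'Missing <field>' label for every required field that is absent or None."""
    return [f"Missing {field}" for field in required_fields if record.get(field) is None]


def merge_records(existing, incoming):
    """Overlay non-empty incoming values on the existing record (assumed merge rule)."""
    merged = dict(existing)
    merged.update({k: v for k, v in incoming.items() if v is not None})
    return merged


REQUIRED = ["Year Built"]

# Record already in the inventory, with Year Built previously fixed.
existing = {"Property ID": "P-1", "Year Built": 1985, "Gross Floor Area": 12000}
# New import file that carries only a couple of columns.
incoming = {"Property ID": "P-1", "Gross Floor Area": 12500}

labels_on_import = missing_field_labels(incoming, REQUIRED)
labels_after_merge = missing_field_labels(merge_records(existing, incoming), REQUIRED)

print(labels_on_import)    # ['Missing Year Built']
print(labels_after_merge)  # []
```

Under these assumptions the import-time check raises a label that the merged record does not deserve, which matches the behavior described above.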

RDmitchell commented 1 year ago

@dreneden1 -- so I think that we probably want to leave the current functionality, which does mean you need to run the DQ checks manually in the Inventory screen after importing the data.

dreneden1 commented 1 year ago

Hi @RDmitchell I'm not sure I followed all of @axelstudios's comments:

We have two main endpoints for data quality checks - against the imported records (properties/taxlots), and against raw data in import files before they're loaded into SEED

What's the difference?

I think the issue with applying labels during the import is that it's before matching/merging, and the newly-imported records could be merged into existing records that have already been fixed

For instance, you have a rule for Year Built missing, and it applies a label to flag it. If you've imported data and fixed the missing fields, and then import a new file without Year Built that completely merges into existing records then data quality will flag the import issues, but there would be no actual issues after merging. I think that's why you have to manually run the rules after import

I agree with this: a user could be uploading a file with just a couple of columns (e.g., Property ID and GFA), so you could get a bunch of false alarms for missing fields that were never intended to be imported in the first place. That being said, it would be great if an automatic data quality check ran once the data was merged in. In other words, an automated data quality check on the updated inventory, including all columns. That's what I thought SEED was doing, but it doesn't seem so.
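
A rough sketch of what such a post-merge check could look like, in plain Python. Nothing here is SEED's implementation: `check_record`, `import_and_check`, the in-memory `inventory` and `labels` dictionaries, and the merge rule are hypothetical stand-ins. The idea is simply to merge first, then re-run the rules only on the records the import touched, replacing their data quality labels so that resolved issues are cleared as well.

```python
# Hypothetical sketch -- names, storage, and merge behaviour are assumptions, not SEED's API.

def check_record(record, required_fields):
    """Return the set of 'Missing <field>' labels that apply to a merged record."""
    return {f"Missing {f}" for f in required_fields if record.get(f) is None}


def import_and_check(inventory, labels, incoming_rows, key, required_fields):
    """Merge rows into the inventory, then re-check only the records that were touched."""
    touched = set()
    for row in incoming_rows:
        record_id = row[key]
        merged = dict(inventory.get(record_id, {}))
        # Assumed merge rule: non-empty incoming values overwrite existing ones.
        merged.update({k: v for k, v in row.items() if v is not None})
        inventory[record_id] = merged
        touched.add(record_id)
    for record_id in touched:
        # Replace (not append) so labels for issues fixed by the import are removed.
        labels[record_id] = check_record(inventory[record_id], required_fields)
    return touched


inventory = {
    "P-1": {"Property ID": "P-1", "Year Built": 1985, "GFA": 10000},
    "P-2": {"Property ID": "P-2", "Year Built": None, "GFA": 5000},
}
labels = {"P-2": {"Missing Year Built"}}

rows = [
    {"Property ID": "P-1", "GFA": 11000},       # partial update, nothing missing after merge
    {"Property ID": "P-2", "Year Built": 2001},  # fixes the previously flagged field
    {"Property ID": "P-3", "GFA": 800},          # new record with no Year Built
]
touched = import_and_check(inventory, labels, rows, "Property ID", ["Year Built", "GFA"])

for record_id in sorted(touched):
    print(record_id, sorted(labels[record_id]))
```

In this sketch `labels` holds only labels produced by data quality rules. A real implementation would need to leave manually applied labels alone rather than overwrite them.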

RDmitchell commented 1 year ago

@axelstudios -- can you review @dreneden1's suggestions and see if that can be implemented in SEED? Thx

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity within 60 days. It will be closed if no further activity occurs. Thank you for your contributions.

dreneden1 commented 8 months ago

Hi @RDmitchell - just wanted to circle back on this one. Is there a pathway to automatically run the data quality checker after an upload?

RDmitchell commented 8 months ago

@dreneden1 -- I don't believe there is a way to do it automatically, but you can run the DQ rules anytime via the Actions menu in the Inventory List.

This is on our list to address, probably next quarter.

dreneden1 commented 8 months ago

Got it - as long as it is on your list!

SEED-platform / seed

Data Quality checking should be adding labels on import #3896