Open RDmitchell opened 1 year ago
I will do some testing, but if the labels are not being applied from the Data Quality checks on import, we need to fix that.
It is true that labels are not being added from the Data Quality review on import.
See this doc for details. https://docs.google.com/document/d/1i2qE9bYfb_VDUS3Ul6AyAt9CA6ur0e2DQoVTfysfOGY/edit?usp=sharing
@axelstudios -- this seems fairly important -- can we add it as a relatively high priority to the Q3 list?
@RDmitchell just wanted to check where this is at!
Currently we have to run data quality checks manually for all of our clients and all of their cycles since we don't know when they are adding data and to which cycles. We have to do this at regular intervals to ensure that labels are being updated. It isn't really sustainable for us to do that.
@nllong / @isalanglois -- can we add this to a release patch, maybe for 2.18.0 (?), in the relatively near future?
See @dreneden1's comment above: they have to run the data quality checks by hand because this isn't working.
@dreneden1 -- reply from @axelstudios
I spent some time looking into that issue, and even though it seems concerning, I think it's working as expected.
We have two main endpoints for data quality checks: one against the imported records (properties/taxlots), and one against raw data in import files before they're loaded into SEED.
I think the issue with applying labels during the import is that it happens before matching/merging, and the newly imported records could be merged into existing records that have already been fixed.
For instance, say you have a rule for a missing Year Built that applies a label to flag it. If you've imported data and fixed the missing fields, and then import a new file without Year Built that completely merges into existing records, then data quality will flag the import issues, but there would be no actual issues after merging. I think that's why you have to manually run the rules after import.
@dreneden1 -- so I think that we probably want to leave the current functionality, which does mean you need to run the DQ checks manually in the Inventory screen after importing the data.
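To make the merge scenario above concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the record fields, the `check_year_built` rule, and the `merge` helper are hypothetical stand-ins, not SEED's actual code or API.

```python
# Hypothetical, simplified model of the scenario; not SEED's implementation.
MISSING_YEAR_BUILT = "Missing Year Built"


def check_year_built(record):
    """Return the labels a 'Year Built missing' rule would apply to a record."""
    return {MISSING_YEAR_BUILT} if record.get("year_built") is None else set()


def merge(existing, incoming):
    """Merge incoming into existing; incoming values win only when present."""
    merged = dict(existing)
    merged.update({k: v for k, v in incoming.items() if v is not None})
    return merged


# An existing record whose Year Built was already fixed.
existing = {"property_id": "P-1", "year_built": 1985, "gross_floor_area": 10000}
# A new import file that carries only Property ID and GFA.
incoming = {"property_id": "P-1", "gross_floor_area": 12000}

labels_at_import = check_year_built(incoming)                     # raw import row
labels_after_merge = check_year_built(merge(existing, incoming))  # merged record

print(labels_at_import)    # the rule fires on the raw row
print(labels_after_merge)  # nothing to flag once the rows are merged
```

In this sketch the rule flags the raw import row but finds nothing wrong with the merged record, which is the false alarm described above.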
Hi @RDmitchell I'm not sure I followed all of @axelstudios's comments:
> We have two main endpoints for data quality checks - against the imported records (properties/taxlots), and against raw data in import files before they're loaded into SEED
What's the difference?
> I think the issue with applying labels during the import is that it's before matching/merging, and the newly-imported records could be merged into existing records that have already been fixed
> For instance, you have a rule for Year Built missing, and it applies a label to flag it. If you've imported data and fixed the missing fields, and then import a new file without Year Built that completely merges into existing records then data quality will flag the import issues, but there would be no actual issues after merging. I think that's why you have to manually run the rules after import
I agree with this: a user could be uploading a file with just a couple of columns (e.g., Property ID and GFA), so you could get a bunch of false alarms for missing fields that were never intended to be imported in the first place. That being said, it would be great if a data quality check ran automatically once the data was merged in. In other words, an automated data quality check that runs on the updated inventory, including all columns. That's what I thought SEED was doing, but it doesn't seem so.
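As a rough illustration of that request, the sketch below merges the imported rows first and only then evaluates the rules against the merged records. All names here (`RULES`, `import_and_check`, the field names) are assumptions made for the example, not SEED's API, and matching is reduced to a simple lookup by `property_id`.

```python
# Hypothetical sketch; names and data shapes are assumptions, not SEED's API.
RULES = {
    "Missing Year Built": lambda r: r.get("year_built") is None,
    "Missing GFA": lambda r: r.get("gross_floor_area") is None,
}


def import_and_check(inventory, rows, rules=RULES):
    """Merge rows into inventory by property_id, then check the merged records.

    Returns {property_id: set_of_labels} for every record touched by the import.
    """
    touched = set()
    for row in rows:
        pid = row["property_id"]
        record = inventory.setdefault(pid, {"property_id": pid})
        # Only non-empty incoming values overwrite what is already stored.
        record.update({k: v for k, v in row.items() if v is not None})
        touched.add(pid)
    # Labels are recomputed from the merged record, so columns that were simply
    # absent from the import file do not trigger false alarms.
    return {
        pid: {name for name, failed in rules.items() if failed(inventory[pid])}
        for pid in touched
    }


inventory = {"P-1": {"property_id": "P-1", "year_built": 1985, "gross_floor_area": None}}
rows = [
    {"property_id": "P-1", "gross_floor_area": 12000},  # fills in the missing GFA
    {"property_id": "P-2", "gross_floor_area": 500},    # new record, no Year Built
]
labels = import_and_check(inventory, rows)
print(labels)
```

Here `P-1` ends up with no labels, while the new record `P-2` is still flagged for its missing Year Built, because the check sees the full merged record rather than just the uploaded columns.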
@axelstudios -- can you review @dreneden1's suggestions and see if that can be implemented in SEED? Thx
This issue has been automatically marked as stale because it has not had recent activity within 60 days. It will be closed if no further activity occurs. Thank you for your contributions.
Hi @RDmitchell - just wanted to circle back on this one. Is there a pathway to automatically run the data quality checker after an upload?
@dreneden1 -- I don't believe there is a way to do it automatically, but you can run the DQ rules anytime via the Actions menu in the Inventory List.
This is on our list to address, probably next quarter.
Got it - as long as it is on your list!
From Dan Eden, SEED user
We are using the data quality checks, combined with labels, to help us with QA/QC.
Do the labels get applied automatically as part of each data upload? I know that the data quality checker runs during the data pairing process, but I don't think that labels get applied to the properties... I feel like they used to, but maybe not (since sometimes only a few columns are uploaded and this would trigger many labels for missing fields...).
Either way, it seems like I now have to manually select all properties and run the data quality checker for the labels to be applied. Ideally, the data quality checker would run automatically and update the labels on the properties. Is that something that we can discuss?