Closed by AmandaDoyle 4 years ago
Current logic to remove records where the job number is the same should remain unchanged: https://github.com/NYCPlanning/db-developments/blob/master/developments_build/sql/dedupe_job_number.sql
To our knowledge, possible duplicates with different job numbers are not removed; instead, they are flagged as inactive through manual research. Please confirm. We would like to use this logic for flagging these jobs, rather than what's in x_duplicate.sql.
In the new deduplication logic (#31), we filter to jobs that are not inactive. Do both of the potential duplicate jobs need to have x_inactive IS NULL to be output to the QAQC table, or only one job?
@mgraber Both jobs need to be active. (This is slightly different from "one is inactive"; sorry for the lack of clarity, I'll edit above.)
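To illustrate the "both jobs need to be active" rule, here is a minimal sketch in Python. Only `job_number` and `x_inactive` come from this thread; the sample records, the `address` field, and the `match_key` rule are hypothetical placeholders, since the real matching criteria are not spelled out here.

```python
from itertools import combinations

# Hypothetical sample records. Only job_number and x_inactive are names
# taken from the thread; address is an assumed stand-in attribute.
jobs = [
    {"job_number": "A1", "address": "10 MAIN ST", "x_inactive": None},
    {"job_number": "A2", "address": "10 MAIN ST", "x_inactive": None},
    {"job_number": "A3", "address": "10 MAIN ST", "x_inactive": "Inactive"},
    {"job_number": "B1", "address": "22 ELM AVE", "x_inactive": None},
]


def match_key(job):
    # Placeholder matching rule (same address). This is an assumption,
    # not the actual duplicate criteria.
    return job["address"]


def potential_duplicate_pairs(records):
    # Keep only active jobs first, so every pair built below has
    # x_inactive IS NULL on both sides.
    active = [r for r in records if r["x_inactive"] is None]
    pairs = []
    for a, b in combinations(active, 2):
        if a["job_number"] != b["job_number"] and match_key(a) == match_key(b):
            pairs.append((a["job_number"], b["job_number"]))
    return pairs


qaqc_pairs = potential_duplicate_pairs(jobs)
print(qaqc_pairs)
```

Filtering to active jobs before pairing means a pair with one inactive job (A3 here) never reaches the QAQC output.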
100: Similarly, we remove duplicates programmatically. Should we not automatically remove them? If we keep the removal step, are the duplicate QAQC checks to be done before or after removal? #31 #106
@mgraber @AmandaDoyle There are two types of duplicates, which we handle differently.
Translated logic from comments below:
Based on HED criteria, implement logic to identify and remove duplicate job records
Current logic: Remove records where the job number is the same https://github.com/NYCPlanning/db-developments/blob/master/developments_build/sql/dedupe_job_number.sql
Identify potential duplicates https://github.com/NYCPlanning/db-developments/blob/master/developments_build/sql/x_duplicate.sql
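For reference, the "same job number" removal can be sketched roughly as below. This is not the contents of dedupe_job_number.sql; the sample records and the `status` field are hypothetical, and keeping the first record seen for each job number is an assumption made only for illustration.

```python
# Hypothetical sample records; status is an assumed extra field.
records = [
    {"job_number": "A1", "status": "Filed"},
    {"job_number": "A1", "status": "Permitted"},
    {"job_number": "B7", "status": "Filed"},
]


def dedupe_by_job_number(rows):
    # Keep the first record encountered for each job number.
    # Which record should survive is an assumption here.
    seen = set()
    unique = []
    for row in rows:
        if row["job_number"] not in seen:
            seen.add(row["job_number"])
            unique.append(row)
    return unique


deduped = dedupe_by_job_number(records)
print([row["job_number"] for row in deduped])
```

Records with different job numbers are left alone by this step; those are the ones that would go to the potential-duplicate check instead.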