SEED-platform / seed

Standard Energy Efficiency Data (SEED) Platform™ is a web-based application that helps organizations easily manage data on the energy performance of large groups of buildings.

Bug linking properties across cycles when UBID is matching criteria #4774

Open perryr16 opened 2 months ago

perryr16 commented 2 months ago

Issue

The step 6 linking function does not account for the UBID Jaccard index (as the step 4 matching function does).

This can lead to two error scenarios when properties match on pm_property_id (or another matching criterion) but have different UBIDs. Consider the following setup.

P1: { cycle: 1, pm_property_id: 1, ubid: 1 }
P2: { cycle: 2, pm_property_id: 1, ubid: 2 }
  1. P1 already exists in Cycle 1 and P2 is uploaded to Cycle 2. These properties are linked across cycles even though their UBIDs do not match.

P1: { cycle: 1, pm_property_id: 1, ubid: 1 }
P2: { cycle: 2, pm_property_id: 1, ubid: 2 }
P3: { cycle: 1, pm_property_id: 1, ubid: 2 }
  2. If P1 and P3 are uploaded to Cycle 1, they appear as distinct properties because step 4 accounts for UBID. When P2 is then uploaded to Cycle 2, we end up with an integrity error and the app hangs:
    seed_celery    | django.db.utils.IntegrityError: duplicate key value violates unique constraint "seed_propertyview_property_id_cycle_id_f8bdf6c2_uniq"
    seed_celery    | DETAIL:  Key (property_id, cycle_id)=(31, 4) already exists.

Proposed solution:

Add the Jaccard index check from step 4 to step 6, so that properties are only linked across cycles when their UBIDs also match.
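
To make the proposal concrete, here is a minimal sketch of what a UBID-aware link check could look like. It is not SEED's actual code: `PropertyState`, `can_link`, `jaccard_index`, the `ubid_box` field (a stand-in for a decoded UBID bounding box), and the `ubid_threshold` default of 1.0 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical stand-in for a decoded UBID: (west, south, east, north).
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; zero if the box is empty or inverted."""
    west, south, east, north = box
    return max(0.0, east - west) * max(0.0, north - south)


def jaccard_index(a: Box, b: Box) -> float:
    """Intersection area divided by union area (0.0 when the union is empty)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    intersection = box_area(overlap)
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


@dataclass
class PropertyState:
    """Simplified, hypothetical record; not the SEED model."""
    cycle: int
    pm_property_id: str
    ubid_box: Box


def can_link(a: PropertyState, b: PropertyState, ubid_threshold: float = 1.0) -> bool:
    """Link across cycles only if the matching criteria AND the UBID check agree."""
    if a.cycle == b.cycle:
        return False  # linking only applies across cycles
    if a.pm_property_id != b.pm_property_id:
        return False  # other matching criteria would be compared here as well
    # The extra check proposed in this issue: compare UBID footprints too.
    return jaccard_index(a.ubid_box, b.ubid_box) >= ubid_threshold


# The scenario from this issue, with boxes standing in for ubid 1 and ubid 2.
p1 = PropertyState(cycle=1, pm_property_id="1", ubid_box=(0.0, 0.0, 2.0, 2.0))
p2 = PropertyState(cycle=2, pm_property_id="1", ubid_box=(10.0, 10.0, 12.0, 12.0))
p3 = PropertyState(cycle=1, pm_property_id="1", ubid_box=(10.0, 10.0, 12.0, 12.0))

print(can_link(p1, p2))  # False: same pm_property_id, but the UBIDs differ
print(can_link(p3, p2))  # True: pm_property_id and UBID both match
```

With a check like this, P2 would link only to P3 in scenario 2 rather than to both Cycle 1 properties, and P1 and P2 would stay unlinked in scenario 1.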