Closed: rdmolony closed this issue 3 years ago
Describe the bug
ibsg is producing a dataset that is much larger than it should be, so it is likely full of duplicate buildings.

To Reproduce
Steps to reproduce the behavior: load the ibsg output into a DataFrame and inspect it:

df = pd.read_parquet("PARQUET-FILENAME")
len(df)
>> 1705109
df["countyname"].unique()
>> array(['CO. DUBLIN', 'DUBLIN 1', 'DUBLIN 10', 'DUBLIN 11', 'DUBLIN 12',
          'DUBLIN 13', 'DUBLIN 14', 'DUBLIN 15', 'DUBLIN 16', 'DUBLIN 17',
          'DUBLIN 18', 'DUBLIN 2', 'DUBLIN 20', 'DUBLIN 22', 'DUBLIN 24',
          'DUBLIN 3', 'DUBLIN 4', 'DUBLIN 5', 'DUBLIN 6', 'DUBLIN 6W',
          'DUBLIN 7', 'DUBLIN 8', 'DUBLIN 9'], dtype=object)

Expected behavior
Dublin should only contain 500k or so buildings, so 1705109 rows is roughly three times the expected size.
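One way to confirm that the extra rows are duplicates is to count repeated rows directly. The snippet below is a minimal sketch, not the fix that was applied: `summarise_duplicates` is a hypothetical helper, and the `building_id` column in the toy frame is an assumption used only for illustration (only `countyname` is known from the output above). On the real data, the toy frame would be replaced by the result of `pd.read_parquet`.

```python
import pandas as pd


def summarise_duplicates(df: pd.DataFrame, subset=None) -> dict:
    """Count rows that repeat an earlier row, optionally on a subset of columns."""
    # duplicated(keep="first") marks every repeat of a row except its first occurrence
    is_repeat = df.duplicated(subset=subset, keep="first")
    return {
        "rows": len(df),
        "duplicate_rows": int(is_repeat.sum()),
        "rows_after_dedup": int((~is_repeat).sum()),
    }


# Toy stand-in for the real dataset; "building_id" is a hypothetical column
toy = pd.DataFrame(
    {
        "countyname": ["DUBLIN 1", "DUBLIN 1", "DUBLIN 2", "CO. DUBLIN"],
        "building_id": [1, 1, 2, 3],
    }
)

summary = summarise_duplicates(toy)

# Row counts per postcode show where any inflation is concentrated
rows_per_county = toy.groupby("countyname").size()
```

If `rows_after_dedup` on the real dataset lands near the expected 500k, exact duplicates would explain the inflated size; if not, the repeats probably differ in at least one column, and passing a `subset` of identifying columns may be more informative.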
Closed by commit 0e1bdc471dc1dc5764b77058d13f3eeaa9bc3987