Open Hagellach37 opened 11 months ago
SELECT
count(distinct changeset_id) as changesets,
count(distinct user_id) as users,
ifNull(sum(road_length_delta)/1000, 0) as roads,
ifNull(sum(building_edit), 0) as buildings,
count(map_feature_edit) as edits,
max(changeset_timestamp) as latest
FROM (
SELECT
min(changeset_id) as changeset_id,
user_id,
min(road_length_delta) as road_length_delta,
min(building_edit) as building_edit,
min(map_feature_edit) as map_feature_edit,
changeset_timestamp,
groupArray(hashtag) as hashtags,
min(country_iso_a3)
FROM "stats"
WHERE
startsWith(hashtag, 'm')
AND changeset_timestamp > parseDateTimeBestEffort('2020-01-01T00:00:00Z')
AND changeset_timestamp < parseDateTimeBestEffort('2022-01-01T00:00:00Z')
GROUP BY
changeset_timestamp,
user_id,
osm_id
) as grouped_stats
WHERE
arrayExists(hashtag -> startsWith(hashtag, 'm'), hashtags)
-> gives ~2/3rd the buildings of our current approach -> Issue: takes 6 seconds instead of 0.4s
I investigated further and tried to create a aggregated projection like so:
ALTER TABLE stats ADD PROJECTION combined_hashtag (
SELECT
min(changeset_id) as changeset_id,
user_id,
min(road_length_delta) as road_length_delta,
min(building_edit) as building_edit,
min(map_feature_edit) as map_feature_edit,
changeset_timestamp,
groupArray(hashtag) as hashtags,
min(country_iso_a3) as country_iso_a3
GROUP BY
changeset_timestamp,
user_id,
osm_id
)
;
Sadly there seems to be no way to access the newly defined column "hashtags" so I cannot query for arrayExists(hashtag -> startsWith(hashtag, 'm'), hashtags)
, rendering it useless.
contributions could be counted more than once with the current implementation of the
/stats
endpoint when the wildcard*
filter is used.When using the wildcard filter we could first group values by contribution_id before deriving the sum(buildings) or sum(road_length_delta).
https://github.com/GIScience/ohsome-now-stats-service/blob/ba1bedb773e672437426c2a8bca001e14f54329b/src/main/kotlin/org/heigit/ohsome/stats/StatsRepo.kt#L27-L40