freezingsaddles / freezing-sync

Freezing Saddles activity retrieval and processing
Apache License 2.0
1 stars 1 forks source link

Eliminate duplicate rides #34

Open merlinorg opened 2 years ago

merlinorg commented 2 years ago

Some athletes routinely ride "with" themself. I suspect using both computer and phone to track ride. These get recorded as separate rides in FS.

Crudely bucketing by 5 minute start time we see...

mysql> select count(*), start_date, avg(distance), sum(distance) from rides group by athlete_id, floor(unix_timestamp(start_date)/300) having count(*) > 1;
+----------+---------------------+--------------------+--------------------+
| count(*) | start_date          | avg(distance)      | sum(distance)      |
+----------+---------------------+--------------------+--------------------+
|        2 | 2022-01-02 09:19:21 | 1.5630000233650208 | 3.1260000467300415 |
|        2 | 2022-01-01 00:02:22 | 6.6980001870542765 | 13.396000374108553 |
|        2 | 2022-01-10 18:33:49 |  4.610000133514404 |  9.220000267028809 |
|        2 | 2022-01-12 11:01:10 | 12.449999809265137 | 24.899999618530273 |
|        2 | 2022-01-05 07:53:07 |  10.02299976348877 |  20.04599952697754 |
|        2 | 2022-01-12 07:18:55 |  15.75950002670288 |  31.51900005340576 |
|        2 | 2022-01-03 00:09:31 |  5.579000115394592 | 11.158000230789185 |
|        2 | 2022-01-11 21:36:31 |  6.117000102996826 | 12.234000205993652 |
|        2 | 2022-01-12 19:42:54 |  2.253499984741211 |  4.506999969482422 |
|        2 | 2022-01-13 18:00:53 | 17.553000450134277 | 35.106000900268555 |
+----------+---------------------+--------------------+--------------------+
10 rows in set (0.00 sec)
merlinorg commented 7 months ago
drop table dups;
create table dups(id bigint) as (select max(id) as id from rides group by athlete_id, floor(unix_timestamp(start_date)/300) having count(*) > 1);
delete from ride_weather using ride_weather left join dups on dups.id = ride_weather.ride_id where dups.id is not null;
delete from ride_efforts using ride_efforts left join dups on dups.id = ride_efforts.ride_id where dups.id is not null;
delete from rides using rides left join dups on dups.id = rides.id where dups.id is not null;
mysql> create table dups(id bigint) as (select max(id) as id from rides group by athlete_id, floor(unix_timestamp(start_date)/300) having count(*) > 1);
Query OK, 45 rows affected (0.07 sec)
Records: 45  Duplicates: 0  Warnings: 0

mysql> delete from ride_weather using ride_weather left join dups on dups.id = ride_weather.ride_id where dups.id is not null;
Query OK, 43 rows affected (0.40 sec)

mysql> delete from ride_efforts using ride_efforts left join dups on dups.id = ride_efforts.ride_id where dups.id is not null;
Query OK, 1347 rows affected (0.07 sec)

mysql> delete from rides using rides left join dups on dups.id = rides.id where dups.id is not null;
Query OK, 45 rows affected (0.02 sec)