Closed katjam closed 7 months ago
small note, I just pulled down a fresh copy of the DB and it doesn't look like we have duplicate entries for events.
irb(main):005> Event.where(place_id: 247).map{|e| e.dtstart}
Event Load (1.5ms) SELECT "events".* FROM "events" WHERE "events"."place_id" = $1 [["place_id", 247]]
=>
[Fri, 18 Oct 2024 17:00:00.000000000 BST +01:00,
Fri, 19 Jan 2024 17:00:00.000000000 GMT +00:00,
Fri, 16 Feb 2024 17:00:00.000000000 GMT +00:00,
Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
Fri, 19 Apr 2024 17:00:00.000000000 BST +01:00,
Fri, 17 May 2024 17:00:00.000000000 BST +01:00,
Fri, 21 Jun 2024 17:00:00.000000000 BST +01:00,
Fri, 19 Jul 2024 17:00:00.000000000 BST +01:00,
Fri, 16 Aug 2024 17:00:00.000000000 BST +01:00,
Fri, 20 Sep 2024 17:00:00.000000000 BST +01:00,
Fri, 15 Nov 2024 17:00:00.000000000 GMT +00:00,
Fri, 20 Dec 2024 17:00:00.000000000 GMT +00:00,
Sat, 18 Nov 2023 11:00:00.000000000 GMT +00:00]
edit: it's not replicating locally and I SSHed into prod and we do have duplicates in the DB so probably an importer issue. (I'm a bit confused why my DB isn't the same if I just synced though)
irb(main):002> Event.where(place_id: 247).distinct.map{|e| e.dtstart}
D, [2024-03-15T16:33:03.997047 #995] DEBUG -- : Event Load (1.7ms) SELECT DISTINCT "events".* FROM "events" WHERE "events"."place_id" = $1 [["place_id", 247]]
=>
[Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
Fri, 17 May 2024 17:00:00.000000000 BST +01:00,
Fri, 21 Jun 2024 17:00:00.000000000 BST +01:00,
Fri, 19 Jul 2024 17:00:00.000000000 BST +01:00,
Fri, 16 Aug 2024 17:00:00.000000000 BST +01:00,
Fri, 20 Sep 2024 17:00:00.000000000 BST +01:00,
Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
Sat, 18 Nov 2023 11:00:00.000000000 GMT +00:00,
Fri, 15 Nov 2024 17:00:00.000000000 GMT +00:00,
Fri, 20 Dec 2024 17:00:00.000000000 GMT +00:00,
Fri, 17 Jan 2025 17:00:00.000000000 GMT +00:00,
Sat, 03 Feb 2024 11:00:00.000000000 GMT +00:00,
Sat, 10 Feb 2024 13:00:00.000000000 GMT +00:00,
Fri, 18 Oct 2024 17:00:00.000000000 BST +01:00,
Fri, 19 Apr 2024 17:00:00.000000000 BST +01:00,
Fri, 21 Feb 2025 17:00:00.000000000 GMT +00:00]
edit 2: managed to reproduce locally by trashing my DB and re-installing / syncing
[
{
"event_date": "2024-03-15T17:00:00.000+00:00",
"created_at": "2023-10-02T18:07:47.249+01:00"
},
{
"event_date": "2024-03-15T17:00:00.000+00:00",
"created_at": "2023-10-02T18:07:47.276+01:00"
},
{
"event_date": "2024-03-15T17:00:00.000+00:00",
"created_at": "2023-10-18T17:07:18.030+01:00"
}
]
I'm confused why this
irb(main):009> Event.where(place_id: 247, dtstart: "2024-03-15T17:00:00.000+00:00").map{|e| {uid: e.uid,event_date: e.dtstart, id: e.calendar_id,summary: e.sum
mary}}.as_json
Event Load (0.7ms) SELECT "events".* FROM "events" WHERE "events"."place_id" = $1 AND "events"."dtstart" = $2 [["place_id", 247], ["dtstart", "2024-03-15 17:00:00"]]
=>
[{"uid"=>"5nesefucdvre4pq56f4u2d18e4@google.com", "event_date"=>"2024-03-15T17:00:00.000+00:00", "id"=>143, "summary"=>"Queer Norfolk: Archiving Late"},
{"uid"=>"5nesefucdvre4pq56f4u2d18e4@google.com", "event_date"=>"2024-03-15T17:00:00.000+00:00", "id"=>143, "summary"=>"Queer Norfolk: Archiving Late"},
{"uid"=>"5nesefucdvre4pq56f4u2d18e4@google.com", "event_date"=>"2024-03-15T17:00:00.000+00:00", "id"=>143, "summary"=>"Queer Norfolk: Archiving Late"}]
irb(main):010>
isn't being caught by this
# Ensures that the event added is unique
# Checks for a duplicate event with the properties dtstart, summary, and calendar_id
def unique_event
return unless Event.where(uid: uid,
dtstart: dtstart,
summary: summary,
calendar_id: calendar_id)
.count
.positive?
errors.add(:base, 'Unfortunately this event is a duplicate of an ' \
"existing event for calendar: #{calendar_id} " \
"('#{calendar.name}')")
end
are we perhaps updating events and creating duplicate?
validate :unique_event, on: :create # If we are updating the event we don't want it to trigger!
Feels like there's maybe something going on here. We get multiple events with the same uid
and pick the first
off the list. There's an assumption that there is only one and we aren't handling the case of there being multiple like above.
If you jump into the db console with rails db
and run
select *
from (
select uid, count(*)
from events
group by uid)
grouped_events
where count > 1
order by count desc
It gives you the full run down of all the UIDs and their counts, FWIW.
E.g.
uid | count
-----------------------------------------------------------------------+-------
020on4hgov31qn4caq2ogaf4r9@google.com | 262
5ekeb2fbbne7lemsenpcn4uqor@google.com | 262
7he4r679it9phen56agp719qfe@google.com | 262
6n7ssbf039v618u8mffk58elu7@google.com | 158
7mml4maqi05eblkbvs2ohmc1c4@google.com | 157
isbg5oeqfhrrq22k7nn48363ak@google.com | 157
15ol4g4fqeld3nle3i52l96iuk@google.com | 156
6dmafin6s82grphba0vnokvfcp@google.com | 105
45sjro1d72a4ufl36l97fe8bnf@google.com | 105
619cvqu6olfiqdjdbm8sgkvdvb@google.com | 105
15crqk4oaka91i64518cfedocb@google.com | 105
4cnib8lm8rgba6d9e3hu6dfavt_R20181001T173000@google.com | 105
44f1kmvq4c0uk49ommteuchukg@google.com | 105
5f0fnoo8kbubpkbt4hvsh20peb@google.com | 105
0l8k4q6o7q0p092cgf8kpiki27@google.com | 105
6f8rjj03cunph7ceilt5ptjn0j@google.com | 104
3nmemfvb3mnclgsa8cqt2cdauo@google.com | 54
2kf9rvio8pvrpm73trkha7qjej@google.com | 53
7rkhu05cbhgvlld9o2f97nnk9q@google.com | 53
32rf1vklvscmeiptf0nkc89jlb@google.com | 53
4rilq11omb80hjrhhbuik1779s@google.com | 53
2mv5bggirsmt79sse00rpnfuio@google.com | 53
0h7f1632tlopargi2dobpk5apt@google.com | 53
5qvs4vt92ug75vctt1f5du74kh@google.com | 53
7aidj9g0c34vq8uases2vmn6pe@google.com | 53
5lps5mao3pbaa3vk16n9qr64lj@google.com | 53
0k88cn3f922hnuf93vfsapcrg2@google.com | 53
7ftd73uoum5rqle4sp7cbu7kou@google.com | 53
1a44q9j7rl9rc0rla1sltqgf55@google.com | 53
1irk90ahed7h6jb7nv6h98ppsj@google.com | 53
15rtej0h7p3mepr8lfs4lh1glf@google.com | 53
o94e1gecfmdue9kndhmmi95kgg@google.com | 53
0k6p7onk9m1mov4g47qgks0nck@google.com | 53
17da6prmpd01ahhu1650f3dkt6@google.com | 53
5m5pjjgs6chin0uhpstjr83g5l@google.com | 53
1vr6p8e03cqmnrgnvgo8n13696@google.com | 53
251jkpld6jks03msrg8mh6g7mt@google.com | 53
1ie69kf7ap6s5agbftbnp3m3sb@google.com | 53
7bs80cbejip9qffdlfho6eqreh@google.com | 53
1c50tujtqln275ga63onenc5ks@google.com | 53
35ba4s98j7rsvelulb66aurp2h@google.com | 53
7s8chc809igm7onno8c4guelu6@google.com | 53
76mjaqn7hcadqu7n59oggdmp90@google.com | 53
5j7hf5ceglajbuvgsj97tej94l@google.com | 53
6cshoim4uuvs4gjdkshbtm0daa@google.com | 53
1ufq741k9vn3cpvi0j4u7ms3bg@google.com | 53
1ultsq0lkb68c33th6mtuju124@google.com | 53
01s0ovglekm69ibvhd20sfdh61@google.com | 53
cgsj6d9mc5h36bb46dj68b9k71j3cbb1c5ij6b9kcdhmcc9nccq3idr56c@google.com | 53
...
Digging in a bit more i wrote this script
found = []
Calendar.all.each do |calendar|
duplicate_events = calendar.events.group(:uid).count
next if duplicate_events.count < 2
found << { calendar_id: calendar.id, source: calendar.source, count: duplicate_events.count }
end
found.sort! { |a, b| b[:count] <=> a[:count] }
found.each do |f|
puts "calendar: #{f[:calendar_id]}"
puts " source: #{f[:source]}"
puts " count: #{f[:count]}"
end
which prints the calendars that have duplicate events
See here for output
From my local dev snapshot of production I have written a script that lists all the duplicate events
The first line is the event UID, then each row is event created_at
, dtstart
, dtend
, summary
and calendar_id
5nesefucdvre4pq56f4u2d18e4@google.com
2024-03-11 09:18:10 +0000 2024-03-15 17:00:00 +0000 2024-03-15 19:00:00 +0000 Queer Norfolk: Archiving Late 143
2024-02-20 20:07:54 +0000 2024-03-15 17:00:00 +0000 2024-03-15 19:00:00 +0000 Queer Norfolk: Archiving Late 143
2024-02-20 20:07:56 +0000 2024-03-15 17:00:00 +0000 2024-03-15 19:00:00 +0000 Queer Norfolk: Archiving Late 143
37r53ov39h6m1n948thtpp61ro@google.com
2024-03-08 23:00:56 +0000 2024-03-27 11:00:00 +0000 2024-03-27 13:30:00 +0000 S2S - ⚒ Green wood work 39
2024-02-23 13:03:51 +0000 2024-03-27 11:00:00 +0000 2024-03-27 13:30:00 +0000 S2S - ⚒ Green wood work 39
76ma3pl0c9mqgc6rcs5nmea55o@google.com
2022-10-18 23:52:21 +0100 2022-10-18 00:00:00 +0100 2022-10-19 00:00:00 +0100 Workbench hire 39
2022-10-12 15:53:25 +0100 2022-10-18 00:00:00 +0100 2022-10-19 00:00:00 +0100 Workbench hire 39
2022-10-13 10:03:28 +0100 2022-10-18 00:00:00 +0100 2022-10-19 00:00:00 +0100 Workbench hire 39
3nmemfvb3mnclgsa8cqt2cdauo@google.com
2024-03-09 23:00:53 +0000 2024-07-20 10:00:00 +0100 2024-07-20 13:00:00 +0100 Powerchair Football 30
2024-03-06 13:01:36 +0000 2024-07-20 10:00:00 +0100 2024-07-20 13:00:00 +0100 Powerchair Football 30
4ao2d36r8ngo5s04i4v7dhf7fv_R20210809T180000@google.com
2022-05-04 13:39:54 +0100 2021-12-13 19:00:00 +0000 2021-12-13 21:00:00 +0000 Changes (online) 100
2022-05-04 13:39:54 +0100 2021-12-13 19:00:00 +0000 2021-12-13 21:00:00 +0000 Changes (in-person) 100
If you delete all the duplicate events (another script I have written) and run all the imports again the duplicate events don't come back.
I included the updated_at
to see if these events are "old" and possibly from older code, but some of them look like they where made in the last month or so.
Is this the same bug as #1622?
Ivan successfully ran the rake task on staging and I couldn't find any other duplicate events.
Description
Import a calendar (Google) - some events are showing up more than once in PlaceCal listing.
Maybe related to: https://github.com/geeksforsocialchange/PlaceCal/issues/1802 https://github.com/geeksforsocialchange/PlaceCal/pull/1827#discussion_r1184733827 https://github.com/geeksforsocialchange/PlaceCal/issues/1596
https://norwich.placecal.org/events/2024/3/15
Steps to reproduce
What you expected to happen
The events are only showing once in the source calendar, but displaying in duplicate in PlaceCal