geeksforsocialchange / PlaceCal

Bring your community together
https://placecal.org
GNU Affero General Public License v3.0
17 stars 8 forks source link

[Bug]: Events displaying multiple times #2287

Closed katjam closed 7 months ago

katjam commented 8 months ago

Description

Import a calendar (Google) - some events are showing up more than once in PlaceCal listing.

Maybe related to: https://github.com/geeksforsocialchange/PlaceCal/issues/1802 https://github.com/geeksforsocialchange/PlaceCal/pull/1827#discussion_r1184733827 https://github.com/geeksforsocialchange/PlaceCal/issues/1596

https://norwich.placecal.org/events/2024/3/15 image

Steps to reproduce

  1. View events for Queer Norfolk

What you expected to happen

The events are only showing once in the source calendar, but displaying in duplicate in PlaceCal

aaaaargZombies commented 8 months ago

small note, I just pulled down a fresh copy of the DB and it doesn't look like we have duplicate entries for events.

irb(main):005> Event.where(place_id: 247).map{|e| e.dtstart}
  Event Load (1.5ms)  SELECT "events".* FROM "events" WHERE "events"."place_id" = $1  [["place_id", 247]]
=> 
[Fri, 18 Oct 2024 17:00:00.000000000 BST +01:00,
 Fri, 19 Jan 2024 17:00:00.000000000 GMT +00:00,
 Fri, 16 Feb 2024 17:00:00.000000000 GMT +00:00,
 Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
 Fri, 19 Apr 2024 17:00:00.000000000 BST +01:00,
 Fri, 17 May 2024 17:00:00.000000000 BST +01:00,
 Fri, 21 Jun 2024 17:00:00.000000000 BST +01:00,
 Fri, 19 Jul 2024 17:00:00.000000000 BST +01:00,
 Fri, 16 Aug 2024 17:00:00.000000000 BST +01:00,
 Fri, 20 Sep 2024 17:00:00.000000000 BST +01:00,
 Fri, 15 Nov 2024 17:00:00.000000000 GMT +00:00,
 Fri, 20 Dec 2024 17:00:00.000000000 GMT +00:00,
 Sat, 18 Nov 2023 11:00:00.000000000 GMT +00:00]

edit: it's not replicating locally and I SSHed into prod and we do have duplicates in the DB so probably an importer issue. (I'm a bit confused why my DB isn't the same if I just synced though)

irb(main):002> Event.where(place_id: 247).distinct.map{|e| e.dtstart}
D, [2024-03-15T16:33:03.997047 #995] DEBUG -- :   Event Load (1.7ms)  SELECT DISTINCT "events".* FROM "events" WHERE "events"."place_id" = $1  [["place_id", 247]]
=> 
[Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
 Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
 Fri, 17 May 2024 17:00:00.000000000 BST +01:00,
 Fri, 21 Jun 2024 17:00:00.000000000 BST +01:00,
 Fri, 19 Jul 2024 17:00:00.000000000 BST +01:00,
 Fri, 16 Aug 2024 17:00:00.000000000 BST +01:00,
 Fri, 20 Sep 2024 17:00:00.000000000 BST +01:00,
 Fri, 15 Mar 2024 17:00:00.000000000 GMT +00:00,
 Sat, 18 Nov 2023 11:00:00.000000000 GMT +00:00,
 Fri, 15 Nov 2024 17:00:00.000000000 GMT +00:00,
 Fri, 20 Dec 2024 17:00:00.000000000 GMT +00:00,
 Fri, 17 Jan 2025 17:00:00.000000000 GMT +00:00,
 Sat, 03 Feb 2024 11:00:00.000000000 GMT +00:00,
 Sat, 10 Feb 2024 13:00:00.000000000 GMT +00:00,
 Fri, 18 Oct 2024 17:00:00.000000000 BST +01:00,
 Fri, 19 Apr 2024 17:00:00.000000000 BST +01:00,
 Fri, 21 Feb 2025 17:00:00.000000000 GMT +00:00]

edit 2: managed to reproduce locally by trashing my DB and re-installing / syncing

 [
  {
    "event_date": "2024-03-15T17:00:00.000+00:00",
    "created_at": "2023-10-02T18:07:47.249+01:00"
  },
  {
    "event_date": "2024-03-15T17:00:00.000+00:00",
    "created_at": "2023-10-02T18:07:47.276+01:00"
  },
  {
    "event_date": "2024-03-15T17:00:00.000+00:00",
    "created_at": "2023-10-18T17:07:18.030+01:00"
  }
]
aaaaargZombies commented 8 months ago

I'm confused why this

irb(main):009> Event.where(place_id: 247, dtstart: "2024-03-15T17:00:00.000+00:00").map{|e| {uid: e.uid,event_date: e.dtstart, id: e.calendar_id,summary: e.sum
mary}}.as_json
  Event Load (0.7ms)  SELECT "events".* FROM "events" WHERE "events"."place_id" = $1 AND "events"."dtstart" = $2  [["place_id", 247], ["dtstart", "2024-03-15 17:00:00"]]
=> 
[{"uid"=>"5nesefucdvre4pq56f4u2d18e4@google.com", "event_date"=>"2024-03-15T17:00:00.000+00:00", "id"=>143, "summary"=>"Queer Norfolk: Archiving Late"},
 {"uid"=>"5nesefucdvre4pq56f4u2d18e4@google.com", "event_date"=>"2024-03-15T17:00:00.000+00:00", "id"=>143, "summary"=>"Queer Norfolk: Archiving Late"},
 {"uid"=>"5nesefucdvre4pq56f4u2d18e4@google.com", "event_date"=>"2024-03-15T17:00:00.000+00:00", "id"=>143, "summary"=>"Queer Norfolk: Archiving Late"}]
irb(main):010>

isn't being caught by this

  # Ensures that the event added is unique
  # Checks for a duplicate event with the properties dtstart, summary, and calendar_id
  def unique_event
    return unless Event.where(uid: uid,
                              dtstart: dtstart,
                              summary: summary,
                              calendar_id: calendar_id)
                       .count
                       .positive?

    errors.add(:base, 'Unfortunately this event is a duplicate of an ' \
                      "existing event for calendar: #{calendar_id} " \
                      "('#{calendar.name}')")
  end

view code in context

are we perhaps updating events and creating duplicate?

  validate :unique_event, on: :create # If we are updating the event we don't want it to trigger!

view code in context

aaaaargZombies commented 8 months ago

Feels like there's maybe something going on here. We get multiple events with the same uid and pick the first off the list. There's an assumption that there is only one and we aren't handling the case of there being multiple like above.

ivan-kocienski-gfsc commented 8 months ago

If you jump into the db console with rails db and run

select * 
from (
  select uid, count(*) 
  from events 
  group by uid) 
grouped_events 
where count > 1 
order by count desc

It gives you the full run down of all the UIDs and their counts, FWIW.

E.g.

                                  uid                                  | count 
-----------------------------------------------------------------------+-------
 020on4hgov31qn4caq2ogaf4r9@google.com                                 |   262
 5ekeb2fbbne7lemsenpcn4uqor@google.com                                 |   262
 7he4r679it9phen56agp719qfe@google.com                                 |   262
 6n7ssbf039v618u8mffk58elu7@google.com                                 |   158
 7mml4maqi05eblkbvs2ohmc1c4@google.com                                 |   157
 isbg5oeqfhrrq22k7nn48363ak@google.com                                 |   157
 15ol4g4fqeld3nle3i52l96iuk@google.com                                 |   156
 6dmafin6s82grphba0vnokvfcp@google.com                                 |   105
 45sjro1d72a4ufl36l97fe8bnf@google.com                                 |   105
 619cvqu6olfiqdjdbm8sgkvdvb@google.com                                 |   105
 15crqk4oaka91i64518cfedocb@google.com                                 |   105
 4cnib8lm8rgba6d9e3hu6dfavt_R20181001T173000@google.com                |   105
 44f1kmvq4c0uk49ommteuchukg@google.com                                 |   105
 5f0fnoo8kbubpkbt4hvsh20peb@google.com                                 |   105
 0l8k4q6o7q0p092cgf8kpiki27@google.com                                 |   105
 6f8rjj03cunph7ceilt5ptjn0j@google.com                                 |   104
 3nmemfvb3mnclgsa8cqt2cdauo@google.com                                 |    54
 2kf9rvio8pvrpm73trkha7qjej@google.com                                 |    53
 7rkhu05cbhgvlld9o2f97nnk9q@google.com                                 |    53
 32rf1vklvscmeiptf0nkc89jlb@google.com                                 |    53
 4rilq11omb80hjrhhbuik1779s@google.com                                 |    53
 2mv5bggirsmt79sse00rpnfuio@google.com                                 |    53
 0h7f1632tlopargi2dobpk5apt@google.com                                 |    53
 5qvs4vt92ug75vctt1f5du74kh@google.com                                 |    53
 7aidj9g0c34vq8uases2vmn6pe@google.com                                 |    53
 5lps5mao3pbaa3vk16n9qr64lj@google.com                                 |    53
 0k88cn3f922hnuf93vfsapcrg2@google.com                                 |    53
 7ftd73uoum5rqle4sp7cbu7kou@google.com                                 |    53
 1a44q9j7rl9rc0rla1sltqgf55@google.com                                 |    53
 1irk90ahed7h6jb7nv6h98ppsj@google.com                                 |    53
 15rtej0h7p3mepr8lfs4lh1glf@google.com                                 |    53
 o94e1gecfmdue9kndhmmi95kgg@google.com                                 |    53
 0k6p7onk9m1mov4g47qgks0nck@google.com                                 |    53
 17da6prmpd01ahhu1650f3dkt6@google.com                                 |    53
 5m5pjjgs6chin0uhpstjr83g5l@google.com                                 |    53
 1vr6p8e03cqmnrgnvgo8n13696@google.com                                 |    53
 251jkpld6jks03msrg8mh6g7mt@google.com                                 |    53
 1ie69kf7ap6s5agbftbnp3m3sb@google.com                                 |    53
 7bs80cbejip9qffdlfho6eqreh@google.com                                 |    53
 1c50tujtqln275ga63onenc5ks@google.com                                 |    53
 35ba4s98j7rsvelulb66aurp2h@google.com                                 |    53
 7s8chc809igm7onno8c4guelu6@google.com                                 |    53
 76mjaqn7hcadqu7n59oggdmp90@google.com                                 |    53
 5j7hf5ceglajbuvgsj97tej94l@google.com                                 |    53
 6cshoim4uuvs4gjdkshbtm0daa@google.com                                 |    53
 1ufq741k9vn3cpvi0j4u7ms3bg@google.com                                 |    53
 1ultsq0lkb68c33th6mtuju124@google.com                                 |    53
 01s0ovglekm69ibvhd20sfdh61@google.com                                 |    53
 cgsj6d9mc5h36bb46dj68b9k71j3cbb1c5ij6b9kcdhmcc9nccq3idr56c@google.com |    53
...
ivan-kocienski-gfsc commented 8 months ago

Digging in a bit more i wrote this script

    found = []

    Calendar.all.each do |calendar|
      duplicate_events = calendar.events.group(:uid).count
      next if duplicate_events.count < 2

      found << { calendar_id: calendar.id, source: calendar.source, count: duplicate_events.count }
    end

    found.sort! { |a, b| b[:count] <=> a[:count] }

    found.each do |f|
      puts "calendar: #{f[:calendar_id]}"
      puts "  source: #{f[:source]}"
      puts "  count: #{f[:count]}"
    end

which prints the calendars that have duplicate events

See here for output

ivan-kocienski-gfsc commented 8 months ago

From my local dev snapshot of production I have written a script that lists all the duplicate events

The first line is the event UID, then each row is event created_at, dtstart, dtend, summary and calendar_id

5nesefucdvre4pq56f4u2d18e4@google.com
  2024-03-11 09:18:10 +0000     2024-03-15 17:00:00 +0000     2024-03-15 19:00:00 +0000     Queer Norfolk: Archiving Late      143
  2024-02-20 20:07:54 +0000     2024-03-15 17:00:00 +0000     2024-03-15 19:00:00 +0000     Queer Norfolk: Archiving Late      143
  2024-02-20 20:07:56 +0000     2024-03-15 17:00:00 +0000     2024-03-15 19:00:00 +0000     Queer Norfolk: Archiving Late      143

37r53ov39h6m1n948thtpp61ro@google.com
  2024-03-08 23:00:56 +0000     2024-03-27 11:00:00 +0000     2024-03-27 13:30:00 +0000     S2S - ⚒ Green wood work      39
  2024-02-23 13:03:51 +0000     2024-03-27 11:00:00 +0000     2024-03-27 13:30:00 +0000     S2S - ⚒ Green wood work      39

76ma3pl0c9mqgc6rcs5nmea55o@google.com
  2022-10-18 23:52:21 +0100     2022-10-18 00:00:00 +0100     2022-10-19 00:00:00 +0100     Workbench hire      39
  2022-10-12 15:53:25 +0100     2022-10-18 00:00:00 +0100     2022-10-19 00:00:00 +0100     Workbench hire      39
  2022-10-13 10:03:28 +0100     2022-10-18 00:00:00 +0100     2022-10-19 00:00:00 +0100     Workbench hire      39

3nmemfvb3mnclgsa8cqt2cdauo@google.com
  2024-03-09 23:00:53 +0000     2024-07-20 10:00:00 +0100     2024-07-20 13:00:00 +0100     Powerchair Football      30
  2024-03-06 13:01:36 +0000     2024-07-20 10:00:00 +0100     2024-07-20 13:00:00 +0100     Powerchair Football      30

4ao2d36r8ngo5s04i4v7dhf7fv_R20210809T180000@google.com
  2022-05-04 13:39:54 +0100     2021-12-13 19:00:00 +0000     2021-12-13 21:00:00 +0000     Changes (online)      100
  2022-05-04 13:39:54 +0100     2021-12-13 19:00:00 +0000     2021-12-13 21:00:00 +0000     Changes (in-person)      100

If you delete all the duplicate events (another script I have written) and run all the imports again the duplicate events don't come back.

I included the updated_at to see if these events are "old" and possibly from older code, but some of them look like they where made in the last month or so.

kimadactyl commented 8 months ago

Is this the same bug as #1622?

aaaaargZombies commented 7 months ago

verification

Ivan successfully ran the rake task on staging and I couldn't find any other duplicate events.