hmaskat17 closed this issue 1 year ago.
Just to recap: it seems new IDs were created for some Jena events, which is why MMT imported them as new events and the app is showing duplicates.
Any ideas on how this can be solved, so that events with new IDs do not show up in MMT again in the future? Can this be easily fixed on the Hub side, or should we try fixing it on the MMT side?
@ec2u/mmt
We're reviewing the matter.
We updated the ID generation rules around mid-May, which could account for some inconsistencies and leftovers.
However, events published on multiple calendars would be expected to collapse into a single event, which apparently is not the case.
We'll get back to you as soon as we have a fix.
I reviewed the matter on the KH side; the only relevant change was an upgrade to the ID generation algorithm on 2022-05-19:
The new logic prevents IDs from changing when an event is rescheduled, and collapses events that are published on multiple calendars but share the same external URL.
That said, I don't think we have active issues on KH:
- Duplicated events ingested before 2022-05-19 are just leftovers not yet picked up by the reaping algorithm, as their dates are still in the future.
- URL changes on the source account for the different KH IDs between June 2nd and July 16th (remember that KH IDs are generated as a hash of the publishing URL).
- Events ingested on July 16th are not collapsed because the same physical event is published multiple times by different calendars with distinct URLs (which, again, generates different hashes).
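The three cases above all follow from deriving an ID from the publishing URL. A minimal sketch, assuming a hypothetical `event_id` helper that returns an MD5 hex digest of the URL (MD5 is an assumption, picked only because the IDs quoted in this thread are 32 hex characters; the actual KH hash function is not stated), and hypothetical `example.org` URLs:

```python
import hashlib


def event_id(publishing_url: str) -> str:
    """Hypothetical ID rule: a hex digest of the event's publishing URL.

    MD5 is an assumption, chosen only because the IDs quoted in this
    thread are 32 hexadecimal characters long.
    """
    return hashlib.md5(publishing_url.encode("utf-8")).hexdigest()


def collapse(listings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (calendar, url) listings by derived ID.

    Listings that share a URL collapse into one event; listings with
    distinct URLs stay separate, even if they describe the same event.
    """
    events: dict[str, list[str]] = {}
    for calendar, url in listings:
        events.setdefault(event_id(url), []).append(calendar)
    return events


# Hypothetical URLs, for illustration only.
old = "https://example.org/events/science-battle"
new = "https://example.org/veranstaltungen/science-battle"

# A URL change on the source yields a different ID for the same event.
print(event_id(old) == event_id(new))  # False

# Two calendars sharing one URL collapse; a third with its own URL does not.
grouped = collapse([
    ("calendar-a", old),
    ("calendar-b", old),
    ("calendar-c", new),
])
print(len(grouped))  # 2
```

Because the ID depends only on the URL string, any change to the URL on the source side produces a new event from the consumer's point of view, which matches the June 2nd vs. July 16th behaviour described above.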
The same reasoning also applies to the additional example.
I suggest we manually remove the MMT leftovers and then monitor the situation for similar issues in the future.
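Before removing leftovers by hand, candidate duplicates could be listed on the MMT side. A minimal sketch, assuming each MMT event record exposes an ID, a title and a start timestamp (the `Event` shape and its field names are hypothetical; MMT's actual schema isn't described in this thread), grouping events that share a normalized title and start time but carry different IDs:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape; field names are assumptions.
    id: str
    title: str
    start: str  # ISO 8601 start timestamp


def find_duplicates(events: list[Event]) -> list[list[Event]]:
    """Return groups of events sharing a title and start time but not an ID."""
    groups: dict[tuple[str, str], list[Event]] = defaultdict(list)
    for event in events:
        # Normalize the title so trivial case/whitespace differences still match.
        key = (event.title.strip().casefold(), event.start)
        groups[key].append(event)
    return [group for group in groups.values() if len({e.id for e in group}) > 1]


# Hypothetical sample data, for illustration only.
events = [
    Event("aaa", "Science Battle", "2022-09-01T18:00"),
    Event("bbb", "Science Battle", "2022-09-01T18:00"),
    Event("ccc", "Science Battle", "2022-09-08T18:00"),
]
for group in find_duplicates(events):
    print([e.id for e in group])  # ['aaa', 'bbb']
```

This is only a heuristic: two genuinely distinct events with the same title and start time would also be flagged, so the output is a list to review manually rather than something to delete automatically.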
On our side, we'll keep you posted should we upgrade the ID generation logic for other sources.
Steps to reproduce
What did you expect to happen? There should be the same number of events in MMT and KH.
What did actually happen? MMT has several more events than KH.
Would you share any observation or additional context about the bug? Example > Science Battle:
MMT imported seven Science Battle events:
- /events/c26450a98ad2a791bedc6eb0f4804ad2
- /events/17d430cafc6ffa04cb90fb383aed827a
- /events/ba2661b519af9cda05b1fadda230b564
- /events/eb14ff90b4b44bfa6d10dd43f63034b7
- /events/b3416fd4958d3cc358c00fb961491251
- /events/b3a9d6a2092ae5e058d0fe2b3083f74c
- /events/f5b499a8795d70915e45da910ca293b3
Each of the above has a distinct ID field in MMT, even though they all belong to the same group of 3 individual events. All three events imported on July 16th are found in KH, but they duplicate the June 2nd and April 20th events already in MMT.
The question is: why do the events imported on July 16th have new unique IDs compared to the ones imported on June 2nd?
I checked and found that the events from July 16th have updated source URLs.
Another example of the same problem is "Workshop: Vorbereitung auf Jena (Deutsch)": https://data.ec2u.eu/events/725a6dd33c549950919d5c1721d0f7cd
Screenshots
From the MMT app:
![image](https://user-images.githubusercontent.com/32770092/179353474-d7aa9bee-bf93-4832-87d8-a5f12df9d286.png)
@ec2u/mmt