Open dokterbob opened 11 years ago
Unfortunately, I can't include attachments other than images here. Here is a link to three different representations of essentially the same information: https://rejo.zenger.nl/files/newspeak-example-duplication.zip. At least one of the two representations from officielebekendmakingen.nl appeared in the feed https://zoek.officielebekendmakingen.nl/kamervragen_aanhangsel/rss; the publication at rijksoverheid.nl appeared in http://feeds.rijksoverheid.nl/kamerstukken.rss.
@rejozenger Since RSS feeds change over time, please attach the actual RSS resource (or a dump of it) in the future. In its current state, the RSS feed no longer contains any reference to the documents mentioned.
Note about the documents: the versions on 'officiëlebekendmakingen' each carry a corresponding 'Aanhangselnummer' (appendix number). This could serve as a starting point for preliminary deduplication.
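To make the idea concrete, here is a minimal sketch of key-based deduplication. It rests on assumptions, not on anything in the feeds themselves: each item is taken to be a dict with a `link`, and the `aanhangselnummer` field is hypothetical (the number would first have to be extracted from the item or the linked page). It also assumes that two items with the same number describe the same document, which still needs to be checked against the examples.

```python
from typing import Dict, Iterable, Iterator, Set


def dedupe_items(items: Iterable[Dict[str, str]], seen: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield only the items whose deduplication key is not yet in `seen`."""
    for item in items:
        # Hypothetical field: assumes the Aanhangselnummer was extracted earlier.
        number = (item.get("aanhangselnummer") or "").strip()
        # Items without a number fall back to their link as the key.
        key = f"aanhangsel:{number}" if number else f"link:{item['link']}"
        if key in seen:
            continue  # Already notified about this document.
        seen.add(key)
        yield item


# Example with made-up items from two feeds.
seen_keys: Set[str] = set()
feed_a = [{"link": "https://example.org/a/1", "aanhangselnummer": "123"}]
feed_b = [
    {"link": "https://example.org/b/9", "aanhangselnummer": "123"},
    {"link": "https://example.org/b/10"},
]
fresh = list(dedupe_items(feed_a + feed_b, seen_keys))
```

In this example `fresh` keeps the first item with number 123 and the item without a number; the second item with number 123 is skipped. Persisting `seen_keys` between runs would be needed to suppress duplicates that appear at different times.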
I have uploaded a compressed directory to https://rejo.zenger.nl/files/newspeak.zip. It contains three directories, each with a dump of the RSS feed in which the item appeared, a saved copy of the HTML file the RSS feed pointed to, and a saved copy of the PDF file the HTML file pointed to. It also includes "items-seen.txt", which shows all of the occurrences over time. I haven't investigated, but I am fairly sure that not all nine versions appeared in the RSS feed (using the legacy code).
Sometimes, feed items are duplicated across streams. To avoid sending new notifications for information that has already been seen, some infrastructure for deduplication needs to be worked out.
@rejozenger Examples, please. ^^