codeformuenster / muenster-info-hub

"Münster-Info-Hub" - A project from MÜNSTERHACK 2019
https://www.muenster.jetzt/
Apache License 2.0

Deduplicate w.r.t. data from other pages #29

Open JonesH opened 5 years ago

JonesH commented 5 years ago

Events should be deduplicated against events scraped from other pages.

JonesH commented 5 years ago

This would enable merging data such as images.

JonesH commented 4 years ago

Using generated IDs should solve this?

nichoio commented 4 years ago

Only partly. IDs don't solve the problem of data from separate sources referring to the same event. I'm working on a simple deduplication feature right now, but it's a work in progress, so there's nothing to push yet.

nichoio commented 4 years ago

But we'll need the event-based IDs anyway, so let's add them as well :)

JonesH commented 4 years ago

Yes, there are two kinds of deduplication:

Only the first one would be solved by generated IDs.
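
For illustration, here is a minimal sketch of the two ideas raised in this thread: a generated ID that catches the same page being scraped twice, and a looser key that matches the same event across different sources so that fields such as images can be merged. This is not the project's code. The field names (`source`, `url`, `title`, `start`, `image`) and all function names are assumptions, and matching on normalized title plus date is just one simple heuristic that will miss events whose titles differ between sources.

```python
import hashlib
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def source_id(event):
    """Generated ID (hypothetical scheme): stable per source page.

    Catches the same page being scraped more than once, but not the
    same event listed on two different sites.
    """
    raw = f"{event['source']}|{event['url']}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def event_key(event):
    """Cross-source key (assumed heuristic): normalized title + start date."""
    return (normalize(event["title"]), event["start"][:10])


def deduplicate(events):
    """Drop re-scraped records, then merge records describing the same event."""
    seen_ids = set()
    merged = {}
    for event in events:
        sid = source_id(event)
        if sid in seen_ids:
            continue  # same page scraped again
        seen_ids.add(sid)
        key = event_key(event)
        if key not in merged:
            merged[key] = dict(event)
            continue
        # Same event from another source: fill in fields that are still empty.
        target = merged[key]
        for field, value in event.items():
            if not target.get(field) and value:
                target[field] = value
    return list(merged.values())
```

With this approach the first record seen for an event wins, and later records only fill in missing fields (for example an image that one source lacks). A real implementation would likely need fuzzier title matching and a rule for conflicting values.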