Closed by MatthiasBaur 8 years ago
@MatthiasBaur In the current data provided through the demo there could be duplicates. But for the data which was gathered on the rostlab machines there should be no duplicates.
Thanks for the info.
Is a sighting of the same pokemon at the same location and at the same time, but reported by two different people, considered to be a duplicate, @jonas-he?
@goldbergtatyana - If they are from the same data source, they are considered duplicates and hence only the first one is kept. If they are from different data sources, then both are retained in the db.
Thanks @vivek-sethia, so we might have duplicates in that sense. What about sightings of the same pokemon at the same time, but at two different locations (a few meters apart)? These would also be two duplicate entries that would probably both be contained in the database?
@goldbergtatyana You are right, those will also be contained in the database, since exact locations (lat, long) are compared.
FYI @PokemonGoers/predictpokemon-2
@vivek-sethia I thought that the data source does not matter, only place, time and pokemonId, as this is what is in /app/models/pokemonSighting.js:
pokemonSighting.index({"appearedOn": -1, "pokemonId": 1, "location": 1}, {"unique": true});
@goldbergtatyana I got a bit confused. It only checks time, location, and pokemon id, and hence there won't be duplicates from different sources. Thanks for the correction @jonas-he. However, the current data from mlab does not have this index, so there will be duplicates. Once we have the data coming from the Rostlab server, it should be fine.
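For data that lacks the unique index (such as the mlab dump mentioned above), the same rule could be applied in memory. This is only a minimal sketch, not the project's code: it assumes `location` is stored as a GeoJSON point with `coordinates: [lng, lat]`, and `sightingKey`, `dropExactDuplicates` and the `source` field are hypothetical names used for illustration.

```javascript
// Hypothetical helper: builds a key from the three fields named in the
// unique index (appearedOn, pokemonId, location). The data source is
// deliberately NOT part of the key.
function sightingKey(sighting) {
  // Assumption: location is a GeoJSON point, coordinates = [lng, lat].
  const [lng, lat] = sighting.location.coordinates;
  const time = new Date(sighting.appearedOn).getTime();
  return `${time}|${sighting.pokemonId}|${lng},${lat}`;
}

// Keeps the first sighting for every key and drops later exact matches.
function dropExactDuplicates(sightings) {
  const seen = new Set();
  const kept = [];
  for (const sighting of sightings) {
    const key = sightingKey(sighting);
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(sighting);
    }
  }
  return kept;
}

// Made-up example data: the second entry matches the first on all three
// fields; the third differs slightly in longitude, so it is retained.
const exampleSightings = [
  { source: "a", pokemonId: 16, appearedOn: "2016-09-01T12:00:00Z",
    location: { type: "Point", coordinates: [11.5755, 48.1372] } },
  { source: "b", pokemonId: 16, appearedOn: "2016-09-01T12:00:00Z",
    location: { type: "Point", coordinates: [11.5755, 48.1372] } },
  { source: "a", pokemonId: 16, appearedOn: "2016-09-01T12:00:00Z",
    location: { type: "Point", coordinates: [11.57551, 48.1372] } },
];
const uniqueSightings = dropExactDuplicates(exampleSightings);
```

Because the source is not part of the key, the entry from source "b" is dropped, which matches the corrected description of the index.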
Ok, so if time and pokemon_id are the same for two entries but their locations differ slightly (say, by 1 meter), then these are another kind of duplicate that we will have. It is ok this way though!
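If such near-duplicates ever need to be detected, one option would be a distance threshold instead of an exact location match. The sketch below swaps in the haversine great-circle distance, which is not something the project is confirmed to use; `isNearDuplicate`, the `maxMeters` threshold and the GeoJSON `coordinates: [lng, lat]` layout are assumptions for illustration.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters

// Great-circle distance between two lat/lng points (haversine formula).
function haversineMeters(lat1, lng1, lat2, lng2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Hypothetical check: same pokemon, same timestamp, and locations no
// more than maxMeters apart.
function isNearDuplicate(a, b, maxMeters) {
  // Assumption: location is a GeoJSON point, coordinates = [lng, lat].
  const [lngA, latA] = a.location.coordinates;
  const [lngB, latB] = b.location.coordinates;
  const sameTime =
    new Date(a.appearedOn).getTime() === new Date(b.appearedOn).getTime();
  return (
    a.pokemonId === b.pokemonId &&
    sameTime &&
    haversineMeters(latA, lngA, latB, lngB) <= maxMeters
  );
}
```

A latitude difference of 0.00001 degrees is roughly 1.1 meters, so two such sightings would pass the exact-match index but be flagged by `isNearDuplicate` with a threshold of a few meters.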
The @PokemonGoers/predictpokemon-2 team will be randomly drawing 10K sightings from the data of the last few days. With this approach it is very unlikely that the sample will contain duplicates.
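For illustration, a random draw of this kind could look like the sketch below. It is not the team's actual sampling code: `sampleWithoutReplacement` is a hypothetical helper based on a partial Fisher-Yates shuffle. Sampling does not remove duplicates by itself; it only makes it unlikely that both copies of a pair are drawn. When k records are drawn from N, a given pair ends up in the sample with probability k(k-1) / (N(N-1)).

```javascript
// Hypothetical helper: draws k items uniformly at random without
// replacement, using a partial Fisher-Yates shuffle on a copy.
// `random` can be swapped for a seeded generator in tests.
function sampleWithoutReplacement(items, k, random = Math.random) {
  const pool = items.slice(); // do not mutate the caller's array
  const n = Math.min(k, pool.length);
  for (let i = 0; i < n; i++) {
    // Pick a random index from the not-yet-chosen tail [i, length).
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, n);
}

// Probability that both members of one specific duplicate pair are
// drawn when sampling k out of n records.
function pairInSampleProbability(k, n) {
  return (k * (k - 1)) / (n * (n - 1));
}
```

For the 10K draw described above, the call would be `sampleWithoutReplacement(sightings, 10000)`, where `sightings` stands for whatever array of recent records has been loaded.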
Hi! Do you have a filter implemented to weed out identical datapoints? Team Predict