A tool that helps the creators of vaccine finder tools manipulate, combine, and move data, so that everyone has better access to information about vaccination sites during the pandemic.
GeoCompare needs a way to sort through two potentially large datasets and match up their items by coordinates. This module should not need to know the structure of each item, as that should be handled by the parsing classes (see #9).
Here are some ideas for ways this could be done:
Brute force: compare every pair of records, using a coordinate-aware algorithm to determine the actual distance between two coordinates.
Read one dataset in first and use something like geohash (https://github.com/dbarthe/geohash/) or similar to "sort" the records by location (this may be a little complex). Items from the second dataset can then be used to look up the locations in the first that are closest or most similar. That result can either be taken as the match directly or used to severely limit the list of candidate locations that need to be processed by brute force.
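Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are reduced to plain (lat, lon) tuples in degrees, the haversine formula stands in for the "coordinate-aware algorithm", and a simple rounded-coordinate grid stands in for geohash (the geohash library itself is not used here). The function names, the 50 m threshold, and the 0.01 degree cell size are all hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def match_brute_force(source, target, max_m=50.0):
    """For each source coordinate, return the index of the nearest target
    coordinate closer than max_m, or None. Compares every pair: O(n * m)."""
    matches = []
    for s in source:
        best, best_d = None, max_m
        for j, t in enumerate(target):
            d = haversine_m(s, t)
            if d < best_d:
                best, best_d = j, d
        matches.append(best)
    return matches


def cell(coord, size_deg):
    """Grid cell containing a coordinate (hypothetical stand-in for a geohash)."""
    return (math.floor(coord[0] / size_deg), math.floor(coord[1] / size_deg))


def match_with_grid(source, target, max_m=50.0, size_deg=0.01):
    """Bucket target coordinates by grid cell, then run the brute-force
    comparison only against the 3x3 block of cells around each source point.
    Assumes max_m is well below the cell size; poles and the antimeridian
    are not handled in this sketch."""
    buckets = defaultdict(list)
    for j, t in enumerate(target):
        buckets[cell(t, size_deg)].append(j)
    matches = []
    for s in source:
        cx, cy = cell(s, size_deg)
        best, best_d = None, max_m
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in buckets.get((cx + dx, cy + dy), ()):
                    d = haversine_m(s, target[j])
                    if d < best_d:
                        best, best_d = j, d
        matches.append(best)
    return matches
```

Checking the neighbouring cells matters because two points a few meters apart can fall on opposite sides of a cell boundary. The sketch also allows several source points to match the same target point; whether that is acceptable is a design decision left open here.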
This module will need to take two data sources (in the form of Source objects as defined by models/source.py) and use the specified parser for each source to:
match the records between the sources by coordinate proximity
return the following lists of associations (#17) between entries:
entries present in the "source" database and not present in "target"
entries present in both "target" and "source" where all fields common to both are an exact match
entries present in both "target" and "source" where at least one field common to both differs
entries present in "target" but not present in "source"
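The four lists above could be produced from a set of coordinate matches roughly as follows. This is a minimal sketch, not the Source or parser API: it assumes each parsed record is a plain dict, that matches[i] holds the index of the target record matched to source record i (or None), and that an association (#17) can be represented as a (source_record, target_record) tuple. The function name classify is hypothetical.

```python
def classify(source, target, matches):
    """Split records into the four lists described above.

    source, target: lists of dict records (assumed shape for this sketch).
    matches: matches[i] is the index in target matched to source[i], or None.
    Returns (only_source, identical, different, only_target).
    """
    only_source, identical, different = [], [], []
    matched_targets = set()
    for i, j in enumerate(matches):
        if j is None:
            # Present in "source" but not in "target".
            only_source.append(source[i])
            continue
        matched_targets.add(j)
        s, t = source[i], target[j]
        # Only fields present in both records are compared.
        common = s.keys() & t.keys()
        if all(s[k] == t[k] for k in common):
            identical.append((s, t))
        else:
            different.append((s, t))
    # Present in "target" but never matched by any source record.
    only_target = [t for j, t in enumerate(target) if j not in matched_targets]
    return only_source, identical, different, only_target
```

Fields that exist in only one of the two records are ignored when deciding between "exact match" and "differences", which follows the wording "all fields common to both".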