VertNet / gulo

Shredding Darwin Core Archives with ferocity, strength, and Cascalog.

lat-lon processing #4

Closed: tucotuco closed this issue 10 years ago

tucotuco commented 12 years ago

Before creating the index on unique lat-lon pairs, preprocess the verbatim input to create a seven-digit precision string version of each coordinate (including rounding) and remove invalid lat-lon pairs (i.e., pairs where either lat or lon is invalid). For VertNet, also treat the case where lat and lon are both zero as invalid. This will trap inputs that use 0 for null. VertNet is extremely unlikely to have any real data from lat=0, lon=0, but it is likely to have a lot of records carrying these false values.
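
A minimal sketch of the preprocessing step described above, written in Python for illustration only (gulo itself uses Cascalog, and nothing here is gulo code). The function name `clean_latlon`, the `reject_zero_zero` flag, and the reading of "seven-digit precision" as seven decimal places with half-up rounding are all assumptions, not something stated in this issue.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Assumption: "seven-digit precision" means seven decimal places.
SEVEN_PLACES = Decimal("0.0000001")


def _to_string(value):
    """Round to seven decimal places and render as a fixed-point string."""
    rounded = value.quantize(SEVEN_PLACES, rounding=ROUND_HALF_UP)
    if rounded == 0:
        rounded = abs(rounded)  # avoid emitting "-0.0000000"
    return format(rounded, "f")


def clean_latlon(lat_raw, lon_raw, reject_zero_zero=True):
    """Return a (lat, lon) pair of strings, or None if the pair is invalid.

    Hypothetical helper: a pair is invalid if either value fails to parse,
    is not finite, or falls outside [-90, 90] / [-180, 180]. With
    reject_zero_zero=True (the VertNet condition), (0, 0) is also invalid.
    """
    try:
        lat = Decimal(lat_raw.strip())
        lon = Decimal(lon_raw.strip())
    except (InvalidOperation, AttributeError):
        return None  # empty, non-numeric, or missing value
    if not (lat.is_finite() and lon.is_finite()):
        return None  # rejects NaN and Infinity
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    if reject_zero_zero and lat == 0 and lon == 0:
        return None  # 0 is very likely standing in for null
    return _to_string(lat), _to_string(lon)


def unique_latlons(rows):
    """Collect the distinct cleaned pairs from an iterable of (lat, lon) strings."""
    cleaned = (clean_latlon(lat, lon) for lat, lon in rows)
    return {pair for pair in cleaned if pair is not None}
```

Under these assumptions, `unique_latlons` yields the set that the unique lat-lon index would be built from: differently formatted spellings of the same coordinate collapse to one string pair, and the zero-zero and out-of-range rows never reach the index.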

eightysteele commented 12 years ago

Thanks man, assigned the issue to myself.

tucotuco commented 10 years ago

Let's not do this in gulo. It's not the place for data quality improvement.