HarvardForest / genm

Data Cleaning/Sorting #134

Open. anncalderon opened this issue 3 years ago

anncalderon commented 3 years ago

- Check for misspellings in species and genus names.
- Figure out what to do with the duplicated coordinates (cross-reference years to determine whether they are true duplicates).
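
Both checks could be sketched roughly as below. This is an illustration in Python, not the project's code (the existing cleaning operations live in plan.R). The function names, the record fields (`lat`, `lon`, `year`), the reference name list, and the 0.8 similarity cutoff are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import get_close_matches


def flag_misspellings(names, reference, cutoff=0.8):
    """Map each name missing from `reference` to its closest reference name.

    The value is None when no reference name is similar enough.
    `cutoff` is an assumed similarity threshold, not a project setting.
    """
    known = set(reference)
    suggestions = {}
    for name in set(names) - known:
        match = get_close_matches(name, reference, n=1, cutoff=cutoff)
        suggestions[name] = match[0] if match else None
    return suggestions


def classify_duplicates(records):
    """Group records by (lat, lon) and label coordinates that repeat.

    Same coordinates and a repeated year -> "likely_duplicate"
    Same coordinates, all years distinct -> "repeat_visit"
    """
    by_coord = defaultdict(list)
    for rec in records:
        by_coord[(rec["lat"], rec["lon"])].append(rec)

    labels = {}
    for coord, recs in by_coord.items():
        if len(recs) < 2:
            continue  # coordinate occurs once, nothing to check
        years = [r["year"] for r in recs]
        repeated_year = len(set(years)) < len(years)
        labels[coord] = "likely_duplicate" if repeated_year else "repeat_visit"
    return labels
```

Anything labeled `likely_duplicate` would still need a manual look before removal, since two genuine records can share a site and a year.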

MKLau commented 3 years ago

@anncalderon I'm creating a data cleaning function based on your cleaning operations in plan.R.

MKLau commented 3 years ago

Just added: see https://github.com/anncalderon/gENM/pull/5. Cleaning operations should go into this function, and we will probably want to split the more complicated cleaning steps and checks into sub-functions.
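
One way to picture the "one cleaning function, several sub-functions" layout is the sketch below. It is written in Python for illustration only and does not reproduce the function added in the pull request; `clean_data`, the two sub-functions, and the record fields are hypothetical names.

```python
def drop_missing_coords(records):
    """Sub-function: remove records that lack a latitude or longitude."""
    return [r for r in records
            if r.get("lat") is not None and r.get("lon") is not None]


def normalize_names(records):
    """Sub-function: trim whitespace and standardize capitalization."""
    cleaned = []
    for r in records:
        r = dict(r)  # copy so the caller's records are not modified
        r["genus"] = r["genus"].strip().capitalize()
        r["species"] = r["species"].strip().lower()
        cleaned.append(r)
    return cleaned


def clean_data(records, steps=(drop_missing_coords, normalize_names)):
    """Top-level cleaning function: apply each sub-function in order."""
    for step in steps:
        records = step(records)
    return records
```

Keeping each check in its own sub-function means a complicated step can be tested on its own and added to or removed from `steps` without touching the rest.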

MKLau commented 3 years ago

Detect and fix/remove water points:

  1. Define boundaries as latitude and longitude ranges; flag points outside those ranges.
  2. Get the land classification for each point; flag points in a nonsensical class.
  3. Check the GPS coordinates of each flagged point.
  4. Visually assess the locations of flagged points.
  5. Use other information to correct positions.
  6. Possibly assign a new GPS coordinate to points that are within a fixed distance of a sensible land class, choosing the nearest such location by minimum Euclidean distance.
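
Steps 1 and 6 are the most mechanical, so here is a minimal Python sketch of them, under stated assumptions. Points are `(lat, lon)` tuples, `land_points` is a hypothetical list of coordinates already known to fall in a sensible land class, and distances are plain Euclidean distances in degrees, which is only a rough approximation over short distances. The function names and `max_dist` are made up for the example.

```python
import math


def flag_out_of_bounds(points, lat_range, lon_range):
    """Step 1: return the indices of points outside the lat/long ranges."""
    (lat_min, lat_max), (lon_min, lon_max) = lat_range, lon_range
    return [
        i for i, (lat, lon) in enumerate(points)
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max)
    ]


def snap_to_land(point, land_points, max_dist):
    """Step 6: move a flagged point to the nearest land point.

    The point is returned unchanged when no land point lies within
    `max_dist` (in degrees), so distant points stay flagged for manual review.
    """
    if not land_points:
        return point
    nearest = min(land_points, key=lambda p: math.dist(point, p))
    return nearest if math.dist(point, nearest) <= max_dist else point
```

Steps 2 through 5 depend on the land classification source and on manual inspection, so they are left out of the sketch.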

This is probably the land/water classification database we should use:

https://water.usgs.gov/GIS/huc.html