Clip training data by the HUC2 watersheds that contain training points #29

PNHP / Regional_SDM

Methods and collaboration for Species Distribution Modeling among Heritage Programs

Closed: ChristopherTracey closed this issue 6 years ago

ChristopherTracey commented 6 years ago

To prevent "overprediction" into major watersheds where a species does not occur, I would like to automatically subset the EnvVars to the HUC2 (or HUC4?) watersheds that contain the training data.

ChristopherTracey commented 6 years ago

Implemented in #30. Tested, and it appears to work. We should make this optional in user_run_SDM.R, with the HUC level configurable.

Also, note that this required the stringr package for string padding; I added it to run_SDM.R.

The jackknifing procedure is still super slow, as noted in #24.
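For readers wondering why string padding comes into it: the thread does not say, but a likely reason is that HUC codes lose their leading zero when read as numbers. The sketch below illustrates that assumed case. It is in Python purely for illustration (the repository itself is R, where `stringr::str_pad` plays this role), and `pad_huc` is a hypothetical helper, not a function from the repository.

```python
def pad_huc(code, width=12):
    """Return a HUC code as a zero-padded string of the given width.

    Hypothetical helper: assumes codes may arrive as integers (leading
    zero dropped) or as strings, and that width is the full code length.
    """
    text = str(code).strip()
    if not text.isdigit():
        raise ValueError(f"HUC code must be all digits, got {code!r}")
    if len(text) > width:
        raise ValueError(f"HUC code {text!r} is longer than {width} digits")
    # Left-pad with zeros, e.g. 11 digits -> 12 digits.
    return text.zfill(width)


# A HUC12 that was read as an integer and lost its leading zero.
print(pad_huc(20401050101))  # 020401050101
```

Once the codes are padded, the first two characters give the HUC2 region reliably.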

ChristopherTracey commented 6 years ago

Whoops. I didn't realize bgpoints_clean.csv was created in 1_pointsInPolys_cleanBkgPts.R; I need to modify that.

ChristopherTracey commented 6 years ago

Fixed in a5eef379956c8a55b869faa6d8facb6f241cd4b9.

dnbucklin commented 6 years ago

I can integrate this into run_SDM as an option; I'm thinking we'd want to provide a digit indicating the number of HUC digits to use to identify the model domain.

I'll also play around with sampling to reduce the background file size.
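A minimal sketch of how a digit option like this could define the model domain, assuming every training and background record carries a zero-padded HUC12 string. It is written in Python for illustration only (the repository is R); `model_domain`, `clip_to_domain`, and the `huc12` field name are hypothetical and are not taken from run_SDM.

```python
VALID_HUC_LEVELS = (2, 4, 6, 8, 10, 12)


def model_domain(training_hucs, huc_level=2):
    """Return the set of HUC prefixes of length huc_level that contain training points."""
    if huc_level not in VALID_HUC_LEVELS:
        raise ValueError(f"huc_level must be one of {VALID_HUC_LEVELS}")
    return {huc[:huc_level] for huc in training_hucs}


def clip_to_domain(rows, domain, huc_level=2, huc_field="huc12"):
    """Keep only the rows whose HUC prefix falls inside the model domain."""
    return [row for row in rows if row[huc_field][:huc_level] in domain]


# Made-up HUC12 codes for illustration.
training = ["020401050101", "020503061207"]
background = [
    {"id": 1, "huc12": "020402010305"},  # same HUC2 and HUC4 as a training point
    {"id": 2, "huc12": "050100010101"},  # different HUC2
    {"id": 3, "huc12": "020700010101"},  # same HUC2, different HUC4
]

huc2_domain = model_domain(training, huc_level=2)
huc4_domain = model_domain(training, huc_level=4)

print([r["id"] for r in clip_to_domain(background, huc2_domain, 2)])  # [1, 3]
print([r["id"] for r in clip_to_domain(background, huc4_domain, 4)])  # [1]
```

More digits give a tighter domain: at HUC2 the third background point is kept, while at HUC4 it is dropped.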

ChristopherTracey commented 6 years ago

Yes, a digit option would be great.

dnbucklin commented 6 years ago

I just added the argument huc_level in 47339a5719f338fcacda6b150900eb01938099c9; it is tested and working for me.

I also modified the prediction output and metadata map so that they only include this area.

ChristopherTracey commented 6 years ago

Tested as well, and it looks like it worked overall.

Found a (reintroduced?) bug due to a trailing slash in one of the loc variables. Fixed in aa3caa52fb936d28b208337d5b7a85bdf04b0dae.
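The thread does not describe the exact failure, so the following is only a sketch of one common way to guard against this class of bug: strip trailing separators from directory variables before building file paths. It is in Python for illustration (the repository is R), and `normalize_dir`, `build_path`, and the example directory are hypothetical.

```python
def normalize_dir(path):
    """Strip trailing slashes or backslashes from a directory path.

    A path made only of separators (such as "/") is reduced to its
    first character so the filesystem root is preserved.
    """
    stripped = path.rstrip("/\\")
    return stripped if stripped else path[:1]


def build_path(directory, filename):
    """Join a directory and a file name with exactly one forward slash."""
    return normalize_dir(directory) + "/" + filename


# Both spellings of the (made-up) directory yield the same file path.
print(build_path("D:/SDM/inputs/", "bgpoints_clean.csv"))
print(build_path("D:/SDM/inputs", "bgpoints_clean.csv"))
```

Normalizing once, where the loc variables are defined, avoids having to handle both spellings at every place a path is built.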