USEPA / spsurvey

spsurvey: Spatial Sampling Design and Analysis in R
https://usepa.github.io/spsurvey/
GNU General Public License v3.0

"Error in UPpivotal(sites$ip) : there are missing values in the pik vector" #42

Open kaitlynstrickfaden opened 4 months ago

kaitlynstrickfaden commented 4 months ago

Not sure what this error means. I get it when I run grts(). I'm using an sf object in a projected coordinate system. I get the error when I supply caty_var and caty_n, but not when I comment out those two lines and use only sframe and n_base.

jasonelaw commented 4 months ago

Not an expert on the current code, but UPpivotal is the function that actually selects the sample, the sites$ip vector holds the site inclusion probabilities, and it looks like the error is happening when you're selecting a sample with unequal probabilities by category. Do you have missing values or other issues with the category variable, caty_var?

kaitlynstrickfaden commented 4 months ago

Thanks for your quick response. No, I don't have any NAs in my data. Some more background if it's helpful: I've split my study area into 1-km grid cells using st_make_grid and then selected only those grid cells that occur within our different subherd boundaries. The resulting sf data consists of 322 polygons. I then create a "Subherd" column based on which subherd boundary each grid cell falls in. This Subherd column is what I'm trying to use as my caty_var, because I need some grid cells in each of our subherd boundaries.

jasonelaw commented 4 months ago

I would try to create a reproducible example so that someone can debug the code and see what is happening. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for ways to do that. Once you have a small, runnable chunk of code that gives the error, copy it here.

jasonelaw commented 4 months ago

I can reproduce that error with the following code:

library(spsurvey)
data("NRSA_EPA7")

ret <- grts(
  sframe   = NRSA_EPA7, 
  caty_var = "STATE", 
  caty_n   = c("Missouri" = 2, "Kansas" = 2, "Nebraska" = 2),
  n_base   = 6
)

Here caty_n does not supply a value for every state: I've dropped Iowa. Check the names of your caty_n vector and make sure there is an entry for every value of caty_var. Adding Iowa fixes the issue:

ret <- grts(
  sframe   = NRSA_EPA7, 
  caty_var = "STATE", 
  caty_n   = c("Missouri" = 2, "Kansas" = 2, "Nebraska" = 2, "Iowa" = 2),
  n_base   = 8
)

kaitlynstrickfaden commented 4 months ago

Ohh, I didn't notice the caty_n vector had to be a named vector. Sorry, that was a simple "look at the R documentation, dummy" error on my part. The error message was really hard to decipher, so I hope you'll forgive me. I was able to get my code to work by making caty_n a named vector. Thank you!
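
For anyone landing here with the same error, here is a minimal sketch of building caty_n as a named vector directly from the category column, so that every category gets an entry. The data frame `cells`, the subherd names, and the per-category size of 2 are made up for illustration; with a real sf object the same column access should apply. In the examples in this thread, n_base equals the sum of caty_n, so the sketch derives it that way.

```r
# Hypothetical stand-in for the sf object of grid cells;
# only the category column matters for this sketch.
cells <- data.frame(
  Subherd = c("North", "North", "South", "East", "East", "East")
)

# One entry per distinct category, named by that category.
subherds <- sort(unique(as.character(cells$Subherd)))
caty_n <- setNames(rep(2, length(subherds)), subherds)

# Total sample size taken as the sum across categories (assumption
# based on the examples in this thread).
n_base <- sum(caty_n)

caty_n
n_base
```

The resulting caty_n and n_base could then be passed to grts() along with caty_var = "Subherd".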

jasonelaw commented 4 months ago

No problem. Glad you got it working. spsurvey can be a bit unconventional in how its functions accept arguments: they require a lot of matching up of names, which is very easy to get wrong.
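
As an illustration of why a name mismatch can surface as an NA far from its cause, here is a base R sketch. It is not spsurvey's internal code, only an assumption about the general mechanism: looking up a named vector by a name it lacks returns NA silently instead of raising an error. The site vector `state` is made up.

```r
# Expected sample sizes with no entry for Iowa.
caty_n <- c(Missouri = 2, Kansas = 2, Nebraska = 2)

# Category of each site in a made-up frame.
state <- c("Missouri", "Iowa", "Kansas", "Nebraska", "Iowa")

# Look up each site's category size by name; Iowa has no match,
# so those positions come back as NA without any warning.
size_by_site <- unname(caty_n[state])
size_by_site         # 2 NA 2 2 NA

# Anything computed from these values inherits the NA.
anyNA(size_by_site)  # TRUE
```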

michaeldumelle commented 4 months ago

Thanks @kaitlynstrickfaden and @jasonelaw for the discussion and @jasonelaw for providing the help! I plan to add a check in the next version of spsurvey (the current version is 5.5.1) that returns an informative error message when any values in caty_var are not contained in the names of caty_n.

I'll post on this thread when the check has been implemented!
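
Until that check lands, something along these lines could be run before calling grts(). This is only a sketch of the check described above, not the code that will ship in spsurvey; the function name `check_caty_n` and the wording of its messages are made up for illustration.

```r
# Hypothetical helper: stop if any value of the category variable
# lacks an entry in the names of caty_n.
check_caty_n <- function(caty_values, caty_n) {
  if (is.null(names(caty_n))) {
    stop("caty_n must be a named vector.", call. = FALSE)
  }
  missing_levels <- setdiff(unique(as.character(caty_values)), names(caty_n))
  if (length(missing_levels) > 0) {
    stop(
      "caty_n has no entry for: ", paste(missing_levels, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Example with made-up values: Iowa is missing, so the helper reports it.
states <- c("Missouri", "Iowa", "Kansas", "Nebraska")
result <- tryCatch(
  check_caty_n(states, c(Missouri = 2, Kansas = 2, Nebraska = 2)),
  error = function(e) conditionMessage(e)
)
result
```

With the data from the reproduction earlier in the thread, the first argument would be the category column itself, for example NRSA_EPA7$STATE.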