StirlingCodingClub / studyGroup

Gather together a group to skill-share, co-work, and create community
http://StirlingCodingClub.github.io/studyGroup/

Function for multiple models #19

Open mattnuttall00 opened 5 years ago

mattnuttall00 commented 5 years ago

I'm making my first tentative steps in writing very simple functions just to help keep my scripts tidy, but have hit a snag.

I am trying to write a function that makes running lots of models a bit neater. The models I am running estimate the detection function for animals from line transect surveys, using the package 'Distance' and its function ds(). The structure of the model call is:

mod1 <- ds(data, truncation, key, formula)

Where:
data = my data
truncation = a truncation distance
key = the key model function to use (options are uniform, half-normal, hazard rate)
formula = a simple formula for adding covariates to the model (e.g. formula = ~habitat)

In my models, data and truncation will not change. I wrote the below function to make running a bunch of models slightly neater (although probably not by much!)

detfunc <- function(name,key,covar) {

  name <- ds(distdata, truncation = 50, key=key, formula = ~covar)

  par(mfrow=c(1,2))
  plot(name, showpoints=FALSE, pl.den=0, lwd=2)
  ddf.gof(name$ddf)
  summary(name)
}

The idea being that I can simply write:

detfunc(mod1,"uniform",habitat)
detfunc(mod2,"hn",elevation)
detfunc(mod3,"hr",transect)

etc etc rather than writing the model calls out in full, and the model summary and resulting plots would be spat out.

The function seems to be struggling with the third argument, covar, though. It throws an error (from the ds() call rather than from my function) saying that "covar" is not in my dataframe. So it doesn't seem to be recognising the third argument in my function call. I tried adding

covar <- data$covar

at the top of the function to try and assign the term to a column in my dataframe but that hasn't worked.

Can someone offer any advice? If you think this is a pointless use of a function, I am open to that advice too ;)

Matt

bradduthie commented 5 years ago

I've downloaded the Distance package, just to see how everything works. Looks like the output should be fine to do what you want.

One issue is that distdata isn't defined as an argument in your function, so the first line in the function will look for this from the global environment and assign it to name. How do things work when you change the name argument like the below?

detfunc <- function(distdata, key, covar) {

  name <- ds(distdata, truncation = 50, key=key, formula = ~covar)

  par(mfrow=c(1,2))
  plot(name, showpoints=FALSE, pl.den=0, lwd=2)
  ddf.gof(name$ddf)
  summary(name)
}

Does this give you the same error message?

mattnuttall00 commented 5 years ago

Hey Brad,

Yea I assumed it would be able to look in the global environment for distdata, and just use it normally, but perhaps not.

I changed it as per your example, but wasn't exactly sure whether when testing it I was then supposed to use distdata as the first argument or the name of the model. So I tried both.

When I do detfunc(distdata,"hn",habitat) I get the error:

Variable(s): covar are in the model formula but not in the data.

And when I try detfunc(mod1,"hn",habitat) I get the error:

object 'mod1' not found

I have also just tried to add a dat argument:

detfunc <- function(dat,name,key,covar) {

  name <- ds(dat, truncation = 50, key=key, formula = ~covar)

  par(mfrow=c(1,2))
  plot(name, showpoints=FALSE, pl.den=0, lwd=2)
  ddf.gof(name$ddf)
  summary(name)
}

Followed by detfunc(distdata,mod1,"hn",habitat)

But get the same error: variable(s): covar are in the model formula but not in the data

bradduthie commented 5 years ago

Thanks @mattnuttall00 -- you're correct that distdata could be found in the global environment, but it's generally a good idea to include everything required by the function as an argument just to make the function self-contained. The argument name wasn't doing anything because it was immediately overwritten by the first line of code.

Sorry, I should have explained what I was doing better! By changing to distdata, I meant to make your function detfunc read in whatever ds needs as its first argument. This way, it doesn't matter what the data set is named outside the function; ds will still receive what it needs and run accordingly. For example:

detfunc(distdata = data_I_want_ds_to_use, key, covar);

You could also just call data_I_want_ds_to_use the same thing, as below.

detfunc(distdata = distdata, key, covar);

But your detfunc won't care either way -- it will happily take whatever you specify in the argument and apply it to the ds function.

In the redefined function you sent (including dat), the argument name still isn't doing anything. If you removed it, then the function would do the exact same thing because you are assigning name to the result of ds() and name isn't used in this assignment (i.e., the right hand side).
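To illustrate with a toy example (double_it and mod1 are made-up names for this sketch, nothing to do with Distance): assigning to an argument inside a function only changes a local variable, so if you want to keep the result, one option is to return it and assign it outside the function.

```r
# Toy function: the `name` argument is overwritten straight away, so whatever
# is passed in is never used (R evaluates arguments lazily, so it is never
# even looked up).
double_it <- function(name, x) {
  name <- x * 2   # local variable only; nothing is created outside
  name            # returned as the value of the function
}

double_it(mod1, 3)          # runs even if no object called mod1 exists
exists("mod1")              # check whether the call created mod1 (it should not)

mod1 <- double_it(NULL, 3)  # assign the returned value outside instead
mod1
```

The same idea should carry over to detfunc: drop the name argument and store whatever the function returns.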

I'm not quite sure I understand what the formula argument is doing, but it looks like ds is having a hard time finding covar as a column in distdata (this is what I think is meant by "in the model formula but not in the data"). Are you specifying a column of data here?

mattnuttall00 commented 5 years ago

Thanks @bradduthie,

Right I see, the distdata bits make sense. John also mentioned to me about keeping functions self contained...it just doesn't seem to be sinking in!

In terms of name, in order to assign a model name would you suggest then doing this outside the function? e.g.

mod1 <- detfunc(distdata,"hn",habitat)

So the formula bit is where in ds() you are able to include covariates in the model. It's really simple to do normally. For example, the below code runs absolutely fine:

modtest <- ds(distdata, truncation=50, key = "hn", formula=~habitat)

So for some reason the covar argument in my function is not being recognised. Any ideas why?

M

bradduthie commented 5 years ago

No worries @mattnuttall00! I'm trying to verbalise what I think is the general function-related issue, but I'm not quite sure how to phrase it in a way that makes sense to me. I'm hoping the below will help.

Your line showing modtest really helps. What I suspect is happening is that ds is looking for the name of a column in the formula, and because a formula keeps the names inside it exactly as written, ~covar asks for a column literally called covar rather than the column you passed in. Instead, I think you will need to specify the formula itself as an argument, or do some very crafty pasting within the function to get the format correct (let me know if you need to go this route for some reason). Try the below.

detfunc <- function(dat, key, trnc = 50, covar) {

  name <- ds(data = dat, truncation = trnc, key = key, formula = covar)

  par(mfrow=c(1,2))
  plot(name, showpoints=FALSE, pl.den=0, lwd=2)
  ddf.gof(name$ddf)
  summary(name)
}

But then run the function as below.

detfunc(dat = distdata, key = "uniform", trnc = 50, covar = ~habitat)

Does that work okay?
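If you would rather pass the covariate as a column name in quotes, here is a rough sketch of the "pasting" route using reformulate() from base R's stats package. The helper name covar_formula is made up for illustration, and I have not tested this against ds itself.

```r
# Hypothetical helper: builds a one-sided formula from a column name
# given as a character string, e.g. "habitat" becomes ~habitat.
covar_formula <- function(covar_name) {
  stats::reformulate(covar_name)
}

f <- covar_formula("habitat")
class(f)      # a formula object
all.vars(f)   # the variable named in the formula: "habitat"
```

Inside detfunc this would presumably mean writing formula = reformulate(covar) and calling the function with covar = "habitat", but passing the formula directly as above is simpler.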

mattnuttall00 commented 5 years ago

Eureka! That worked. I never even suspected that the culprit was the ~

Many thanks @bradduthie, much appreciated!

bradduthie commented 5 years ago

It seems like it's always the most subtle possible thing that causes the most critical problem :-) -- glad that worked, @mattnuttall00!
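For anyone who finds this later, a minimal sketch in base R of why the ~ mattered. The function names literal and passed are made up for this example; only all.vars() and the formula behaviour are standard R.

```r
# Toy version of the original pattern: the formula is written inside the
# function, so it refers to the symbol `covar`, not to whatever was passed in.
literal <- function(covar) {
  ~covar
}

# Toy version of the fix: the caller supplies the whole formula.
passed <- function(covar) {
  covar
}

all.vars(literal(habitat))   # "covar"
all.vars(passed(~habitat))   # "habitat"
```

In the first case the formula only ever mentions covar, which is consistent with the error saying covar is in the model formula but not in the data.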