flr / FLa4a

The repository for the JRC initiative on stock assessment
https://fishreg.jrc.ec.europa.eu/web/a4a
12 stars 6 forks source link

convert data.frame to FLStock #128

Open GeoTuxMan opened 5 years ago

GeoTuxMan commented 5 years ago

Hi,

I try to do a stock assement with sca() method. After I convert my data.frame to FLStock object : mystock<-as.FLStock(land_t) the command: summary(mystock) give me only values of one (1).

My all code: `#stock assesment

library(FLCore) library(ggplotFL) library(FLa4a) library(xlsx) library(plotly)

dat=read.xlsx("Sprot_FLR.xlsx", sheetIndex = 1)

dat<-data.frame(dat$slot,dat$age, dat$year, dat$data, dat$units) dat2<-na.omit(dat)

select only landings in tone

land_t <- na.omit(subset(dat, dat2$dat.slot=="landings", select=-dat.slot)) colnames(land_t)<- c("age","year","data","units") landst <- as.FLQuant(land_t) summary(landst) plot(landst)

plot1=data.frame(land_t)

qplot(x=year, y=data, data=plot1, color=factor(age),geom=c("point","line"),ylab="Catch (t)", xlab="")

p1<-ggplot(plot1, aes(x=year, y=data, group = age)) + geom_line(aes(colour=as.factor(age))) + ylab("Catch (t)") + xlab("") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) ggplotly(p1)

select only landings in numbers

land_n <- na.omit(subset(dat, dat2$dat.slot=="landings.n", select=-dat.slot)) colnames(land_n)<- c("age","year","data","units") landsn <- as.FLQuant(land_n) summary(landsn) plot(landsn) plot2=data.frame(land_n)

qplot(x=year, y=data, data=plot2, color=factor(age),geom=c("point","line"),ylab="Catch (1000^3)", xlab="")

p2<-ggplot(plot2, aes(x=year, y=data, group = age)) + geom_line(aes(colour=as.factor(age))) + ylab("Catch (1000^3)") + xlab("") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) ggplotly(p2)

convert dataframe to FLStock

mystock<-as.FLStock(land_t) summary(mystock) # empty ?!

add an index of abundance : cpue

the statistical catch-at-age model

fit <- sca(mystock, cpue)

update FLStock object

stk<-mystock+fit

plot(stk)

plot fitted vs observed catch-at-age

plot(fit, ple4)

` Sprot_FLR.xlsx I look forward to some tips ... Cheers !

iagomosqueira commented 5 years ago

The table in the XLSX file has data labelled as "landings" and "landings.n". But landings in an FLStock refers to total landings in weight, and not by age. We usually have landings.n, in number, and landings.wt, the mean weight at age in the landings, usually in kg.

From simply the landings.n data, you could create an FLStock using

library(FLCore)
library(xlsx)

dat <- na.omit(read.xlsx("Sprot_FLR.xlsx", sheetIndex = 1, stringsAsFactors=FALSE))
stk <- as(dat[dat$slot == "landings.n",], "FLStock")

If you add other data to the same table, you can add it to the FLStock using the same code.

Maybe take a look at the example dataset for the class, data(ple4) to see what needs to go into each slot

iagomosqueira commented 5 years ago

Also, to use FLa4a::sca() you need to provide at least:

And one or more indices of abundance, as objects of class FLIndex or FLIndices

GeoTuxMan commented 5 years ago

Dear Iago,

Thank you very much for your reply. Now, summary(stk) work fine. If discards is equal with 0, can I use my landings.n like catch.n ?

Cheers ! George

În vin., 15 feb. 2019 la 11:56, Iago Mosqueira notifications@github.com a scris:

Also, to use FLa4a::sca() you need to provide at least:

  • catch.n, usually a sum of landings.n and discards.n
  • catch.wt
  • m
  • mat
  • stock.wt
  • harvest.spwn and m.spwn

And one or more indices of abundance, as objects of class FLIndex or FLIndices

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-463976158, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMazJQUMI0jRzNsV7Wr1mEP2OB-mybks5vNoQ0gaJpZM4a2YPY .

iagomosqueira commented 5 years ago

You should do

discards.n(stk) <- 0
discards.wt(stk) <- landings.wt(stk)
catch(stk) <- computeCatch(stk, "all")

or simply load the data into the catch.n slot. And then add all the other biological information.

GeoTuxMan commented 5 years ago

For "index of abundance", I use CPUE; my file look like that: year index 1980 384 1981 2775 ............ How can I load this data into an FLIndex object ?

În vin., 15 feb. 2019 la 12:27, Iago Mosqueira notifications@github.com a scris:

You should do

discards.n(stk) <- 0 discards.wt(stk) <- landings.wt(stk) catch(stk) <- computeCatch(stk, "all")

or simply load the data into the catch.n slot. And then add all the other biological information.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-463987850, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMa10Yq9DAjc-O7V64MxTiVSXlVv0Yks5vNouLgaJpZM4a2YPY .

iagomosqueira commented 5 years ago

Is this a biomass index? Or is it by age?

Generally, take a look at

http://www.flr-project.org/doc/Loading_your_data_into_FLR.html

for how to load data into FLR.

GeoTuxMan commented 5 years ago

Dear Iago,

My CPUE is "kg/fishing day" by year. I try this code:

indices<-read.xlsx("Sprat_Index.xlsx",sheetIndex = 1, stringsAsFactors=FALSE)

indices<-as.FLQuant(as.matrix(indices), dimnames=list(year=1981:2007,age=0:4))

The error:

indices<-as.FLQuant(as.matrix(indices), dimnames=list(year=1981:2007,age=0:4))Error in array(object, dim = dim, dimnames = filldimnames(dimnames, dim = dim)) : length of 'dimnames' [1] not equal to array extent

I use "Loading your data into FLR", pag.8

All the best, George

În vin., 15 feb. 2019 la 14:26, Iago Mosqueira notifications@github.com a scris:

Is this a biomass index? Or is it by age?

Generally, take a look at

http://www.flr-project.org/doc/Loading_your_data_into_FLR.html

for how to load data into FLR.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-464031166, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMa1ICtAZ3JxKdEU-fnweAu9I08RjXks5vNqeNgaJpZM4a2YPY .

iagomosqueira commented 5 years ago

What are the dimensions of the result of as.matrix(indices)?

dim(as.matrix(indices))

should be c(5, 27). First dimension is age, second year.

With regards the data, a biomass index won't have enough information to fit sca(). The method assumes an index at age is available, covering at least some of those ages, like from a survey.

GeoTuxMan commented 5 years ago

Dear Iago,

dim(as.matrix(indices))[1] 27 2

You mean, the file need to be like that ?: year age index 1980 0 123 1980 1 234 1980 2 345 1980 3 456 1980 4 101 1981 0 233 ............................ All the best ! George

În lun., 18 feb. 2019 la 18:47, Iago Mosqueira notifications@github.com a scris:

What are the dimensions of the result of as.matrix(indices)?

dim(as.matrix(indices))

should be c(5, 27). First dimension is age, second year.

With regards the data, a biomass index won't have enough information to fit sca(). The method assumes an index at age is available, covering at least some of those ages, like from a survey.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-464806215, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMa_SUulqUnlWc028yEiVlKLrrRoFRks5vOtkIgaJpZM4a2YPY .

iagomosqueira commented 5 years ago

It all depends on what information you have. sca() does better when the index of abundance has information on the cohorts' changes in abundance, say from survey at age. If you only have a biomass index, like commercial CPUE, then

GeoTuxMan commented 5 years ago

I find some survey indices at age. Now, I have this error:

fit <- sca(stk, indices)Error in a4aInternal(fmodel = fmodel, qmodel = qmodel, srmodel = srmodel, :

only non-zero survey indices allowed.

It's need to replace zero with NA ?

În mar., 19 feb. 2019 la 10:38, Iago Mosqueira notifications@github.com a scris:

It all depends on what information you have. sca() does better when the index of abundance has information on the cohorts' changes in abundance, say from survey at age. If you only have a biomass index, like commercial CPUE, then

  • Pass to sca an object of class FLIndexBiomass, created using FLIndexBiomass(index=FLQuant()) where that FLQuant was created as above.
  • You might need to set the n1model, the initial abundances at age, as sca will have little information to estimate it.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465038546, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMawfWy50kDuxThbK0oFd4B0nQa278ks5vO7f9gaJpZM4a2YPY .

0 0 0 0 0 0 0 0 0 0 0 1.999702558 0 0 0 0 0 4.090467626 24.59369527 0 0 0 0 0 0 22.17689243 0 6.942361111 100.8988914 104.6245942 205.4512046 651.3813491 535.2796209 17.89448441 308.6765189 185.2005105 127.3622926 101.0953627 110.1317668 110.082504 12.16489162 6.836087842 12.23692951 0 15.71600719 253.4737303 32.04131227 72.06232687 4.561478599 13.42921914 54.195232 66.31854545 87.92669323 72.97230769 557.7030093 339.1920177 351.7167208 880.7008303 32.96694437 503.5157752 380.0788489 487.0410509 292.2161089 200.9568627 89.00261593 184.9724866 201.8776864 128.8718206 94.79375141 71.34752122 56.13926358 371.0843525 342.0903678 467.09113 331.6038781 127.6290632 74.96146096 80.4232 95.33832727 36.07649402 88.45846154 1258.881481 976.7871397 1012.855114 230.1053491 147.7828541 130.5846987 302.7746763 14.66009852 8.795802609 6.048868778 51.756956 63.24985128 44.4632915 48.08933718 50.28322391 20.11834174 49.58439665 326.7350719 173.1872154 212.8967193 182.2063712 52.48470518 33.91550518 17.454528 52.40465455 9.839043825 47.12615385 488.2793981 729.9068736 756.8587662 53.41731319 7.957538296 7.058632363 15.03136691 4.072249589 2.443278503 1.680241327 0 13.51650803 2.151449589 0.950382158 0 0 0 0 0 0 0 0 0 1.84704 2.665745455 0.156175299 11.13538462

iagomosqueira commented 5 years ago

A4a works in log space, so zeroes are a problem. Try substituting them with a small number, say 1e-6

--- Sent from a mobile

On 19 feb 2019 at 12:27 PM, wrote:

I find some survey indices at age. Now, I have this error:

fit <- sca(stk, indices)Error in a4aInternal(fmodel = fmodel, qmodel = qmodel, srmodel = srmodel, :

only non-zero survey indices allowed.

It's need to replace zero with NA ?

În mar., 19 feb. 2019 la 10:38, Iago Mosqueira notifications@github.com a scris:

It all depends on what information you have. sca() does better when the index of abundance has information on the cohorts' changes in abundance, say from survey at age. If you only have a biomass index, like commercial CPUE, then

0 0 0 0 0 0 0 0 0 0 0 1.999702558 0 0 0 0 0 4.090467626 24.59369527 0 0 0 0 0 0 22.17689243 0 6.942361111 100.8988914 104.6245942 205.4512046 651.3813491 535.2796209 17.89448441 308.6765189 185.2005105 127.3622926 101.0953627 110.1317668 110.082504 12.16489162 6.836087842 12.23692951 0 15.71600719 253.4737303 32.04131227 72.06232687 4.561478599 13.42921914 54.195232 66.31854545 87.92669323 72.97230769 557.7030093 339.1920177 351.7167208 880.7008303 32.96694437 503.5157752 380.0788489 487.0410509 292.2161089 200.9568627 89.00261593 184.9724866 201.8776864 128.8718206 94.79375141 71.34752122 56.13926358 371.0843525 342.0903678 467.09113 331.6038781 127.6290632 74.96146096 80.4232 95.33832727 36.07649402 88.45846154 1258.881481 976.7871397 1012.855114 230.1053491 147.7828541 130.5846987 302.7746763 14.66009852 8.795802609 6.048868778 51.756956 63.24985128 44.4632915 48.08933718 50.28322391 20.11834174 49.58439665 326.7350719 173.1872154 212.8967193 182.2063712 52.48470518 33.91550518 17.454528 52.40465455 9.839043825 47.12615385 488.2793981 729.9068736 756.8587662 53.41731319 7.957538296 7.058632363 15.03136691 4.072249589 2.443278503 1.680241327 0 13.51650803 2.151449589 0.950382158 0 0 0 0 0 0 0 0 0 1.84704 2.665745455 0.156175299 11.13538462

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

GeoTuxMan commented 5 years ago

OK, many thanks.

Now....

fit <- sca(stk, indices)Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom

În mar., 19 feb. 2019 la 13:58, Iago Mosqueira notifications@github.com a scris:

A4a works in log space, so zeroes are a problem. Try substituting them with a small number, say 1e-6

--- Sent from a mobile

On 19 feb 2019 at 12:27 PM, wrote:

I find some survey indices at age. Now, I have this error:

fit <- sca(stk, indices)Error in a4aInternal(fmodel = fmodel, qmodel = qmodel, srmodel = srmodel, :

only non-zero survey indices allowed.

It's need to replace zero with NA ?

În mar., 19 feb. 2019 la 10:38, Iago Mosqueira notifications@github.com a scris:

It all depends on what information you have. sca() does better when the index of abundance has information on the cohorts' changes in abundance, say from survey at age. If you only have a biomass index, like commercial CPUE, then

  • Pass to sca an object of class FLIndexBiomass, created using FLIndexBiomass(index=FLQuant()) where that FLQuant was created as above.
  • You might need to set the n1model, the initial abundances at age, as sca will have little information to estimate it.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465038546, or mute the thread < https://github.com/notifications/unsubscribe-auth/APeMawfWy50kDuxThbK0oFd4B0nQa278ks5vO7f9gaJpZM4a2YPY

.

0 0 0 0 0 0 0 0 0 0 0 1.999702558 0 0 0 0 0 4.090467626 24.59369527 0 0 0 0 0 0 22.17689243 0 6.942361111 100.8988914 104.6245942 205.4512046 651.3813491 535.2796209 17.89448441 308.6765189 185.2005105 127.3622926 101.0953627 110.1317668 110.082504 12.16489162 6.836087842 12.23692951 0 15.71600719 253.4737303 32.04131227 72.06232687 4.561478599 13.42921914 54.195232 66.31854545 87.92669323 72.97230769 557.7030093 339.1920177 351.7167208 880.7008303 32.96694437 503.5157752 380.0788489 487.0410509 292.2161089 200.9568627 89.00261593 184.9724866 201.8776864 128.8718206 94.79375141 71.34752122 56.13926358 371.0843525 342.0903678 467.09113 331.6038781 127.6290632 74.96146096 80.4232 95.33832727 36.07649402 88.45846154 1258.881481 976.7871397 1012.855114 230.1053491 147.7828541 130.5846987 302.7746763 14.66009852 8.795802609 6.048868778 51.756956 63.24985128 44.4632915 48.08933718 50.28322391 20.11834174 49.58439665 326.7350719 173.1872154 212.8967193 182.2063712 52.48470518 33.91550518 17.454528 52.40465455 9.839043825 47.12615385 488.2793981 729.9068736 756.8587662 53.41731319 7.957538296 7.058632363 15.03136691 4.072249589 2.443278503 1.680241327 0 13.51650803 2.151449589 0.950382158 0 0 0 0 0 0 0 0 0 1.84704 2.665745455 0.156175299 11.13538462

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465101888, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMay8qm3Ih_mtydBhP-vJgNFoU4fPgks5vO-bvgaJpZM4a2YPY .

iagomosqueira commented 5 years ago

The a4a model is built around a series of submodels, see http://www.flr-project.org/doc/Statistical_catch_at_age_models_in_FLa4a.html foir some explanation and examples.

A call top sca() builds a default model according to the dimensions of the data provided, but it is never fully appropriate. You will need to look at each of the submodels and constrcut those that are right for your data. For example, trhe default srmodel is a factor per year, but you could assume a given SRR, say bevholt.

GeoTuxMan commented 5 years ago

fit <- sca(stk, indices, fmodel=fmod, qmodel=qmod, srmodel=srmod)This version of C:/Program Files/R/R-3.4.1/library/FLa4a/bin/windows\a4a.exe is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.Error in file(file, "r") : cannot open the connectionIn addition: Warning messages:1: running command 'C:\Windows\system32\cmd.exe /c cd /D"C:\Users\User\AppData\Local\Temp\RtmpWIcr3K\file191c6afd982" & a4a > logfile.txt' had status 1 2: In shell(paste0("cd /D", shQuote(wkdir), " & a4a ", args, " > logfile.txt")) : 'cd /D"C:\Users\User\AppData\Local\Temp\RtmpWIcr3K\file191c6afd982" & a4a > logfile.txt' execution failed with error code 13: In file(file, "r") : cannot open file 'C:\Users\User\AppData\Local\Temp\RtmpWIcr3K\file191c6afd982/a4a.par': No such file or directory

I have Windows 7 32-bit, RStudio 1.0.153, R 3.4.1, FLa4a 1.6.4 Where do I find FLa4a for 32 bit ?

În mar., 19 feb. 2019 la 14:16, Iago Mosqueira notifications@github.com a scris:

The a4a model is built around a series of submodels, see http://www.flr-project.org/doc/Statistical_catch_at_age_models_in_FLa4a.html foir some explanation and examples.

A call top sca() builds a default model according to the dimensions of the data provided, but it is never fully appropriate. You will need to look at each of the submodels and constrcut those that are right for your data. For example, trhe default srmodel is a factor per year, but you could assume a given SRR, say bevholt.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465106808, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMa8BCEd61osRlXTn2-GuJLQqAb2pkks5vO-sdgaJpZM4a2YPY .

iagomosqueira commented 5 years ago

We are releasing a new version of FLa4a this week, and I will compile a 32bit version. The current one is a bit outdated.

GeoTuxMan commented 5 years ago

Dear Iago,

Please let me know when you finish to compile the new version 32bit. In the meantime, I move to a laptop with Ubuntu 64bit. Now, I have new error:

range(indices[[1]])[c('startf', 'endf')] <- c(0.25,0.75) #March=3/12=0.25> qmod<-list(~factor(age))> fmod<-~factor(age)+factor(year)> srmod<-~factor(year)> fit <- sca(stk, indices, fmodel=fmod, qmodel=qmod, srmodel=srmod)Warning message:In fitADMB(fit, wkdir, df.data, stock, indices, full.df, fbar, plusgroup, : Hessian was not positive definite.> stkx<-stk+fit> plot(stkx)Error: Faceting variables must have at least one value

All the best ! George

În mar., 19 feb. 2019 la 17:30, Iago Mosqueira notifications@github.com a scris:

We are releasing a new version of FLa4a this week, and I will compile a 32bit version. The current one is a bit outdated.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465206361, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMaxviTDpg0X8EK12fk3aJoAav1Mniks5vPCaygaJpZM4a2YPY .

iagomosqueira commented 5 years ago

The message about the Hessian indicates that the model did not fit properly. So I assume the fit object has not updated the stock as required by the plot method. Take a look at the residuals and the results of the fit.

GeoTuxMan commented 5 years ago

Dear Iago,

The commands:

res<-residuals(fit,stk,indices) plot(res)

show an empty plot....what this means ?

All the best ! George

În mie., 20 feb. 2019 la 16:20, Iago Mosqueira notifications@github.com a scris:

The message about the Hessian indicates that the model did not fit properly. So I assume the fit object has not updated the stock as required by the plot method. Take a look at the residuals and the results of the fit.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465594709, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMa5RO6Rml_wmf-b5sFz1DmRqUrnPpks5vPVmxgaJpZM4a2YPY .

iagomosqueira commented 5 years ago

If the model has not fit properly, there will be no residuals. Look at the convergence row of fitSumm(fit) to see the convergence flag returned by ADMB. Was a log-likelihood value returned (logLik(fit))?

If it does not fit you will need to play with the model. Maybe try to make it simpler, for example using splines rather than factors.

GeoTuxMan commented 5 years ago

fitSumm(fit) iters 1 nopar NA nlogl NA maxgrad NA nobs 294 gcv NA convergence 1 accrate NA

În joi, 21 feb. 2019 la 12:59, Iago Mosqueira notifications@github.com a scris:

If the model has not fit properly, there will be no residuals. Look at the convergence row of fitSumm(fit) to see the convergence flag returned by ADMB. Was a log-likelihood value returned (logLik(fit))?

If it does not fit you will need to play with the model. Maybe try to make it simpler, for example using splines rather than factors.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465956490, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMazEg8OxI_8om2VIqsPJk6GON7Um2ks5vPnwIgaJpZM4a2YPY .

iagomosqueira commented 5 years ago

That's it, the model is not fitting. This is really a modelling issue rather than a software one, but you need to think carefully about the submodels. The defaults tend to work when data series are long and fairly informative, like the ple4 example in the tutorials. For short series, or when the catch and CPUE data are in disagreement, there is work to be done to try to find a suitable model.

GeoTuxMan commented 5 years ago

Yes, you're right....the dataset from catch is from 1980:2014, and cpue, from 1981:2007 So, after I remove some rows from catch dataset:

fitSumm(fit) iters 1 nopar 7.000000e+01 nlogl 8.402570e+01 maxgrad 1.134120e+08 nobs 2.370000e+02 gcv 4.502412e+00 convergence 0.000000e+00 accrate NA

And now, the plot is not empty .

Thank very much for your real help !

All the best ! George

În joi, 21 feb. 2019 la 12:59, Iago Mosqueira notifications@github.com a scris:

That's it, the model is not fitting. This is really a modelling issue rather than a software one, but you need to think carefully about the submodels. The defaults tend to work when data series are long and fairly informative, like the ple4 example in the tutorials. For short series, or when the catch and CPUE data are in disagreement, there is work to be done to try to find a suitable model.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/flr/FLa4a/issues/128#issuecomment-465973830, or mute the thread https://github.com/notifications/unsubscribe-auth/APeMa4utfLzHo8vr78CQdA0Cvjc1oWWjks5vPooagaJpZM4a2YPY .

iagomosqueira commented 5 years ago

OK. The model should be able to deal with the CPUE being shorter, but it is usually at the beginning of the series. Having catch but no index between 2008 and 2014 is a problem.