Error while computing bNRI for strict bins

DaliangNing / iCAMP1

Infer Community Assembly Mechanisms by Phylogenetic bin-based null model analysis (Version 1)

GNU General Public License v2.0

68 stars 25 forks source link

Error while computing bNRI for strict bins #9

Closed adityabandla closed 3 years ago

adityabandla commented 3 years ago

The function icamp.big fails when restricting it to only strict bins with the error Invalid 'type' (list) of argument in R This seems to because of how the community table is subsetted based on the strict bin IDs.

bin.lev = levels(as.factor(sp.bin[, 1]))
bin.num = length(bin.lev)
com.bin = lapply(1:bin.num, function(i) {
comm[, match(rownames(sp.bin)[which(sp.bin == i)], colnames(comm))]
  })

should instead be (if I am not mistaken)

bin.lev = levels(as.factor(sp.bin[, 1]))
bin.num = length(bin.lev)
com.bin = lapply(1:bin.num, function(i) {
comm[, match(rownames(sp.bin)[which(sp.bin == bin.lev[i])], colnames(comm))]
  })

DaliangNing commented 3 years ago

I strongly recommend not to use strict bins. Many strict bins can have too few or even only a single taxon, which makes further null model analysis highly unreliable.

In current iCAMP package use the merged and re-ordered bin IDs, 'sp.bin' in the icamp.big function is 'taxabin3$sp.bin[,3,drop=FALSE]', which ensures 'sp.bin' always have levels exactly equal to 1:bin.num, thus the 'sp.bin == i' is equivalent to 'sp.bin == bin.lev[i]'.

But, if you want to try 'strict bin' numbering anyway, i.e. set 'sp.bin=taxabin3$sp.bin[,1,drop=FALSE]', you are right, the 'sp.bin == i' must be revised to 'sp.bin == bin.lev[i]'. In addition, 'pdid.bin=lapply(1:bin.num, function(i){match(rownames(sp.bin)[which(sp.bin==i)],pd.spname)})' should also be revised to 'pdid.bin=lapply(1:bin.num, function(i){match(rownames(sp.bin)[which(sp.bin==bin.lev[i])],pd.spname)})' 'colnames(bin.abund)=paste("bin",1:bin.num,sep = "")' should be 'colnames(bin.abund)=paste("bin",bin.lev,sep = "")'.

adityabandla commented 3 years ago

Thanks, Daliang! I just installed iCAMP again to make sure it's the latest version. sp.bin seems to work as you suggest only when omit.option is set to no sp.bin = taxabin3$sp.bin[, 3, drop = FALSE]

when omit.option is set to omit sp.bin = taxabin3$sp.bin[which(!(taxabin3$sp.bin$bin.id.strict %in% rm.bins)), 1, drop = FALSE] and this still seems to retain the original IDs of the strict bins. This leads to the error I mentioned.

With respect to strict bins, I decided to go with pd.cut, strict bins, with a bin size limit = 10. I omitted small bins which gets rid of 16/21 bins, but these cumulatively account for only 3 % of the community. This seems to give more stable results. 3 out 5 have taxa numbers > 24, while 2 of them have >10, but <15.

I found it slightly surprising to still find a few taxa in some of these strict bins which had a pd >> d.cut. I was assuming that when d.cut is used, strict bins should never contain any taxa > d.cut

DaliangNing commented 3 years ago

yes. if omit.option='omit', my previous code is wrong. Your correction is good. A new version of iCAMP (v1.3.5) is uploaded to the folder of the package. Not in CRAN yet.

icamp.big has two options, ds and pd.cut. ds is the phylogenetic distance threshold. but pd.cut is the distance of the truncating line to the tree root. If the largest distance from tip to root is 1.0 and ds=0.2, pd.cut is 0.9 = [1.0-(0.2/2)]. here 'pd.cut' means the cut line of phylogenetic distance to root. i guess you may regard the pd.cut as the ds.