DaliangNing / iCAMP1

Infer Community Assembly Mechanisms by Phylogenetic bin-based null model analysis (Version 1)
GNU General Public License v2.0

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent #10

Closed pyspider closed 3 years ago

pyspider commented 3 years ago

I get this error when trying to find optimal values for ds and bin.size.limit.

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

traceback()
2: `colnames<-`(`*tmp*`, value = `*vtmp*`)
1: iCAMP::ps.bin(sp.bin = sp.bin, sp.ra = sp.ra, spname.use = spname.use, pd.desc = pd.big$pd.file, pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd, nd.list = niche.dif$nd, nd.spname = niche.dif$names, ndbig.wd = niche.dif$nd.wd, cor.method = "pearson", r.cut = 0.1, p.cut = 0.05, min.spn = 5)

Here is the code for this test.

ds = 0.2 # setting can be changed to explore the best choice
bin.size.limit = 5 # setting can be changed to explore the best choice. Set to 5 here just for the small example dataset. For real data, usually try 12 to 48.
phylobin=taxa.binphy.big(tree = tree, pd.desc = pd.big$pd.file,pd.spname = pd.big$tip.label,
                         pd.wd = pd.big$pd.wd, ds = ds, bin.size.limit = bin.size.limit,
                         nworker = nworker)

# 8.2 # test within-bin phylogenetic signal.
sp.bin=phylobin$sp.bin[,3,drop=FALSE]
sp.ra=colMeans(comm/rowSums(comm))
abcut=3 # you may remove some species, if they are too rare to perform reliable correlation test.
commc=comm[,colSums(comm)>=abcut,drop=FALSE]
dim(commc)
spname.use=colnames(commc)
binps=iCAMP::ps.bin(sp.bin = sp.bin,sp.ra = sp.ra,spname.use = spname.use,
                    pd.desc = pd.big$pd.file, pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd,
                    nd.list = niche.dif$nd,nd.spname = niche.dif$names,ndbig.wd = niche.dif$nd.wd,
                    cor.method = "pearson",r.cut = 0.1, p.cut = 0.05, min.spn = 5)

traceback()

However, it runs smoothly if I set bin.size.limit above 11. For this test, I have a phylogenetic tree with 2718 tips, and I want to merge the tips into phylogenetic bins (deeper levels) to build a reduced phylogeny for the analysis of ecological processes. Many thanks in advance for your help.

Test_size_11.PhyloSignalDetail.csv Test_size_11.PhyloSignalSummary.csv

DaliangNing commented 3 years ago

@pyspider you may send me (ningdaliang@ou.edu) the input files and the whole R code you used. I need to reproduce the error to debug.

pyspider commented 3 years ago

Many thanks for your help! I have sent you the input files and the whole R code I used. To find the optimal ds and bin.size.limit, I followed this issue: https://github.com/DaliangNing/iCAMP1/issues/3 . The error appeared when I tested the within-bin phylogenetic signal, but the test ran smoothly when I set spname.use = NULL. I am not sure whether the error was caused by my complicated taxa IDs.

I have another question about phylogenetic bins. Can I use an ultrametric tree to calculate phylogenetic bins? I noticed that the tree in the example is non-ultrametric. I converted my tree because the Mantel correlogram shows significant phylogenetic signal on the ultrametric tree but not on the non-ultrametric one.
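Since the call fails with spname.use set but runs with spname.use = NULL, one quick check is whether every name in spname.use also appears in the other inputs, and whether any ID contains characters that R would alter. The sketch below uses only base R on small made-up objects; `check_names` is a hypothetical helper, not part of iCAMP, and nothing here is confirmed to be the cause of the error in this thread.

```r
# Hypothetical stand-ins for comm and sp.bin; real inputs would come from the iCAMP workflow.
comm <- matrix(c(5, 0, 2, 1,
                 3, 1, 0, 0,
                 4, 2, 1, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("S", 1:3),
                               c("OTU1", "OTU2", "OTU-3", "OTU4")))
# Assumption for illustration: one ID was altered somewhere upstream ("OTU-3" became "OTU.3").
sp.bin <- matrix(c(1, 1, 2), ncol = 1,
                 dimnames = list(c("OTU1", "OTU2", "OTU.3"), "bin"))

sp.ra <- colMeans(comm / rowSums(comm))          # mean relative abundance per taxon
spname.use <- colnames(comm)[colSums(comm) >= 3] # same abundance cutoff idea as in the thread

# Hypothetical helper: report names that do not line up across the inputs.
check_names <- function(spname.use, sp.bin, sp.ra) {
  list(
    missing.in.sp.bin = setdiff(spname.use, rownames(sp.bin)),
    missing.in.sp.ra  = setdiff(spname.use, names(sp.ra)),
    non.syntactic     = spname.use[make.names(spname.use) != spname.use]
  )
}

name.check <- check_names(spname.use, sp.bin, sp.ra)
print(name.check)
```

If all three elements come back empty on the real inputs, naming is unlikely to be the issue.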

DaliangNing commented 3 years ago

@pyspider The problem can be solved with the updated iCAMP (version 1.4.4), which can be downloaded from RPackage/AllVersions.

Besides, I suggest using a bin.size.limit of no less than 12 to ensure enough statistical power in each bin. I set bin.size.limit=5 in the simple example to save running time, but a bin.size.limit that low is not suitable for real datasets. If the within-bin phylogenetic signal analysis does not lead to a conclusion, you may calculate the stochasticity ratio with NST and then choose the bin.size.limit that gives a stochasticity (sum of dispersal and drift) similar to the NST result.
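The NST-matching idea can be sketched in a few lines of base R: given the stochasticity obtained from iCAMP at each candidate bin.size.limit and one reference value from NST, pick the setting with the smallest gap. All numbers below are invented for illustration, and the variable names are assumptions rather than outputs of either package.

```r
# Hypothetical values for illustration only; none of these come from the thread.
nst.stochasticity <- 0.85                        # reference stochasticity ratio from NST
icamp.stochasticity <- c("12" = 0.70, "24" = 0.78,
                         "48" = 0.84, "96" = 0.90) # named by candidate bin.size.limit

# Absolute gap between each iCAMP estimate and the NST reference.
gap <- abs(icamp.stochasticity - nst.stochasticity)

# Candidate setting whose stochasticity is closest to the NST result.
best.limit <- as.integer(names(gap)[which.min(gap)])
print(best.limit)
```

With real data, the iCAMP values would be the summed relative importance of the dispersal and drift processes at each setting.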

pyspider commented 3 years ago

Thanks! It ran smoothly on version 1.4.4. I compared pNST between phylogenetic bins built with different bin.size.limit values (e.g., 6 and 12) and the entire set of OTUs. Stochasticity (sum of dispersal and drift) is almost the same across the bin.size.limit settings and the entire dataset. I will next compare these reduced datasets with different bin.size.limit settings in the ecological analyses.

#for entire OTUs
pnst.whole.placement_season=pNST(comm=Y, tree=tree, group=treatment, rand=1000, nworker=4,abundance.weighted=TRUE)
pnst.whole.placement_season$index.grp
# group size ST.i.bMNTD NST.i.bMNTD MST.i.bMNTD
# 1 Summer  136  0.8709204   0.8362814   0.7426230
# 2 Autumn  140  0.8789403   0.9122000   0.6661575
# 3 Winter  110  0.7965825   0.8006963   0.5119162
# 4 Spring  135  0.8810930   0.8929607   0.6826433

#for bin.size.limit=6
pnst.6_placement_season=pNST(comm=comm, tree=backbone, group=treatment, rand=1000, nworker=4,abundance.weighted=TRUE)
pnst.6_placement_season$index.grp
# group size ST.i.bMNTD NST.i.bMNTD MST.i.bMNTD
# 1 Summer  136  0.8797622   0.8588140   0.7546315
# 2 Autumn  140  0.8838600   0.9194965   0.6814246
# 3 Winter  110  0.8060088   0.8019094   0.5327529
# 4 Spring  135  0.8832790   0.9015528   0.6920269

#for bin.size.limit=12
pnst.12_placement_season=pNST(comm=comm, tree=backbone, group=treatment, rand=1000, nworker=4,abundance.weighted=TRUE)
pnst.12_placement_season$index.grp
# group size ST.i.bMNTD NST.i.bMNTD MST.i.bMNTD
# 1 Summer  136  0.8815142   0.8672659   0.7611897
# 2 Autumn  140  0.8814544   0.9232270   0.6827542
# 3 Winter  110  0.7989141   0.8130867   0.5205312
# 4 Spring  135  0.8840476   0.9174806   0.7006627
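To put a number on "almost similar", the NST.i.bMNTD columns from the three tables above can be compared directly. This is a minimal base-R sketch that copies the printed values by hand; the object names are chosen here for illustration.

```r
# NST.i.bMNTD values copied from the three pNST outputs above.
nst <- cbind(
  whole = c(0.8362814, 0.9122000, 0.8006963, 0.8929607),
  bin6  = c(0.8588140, 0.9194965, 0.8019094, 0.9015528),
  bin12 = c(0.8672659, 0.9232270, 0.8130867, 0.9174806)
)
rownames(nst) <- c("Summer", "Autumn", "Winter", "Spring")

# Difference of each reduced dataset from the entire-OTU result, per season.
nst.diff <- nst[, c("bin6", "bin12")] - nst[, "whole"]

# Largest absolute difference for each reduced dataset.
max.abs.diff <- apply(abs(nst.diff), 2, max)
print(round(nst.diff, 4))
print(round(max.abs.diff, 4))
```

For these values, the largest absolute difference from the entire-OTU result stays below 0.04 in both reduced datasets.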

DaliangNing commented 3 years ago

@pyspider Sorry, I did not notice your last message. I am confused: the bin.size.limit option belongs to the function icamp.big in the iCAMP package, not to the function pNST in the NST package. You are showing results from pNST, which does not need bin.size.limit. What are the results of icamp.big with different bin.size.limit values?

pyspider commented 3 years ago

Here are the results of icamp.big using different bin.size.limit values: v1.4.4_ds.006_size_limit_6.PhyloSignalSummary.csv

Does this mean there is no significant difference between bin.size.limit values (6 vs. 12)?

DaliangNing commented 3 years ago

Sorry for my late response. RAsig.adj (the relative abundance of bins with significant phylogenetic signal) is the key index; it usually shows a peak or enters a 'saturated' state at the optimum bin.size.limit. The RAsig.adj in your results is still increasing as bin.size.limit goes from 6 to 12 to 24. You may try higher values such as 48 or even 96.

Alternatively, you may test whether the relative importance of stochastic processes (dispersal limitation + homogenizing dispersal + 'drift') from iCAMP is at a similar level of pNST.
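The "peak or saturated state" criterion for RAsig.adj can be sketched in base R as finding the first setting after which the index stops increasing noticeably. The RAsig.adj values, the 5% tolerance, and the helper `find_plateau` are all assumptions made for this illustration; they are not taken from the attached files or from iCAMP.

```r
# Hypothetical RAsig.adj values for a series of bin.size.limit settings.
bin.size.limit <- c(6, 12, 24, 48, 96)
RAsig.adj <- c(0.20, 0.35, 0.50, 0.58, 0.59)

# Hypothetical helper: return the first setting after which the relative gain in
# RAsig.adj drops below `tol` (a decrease, i.e. a peak, also counts).
# Returns NA if the index is still rising at every step.
find_plateau <- function(limit, ra, tol = 0.05) {
  gain <- diff(ra) / ra[-length(ra)]  # relative change from each setting to the next
  flat <- which(gain < tol)
  if (length(flat) == 0) return(NA_real_)
  limit[flat[1]]
}

plateau <- find_plateau(bin.size.limit, RAsig.adj)
print(plateau)

# A series that is still increasing gives NA, which suggests trying higher values.
still.rising <- find_plateau(c(6, 12, 24), c(0.20, 0.35, 0.50))
print(still.rising)
```

The tolerance is a judgment call; plotting RAsig.adj against bin.size.limit is a useful companion to any fixed threshold.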

pyspider commented 3 years ago

Thanks! I tried higher values (48 and 96). Most of the climate variables start to show a peak or enter a 'saturated' state when bin.size.limit is 24. Here are the results of icamp.big using different bin.size.limit values: v1.4.4_ds.006_size_limit_48_V2.PhyloSignalSummary.csv

DaliangNing commented 3 years ago

Then, in your case, bin.size.limit=48 appears to be the best choice : )

DaliangNing commented 3 years ago

If no more questions, I will close this issue soon : )

pyspider commented 3 years ago

No more questions about this issue. Thanks :)