DaliangNing / iCAMP1

Infer Community Assembly Mechanisms by Phylogenetic bin-based null model analysis (Version 1)
GNU General Public License v2.0

Problem creating filebacked matrix in the use of the pdist.big function. #64

Open DrZhangjilin opened 3 months ago

DrZhangjilin commented 3 months ago

Dear Ning,

I recently encountered the same problem as in #14. That issue was closed, so I am opening a new one.

I am sure that my save path contains no Chinese characters and exists on my hard drive; the same code runs normally with the example data you provided. It fails, however, with my own ASV data, which contains 80 samples and 165,025 ASVs. I am running this on a personal computer with a maximum of 20 threads, and I am not sure whether that has any impact. Here's my code:

save.wd="E:/bac_icamp/first0803"
if(!dir.exists(save.wd)){dir.create(save.wd)}
prefix="bac"
nworker=18
memory.G=85
setwd(save.wd)
if(!file.exists("pd.desc")) {
  pd.big=iCAMP::pdist.big(tree = tree, wd=save.wd, nworker = nworker, memory.G = memory.G)
}else{
  pd.big=list()
  pd.big$tip.label=read.csv(paste0(save.wd,"/pd.taxon.name.csv"),row.names = 1,stringsAsFactors = FALSE)[,1]
  pd.big$pd.wd=save.wd
  pd.big$pd.file="pd.desc"
  pd.big$pd.name.file="pd.taxon.name.csv"
}

The error after running is as follows:

Setting parallel cluster for path computing cost 3.038717 secs.  Sat Aug  3 19:06:52 2024
Parallel for 90 tips cost 18.25165 secs. Sat Aug  3 19:07:10 2024
Path computing by parallel may take 9.29300153337585 hours. Sat Aug  3 19:07:10 2024
Now computing path for the rest 164935 tips. begin at Sat Aug  3 19:07:13 2024. Please wait...
Computing path for the rest 164935 tips actually took 3.183824 hours. Sat Aug  3 22:18:12 2024
Now setting big matrix file on the disk. Sat Aug  3 22:18:13 2024
Error in CreateFileBackedBigMatrix(as.character(backingfile), as.character(backingpath),  : 
  Problem creating filebacked matrix.
In addition: Warning message:
'memory.limit()' is no longer supported 

I look forward to your help. Thanks!

DaliangNing commented 1 month ago

@DrZhangjilin Most likely this is due to the large number of ASVs in your dataset. The phylogenetic distance matrix for 165,025 ASVs needs 165025 x 165025 x 8 bytes, which is about 217.9 GB (202.9 GiB). Given this size, I recommend running iCAMP on a powerful server, and the calculation can still be time-consuming. I suggest finding a way to reduce the number of ASVs. See page 4, 'Reducing the taxa number', in the supplementary information of the iCAMP paper (https://www.nature.com/articles/s41467-020-18560-z#Sec17).
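
The size estimate in the reply above can be reproduced before starting a long run. The following is a minimal base R sketch, assuming a dense square matrix of 8-byte doubles as described in the reply. The helpers `pd_matrix_size()` and `max_taxa()` are hypothetical names introduced here for illustration and are not part of iCAMP, and the 85 GiB budget simply mirrors the `memory.G` value from the question as an example figure.

```r
# Hypothetical helper (not part of iCAMP): storage needed for a dense
# n x n matrix, assuming 8 bytes per value (double precision).
pd_matrix_size <- function(n_taxa, bytes_per_value = 8) {
  # as.numeric() avoids integer overflow when squaring large counts
  bytes <- as.numeric(n_taxa)^2 * bytes_per_value
  c(bytes = bytes, GB = bytes / 1e9, GiB = bytes / 2^30)
}

# Hypothetical helper: largest taxa count whose dense matrix fits
# within a given storage budget expressed in GiB.
max_taxa <- function(budget_GiB, bytes_per_value = 8) {
  floor(sqrt(budget_GiB * 2^30 / bytes_per_value))
}

size <- pd_matrix_size(165025)
print(round(size[c("GB", "GiB")], 1))

# Example budget of 85 GiB (an assumption for illustration only)
limit <- max_taxa(85)
print(limit)
```

Because the matrix is file-backed, the figure to compare against is probably the free space on the drive that holds `save.wd`, not only the installed RAM. Treat the result as a rough lower bound, since it ignores any additional temporary files or working memory the run may need.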