USCCANA / netdiffuseR

netdiffuseR: Analysis of Diffusion and Contagion Processes on Networks
https://USCCANA.github.io/netdiffuseR

Issue with memory/size of network #29

Closed: OwenTheAnalyst closed this issue 3 years ago

OwenTheAnalyst commented 3 years ago

Hello,

Apologies: this is less a bug report and more a question/request for advice!

I am trying to work with a network of around 400,000 vertices over seven periods, with around 5% total adoption. I am working on a machine with 32 GB of memory and a relatively decent CPU. I can work with igraph relatively quickly, but I am struggling to get any output from netdiffuseR on this dataset, including the network summary statistics. Is this to be expected (e.g., do I need to reduce my network size), or is something amiss here?

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 R2 x64 (build 9600)

Matrix products: default

Thanks in advance!

OwenTheAnalyst commented 3 years ago

Update: I have halved the dataset, and I am now getting an error within the first five seconds of calling summary():

error: arma::memory::acquire(): out of memory
Error in moran_cpp(x, w) : std::bad_alloc

So I believe it is just too large a dataset! I will try to condense it further...

gvegayon commented 3 years ago

What exactly do you need? The issue is likely that summary.diffnet computes Moran's I, which uses the geodesic distance matrix. In a network with 400K vertices, that matrix would take about 8 bytes × 400,000^2 = 1.28 × 10^12 bytes, roughly 1.3 TB, which is why you get the std::bad_alloc error. Even at 200K vertices it would still need about 320 GB, far beyond 32 GB of RAM.

You can skip Moran's I by calling summary() with skip.moran = TRUE. Now, if you really need Moran's I, you'll need to partition your network to make it work. Does it make sense to have all 400K vertices in a single graph?
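The memory arithmetic above can be checked with a short base-R sketch. It assumes a dense matrix of 8-byte doubles (how R stores numeric matrices) and ignores temporary copies made during the computation, so real peak usage is likely higher. The helper names are hypothetical. The last part only runs if netdiffuseR is installed; it simulates a small network with rdiffnet() (sizes chosen only for illustration) to show summary() with skip.moran = TRUE.

```r
# Hypothetical helper: bytes needed for a dense n x n matrix of doubles.
# Assumption: 8 bytes per cell, no overhead for dimnames or temporary copies.
dense_matrix_bytes <- function(n) {
  8 * as.numeric(n)^2
}

# Convert bytes to decimal gigabytes (1 GB = 1e9 bytes).
bytes_to_gb <- function(bytes) bytes / 1e9

sizes <- c(full = 400000, halved = 200000)
gb_needed <- bytes_to_gb(dense_matrix_bytes(sizes))
print(gb_needed)  # full: 1280 GB, halved: 320 GB

# Hypothetical helper: largest n whose dense matrix fits in `ram_gb` gigabytes.
max_vertices_for_ram <- function(ram_gb) {
  floor(sqrt(ram_gb * 1e9 / 8))
}
max_vertices_for_ram(32)  # 63245 vertices, before counting any other memory use

# Optional: summary without Moran's I (only runs if netdiffuseR is installed).
if (requireNamespace("netdiffuseR", quietly = TRUE)) {
  set.seed(1)
  # Small simulated diffusion network; n and t are illustrative only.
  sim <- netdiffuseR::rdiffnet(n = 200, t = 7)
  summary(sim, skip.moran = TRUE)
}
```

Because the cost grows with the square of the vertex count, halving the network only cuts memory by a factor of four, which is why the halved dataset still failed.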

OwenTheAnalyst commented 3 years ago

Hi @gvegayon, thanks so much for the response. No, it probably does not! I'm still getting used to netdiffuseR (but super impressed). Is there any capability within the package to partition a network, or do I need to go back to igraph to do that and then bring some subcommunities back in?
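One possible route for the partitioning step, sketched in base R and independent of netdiffuseR, assuming: the network is an edge list data frame with columns `from` and `to` (hypothetical names), and a group label for each vertex has already been computed elsewhere, for example the `membership` element returned by igraph's `components()`, or `membership()` applied to a community-detection result such as `cluster_louvain()`. The hypothetical helper `split_edgelist()` keeps only within-group edges, so each piece can be analysed on its own. Cross-group edges are dropped, which changes the network, so this only makes sense if those ties matter little for the analysis.

```r
# Hypothetical helper: split an edge list into one edge list per group.
# Assumptions: `edges` has columns `from` and `to`; `vertex` and `group`
# are parallel vectors mapping each vertex id to its group label.
split_edgelist <- function(edges, vertex, group) {
  # match() avoids id-to-string pitfalls such as 1e5 becoming "1e+05".
  g_from <- group[match(edges$from, vertex)]
  g_to   <- group[match(edges$to, vertex)]
  # Keep edges whose endpoints share a group; cross-group edges are dropped.
  keep <- !is.na(g_from) & !is.na(g_to) & g_from == g_to
  within <- edges[keep, , drop = FALSE]
  # One data frame per group, named by group label.
  split(within, g_from[keep])
}

# Toy example: six vertices in two groups, one cross-group edge (3 -> 4).
edges  <- data.frame(from = c(1, 2, 3, 4, 5), to = c(2, 3, 4, 5, 6))
vertex <- 1:6
group  <- c("A", "A", "A", "B", "B", "B")

parts <- split_edgelist(edges, vertex, group)
sapply(parts, nrow)  # edges kept per group: A = 2, B = 2
```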