cgoneill closed this issue 1 year ago
Hi, to confirm that this is an issue with size, can you try running compare_connections with just a small subset of one of the datasets?
I ran code as follows:
> ko.conns.chr3 <- ko.conns[grepl("^chr3-", ko.conns$Peak1) & grepl("^chr3-", ko.conns$Peak2), ] # subsets the dataset from 23192372 connections to 1225206, all on chromosome 3
>
> head(compare_connections(wt.conns, ko.conns.chr3))
[1] FALSE FALSE FALSE FALSE FALSE FALSE
I was also able to run compare_connections() on subsets of both tables using only connections on chromosome 3, both of which had 1225206 connections.
Hi, it sounds like this is an issue with the size. For the moment, I think your best bet is to run it chromosome by chromosome. Since Cicero does not generate connections between chromosomes, you should be able to run each chromosome separately and concatenate the results. I'll keep this issue open and see if I can write a fix next time I have some engineering time, but that may be a while. Happy new year!
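The per-chromosome workaround described above could be sketched roughly as follows. This is only a minimal sketch, not part of Cicero: compare_by_chromosome is a hypothetical helper name, it assumes peak names of the form "chr3-1000-2000" (as in the grepl calls in this thread), and it assumes compare_connections() returns one logical value per row of its first argument. The comparison function is passed in as an argument so the helper can be tried with a stand-in function before running it on the full data.

```r
# Hypothetical helper (not part of Cicero): compare connections one
# chromosome at a time so that only a small subset is processed per call.
# `compare_fun` is assumed to behave like cicero::compare_connections(),
# i.e. it takes two connection data frames and returns one logical value
# per row of the first.
compare_by_chromosome <- function(conns1, conns2,
                                  compare_fun = cicero::compare_connections) {
  # Assumption: peak names look like "chr3-1000-2000", so everything
  # before the first "-" is the chromosome name.
  chr1 <- sub("-.*$", "", as.character(conns1$Peak1))
  chr2 <- sub("-.*$", "", as.character(conns2$Peak1))

  # One result per row of conns1, FALSE unless a match is found.
  result <- logical(nrow(conns1))

  for (chr in unique(chr1)) {
    idx1 <- which(chr1 == chr)
    sub2 <- conns2[chr2 == chr, , drop = FALSE]
    if (nrow(sub2) == 0) next  # no connections to compare against
    # Write results back in the original row order of conns1.
    result[idx1] <- compare_fun(conns1[idx1, , drop = FALSE], sub2)
  }
  result
}
```

With Cicero loaded, this could be called as compare_by_chromosome(wt.conns, ko.conns). Because Cicero connections never span chromosomes, subsetting on Peak1 alone should be enough to split each table, and writing results back by row index keeps the output aligned with the rows of the first table.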
Thank you! Happy new year!
Hello, and thank you for your work on this excellent software package. I've been trying to use compare_connections() to compare two Cicero connection datasets of 23192732 pairs each, but each time I try, my memory usage maxes out after a few minutes and my session crashes. At steady state, my session uses about 9.92 GB of memory, but even when I have 256 GB allocated on an HPC cluster, my memory usage gradually increases until my R session crashes. Here's my session info:
I'm assuming compare_connections() isn't necessarily meant to compare two Cicero datasets (the vignette example really only mentions comparing a Cicero dataset to non-Cicero datasets). If that's correct, is there a way to do so?