Open BJWiley233 opened 2 years ago
Hi!
We used the files available here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

Yes, you can process a single chromosome per run. You can find some information about RAiSD execution times here: https://github.com/alachins/raisd/blob/master/publications/RAiSD-X_TRETS2019.pdf (see Section 7.3). Each chromosome took several minutes, and the 22 chromosomes (excluding X and Y) together required 3 hours.

As for memory utilization, RAiSD uses an algorithm that keeps most of the dataset on disk and loads parts of it into main memory only when needed. As a result, total memory utilization can range from a few MB to several GB, depending on a parameter that is currently set to use just a few MB irrespective of the dataset size.
Best regards, Nikos A.
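To illustrate the out-of-core idea described in the reply above (most of the data stays on disk, and only a small part is in memory at any time), here is a minimal Python sketch. It is not RAiSD's code or algorithm: the function names `positions_from_lines` and `sliding_window_sums` are hypothetical, and the input is assumed to be VCF-like text with the position in the second tab-separated column.

```python
from collections import deque
from typing import Iterable, Iterator


def positions_from_lines(lines: Iterable[str]) -> Iterator[int]:
    """Lazily yield the second tab-separated column (assumed to be POS).

    Header lines starting with '#' and blank lines are skipped. Only one
    line is held in memory at a time.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield int(line.split("\t", 2)[1])


def sliding_window_sums(values: Iterable[int], window: int) -> Iterator[int]:
    """Yield the sum of every full window of `window` consecutive values.

    The buffer never holds more than `window` items, so memory use depends
    on the window size (a tunable parameter), not on the input size.
    """
    buf = deque()
    total = 0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the oldest value from the window
        if len(buf) == window:
            yield total
```

With a real file, the same functions can be fed a file handle, for example `with open(path) as fh: for s in sliding_window_sums(positions_from_lines(fh), 50): ...`, where `path` is whatever single-chromosome file is being scanned. A larger window trades memory for context, which is the general kind of trade-off a memory parameter controls.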
On Sun, Aug 7, 2022 at 8:04 AM BJWiley23 @.***> wrote:
Hi,
I am trying to see how long it takes the program to run, as well as the memory requirement. You tested this on 1000 Genomes data; I was wondering whether it was on these files: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/
A couple of questions. Can you run on a single chromosome at a time? If so, how long would it take on a single thread (since there is no OpenMP parallel threading in your program) for, say, 1000G chromosome 9, which is around 5-6 GB in size without gzip compression, and how much memory would be needed? Is the time on the scale of minutes, hours, or days?
Thanks.
-- Nikolaos Alachiotis
Thanks! @alachins this is very useful 😄. Do you mind if I keep this "issue" open for a few days so other people at my institution can see it?