alachins / raisd

RAiSD: software to detect positive selection based on multiple signatures of a selective sweep and SNP vectors

VCF for testing #39

Open BJWiley233 opened 2 years ago

BJWiley233 commented 2 years ago

Hi,

I am trying to estimate how long the program takes to run and how much memory it needs. You tested it on the 1000 Genomes data, and I was wondering whether you used these files: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/.

A couple of questions. Can you run on a single chromosome at a time? If so, how long would it take on a single thread (since there is no OpenMP parallel threading in your program) for, say, 1000G chromosome 9, which is around 5-6 GB without gzip compression, and how much memory would be needed? Is the time on the scale of minutes, hours, or days?
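For anyone who wants to answer this empirically on their own machine, here is a minimal Python sketch that measures wall-clock time and peak resident memory of any child command. It assumes a Unix-like system (the `resource` module is not available on Windows), and `time_and_peak_memory` is a hypothetical helper name, not part of RAiSD. The command in the example is a placeholder; substitute the actual RAiSD invocation from the project README.

```python
import resource
import subprocess
import sys
import time


def time_and_peak_memory(cmd):
    """Run `cmd` (a list of strings) and return (elapsed_seconds, peak_rss).

    peak_rss is ru_maxrss for waited-for children of this process:
    kilobytes on Linux, bytes on macOS. It is the maximum over all
    children so far, so call this once per measuring process.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises if the command fails
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss


if __name__ == "__main__":
    # Placeholder command; replace with the real single-chromosome run.
    elapsed, peak_rss = time_and_peak_memory([sys.executable, "-c", "pass"])
    print(f"elapsed: {elapsed:.2f} s, peak RSS: {peak_rss} (KB on Linux)")
```

Running it once per chromosome file would give the minutes/hours/days answer directly for a given dataset and machine.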

Thanks.

alachins commented 2 years ago

Hi!

We used the files available here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

Yes, you can process a single chromosome per run. You can find some information about RAiSD execution times here: https://github.com/alachins/raisd/blob/master/publications/RAiSD-X_TRETS2019.pdf (see section 7.3). Each chromosome took some minutes, and the 22 chromosomes (excluding X and Y) together required 3 hours.

About memory utilization: RAiSD uses an algorithm that keeps most of the dataset on disk and loads parts of it into main memory only when needed. Because of this, the total memory utilization can be as low as a few MB or as high as several GB, depending on a parameter that is currently set to use just a few MB irrespective of the dataset size.
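To illustrate the general idea of bounded-memory processing (this is not RAiSD's actual implementation, only a sketch of the same principle), the following Python snippet streams a VCF file line by line and keeps only a fixed-size window of SNP positions in memory. The function name `sliding_snp_windows` and the `window` parameter are hypothetical and chosen for this example.

```python
import io
from collections import deque
from typing import Iterator, List, TextIO


def sliding_snp_windows(vcf: TextIO, window: int = 50) -> Iterator[List[int]]:
    """Yield each run of `window` consecutive SNP positions.

    Only `window` positions are held in memory at any time, so memory
    use does not grow with the size of the input file.
    """
    buf = deque(maxlen=window)  # oldest position is dropped automatically
    for line in vcf:
        if line.startswith("#"):
            continue  # skip VCF meta and header lines
        fields = line.split("\t", 2)
        buf.append(int(fields[1]))  # POS is the second VCF column
        if len(buf) == window:
            yield list(buf)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real VCF file.
    demo = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\n9\t100\t.\n9\t200\t.\n9\t300\t.\n"
    for positions in sliding_snp_windows(io.StringIO(demo), window=2):
        print(positions)
```

With a real file, passing `open(path)` instead of the `io.StringIO` object would read from disk incrementally, which is why the memory footprint can stay at a few MB even for a multi-GB input.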

Best regards, Nikos A.


-- Nikolaos Alachiotis

BJWiley233 commented 2 years ago

Thanks, @alachins! This is very useful 😄. Do you mind if I keep this "issue" open for a few days so other people at my institution can see it?