bmvdgeijn / WASP

WASP: allele-specific pipeline for unbiased read mapping and molecular QTL discovery
Apache License 2.0

Running WASP per chromosome #83

Open arielmadr opened 5 years ago

arielmadr commented 5 years ago

Hi,

Thanks so much for developing WASP! Would you recommend running WASP per chromosome (filtering the initial BAMs) to speed up some steps? Would this affect the analysis compared to running everything in one batch? I am asking because some steps (such as extract_haplotype_counts) take around 70 hours.

Thanks so much, Ariel

bmvdgeijn commented 5 years ago

Hi Ariel,

I think that should not cause issues. I would not run the mapping steps on each chromosome separately, but the extractions should be fine. I am a bit surprised those steps take that long. Do you have extremely high read depth?

Bryce
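Bryce's distinction (keep the mapping steps genome-wide, split only the extraction steps) can be sketched as a small driver that runs one job per chromosome concurrently. This is a minimal sketch, not part of WASP: `run_per_chromosome` and `example_job` are hypothetical names, and the real per-chromosome command line is deliberately left out. Threads are used because each real job would mostly wait on an external process or on disk, so the Python-level work is small.

```python
from concurrent.futures import ThreadPoolExecutor


def run_per_chromosome(chroms, job, max_workers=4):
    """Apply `job(chrom)` to every chromosome concurrently.

    Results are returned in the same order as `chroms`, because
    Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(job, chroms))


def example_job(chrom):
    # Hypothetical placeholder for one per-chromosome extraction run.
    # In practice this might call subprocess.run([...]) on a BAM filtered
    # to `chrom`; the WASP command line is intentionally not spelled out.
    return f"haplotype_counts.{chrom}.txt.gz"


if __name__ == "__main__":
    chroms = [f"chr{i}" for i in range(1, 23)]
    for path in run_per_chromosome(chroms, example_job):
        print(path)
```

The mapping and remapping steps should still be run on the full BAM, as recommended in the reply above; only the extraction jobs would go through a driver like this.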

(Replying by email to arielmadr's comment of Tue, Mar 12, 2019 at 10:21 PM.)

arielmadr commented 5 years ago

Hi Bryce, thanks for your quick answer. Regarding depth, each sample has around 15 million reads, but I am running it with 30 samples. Would that runtime be unexpected? And would you say that running all the subsequent steps of the CHT pipeline per chromosome would be OK?

Thanks a lot, Ariel

bmvdgeijn commented 5 years ago

Hi Ariel,

I am a bit surprised that step is taking so long. It is I/O-intensive, so the runtime probably depends on the system you are running it on. I believe running the steps that create the input files per chromosome is fine, but you may want to merge those files at the end, before you do the genome-wide correction and overdispersion estimation.

Bryce
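The merge recommended above, before genome-wide correction and overdispersion estimation, amounts to concatenating the per-chromosome outputs for each sample into one file. A minimal sketch, assuming gzipped text tables that share one column layout and (optionally) a single header line; `merge_per_chrom_files` and the file names are hypothetical, and the actual WASP file formats should be checked before relying on this.

```python
import gzip
import shutil


def merge_per_chrom_files(paths, out_path, has_header=True):
    """Concatenate per-chromosome gzipped text tables into one genome-wide file.

    Assumes every input has the same columns and, if has_header is True,
    exactly one header line; the header is written once, from the first input.
    """
    with gzip.open(out_path, "wt") as out:
        for i, path in enumerate(paths):
            with gzip.open(path, "rt") as fh:
                if has_header:
                    header = fh.readline()
                    if i == 0:
                        out.write(header)
                # Stream the remaining rows without loading the file into memory
                shutil.copyfileobj(fh, out)


if __name__ == "__main__":
    # Hypothetical naming scheme for one sample's per-chromosome outputs
    inputs = [f"sample1.chr{i}.txt.gz" for i in range(1, 23)]
    merge_per_chrom_files(inputs, "sample1.genome.txt.gz")
```

Inputs are concatenated in the order given, so pass them in a consistent chromosome order. Each input is assumed to end with a newline; otherwise the last row of one file would run into the first row of the next.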

(Replying by email to arielmadr's comment of Wed, Mar 13, 2019 at 9:31 PM.)

SaideepGona commented 4 years ago

Hi,

I am also experiencing extremely long runtimes for extract_haplotype_read_counts.py. I'm running things on a compute cluster where I/O might be a delaying factor. Parallelizing by chromosome seems like a rough solution to implement at this point. Is there an alternative solution, given that the cluster imposes runtime limits?

bmvdgeijn commented 4 years ago

Hi Saideep,

I'm really surprised that this is an issue. How much memory are you requesting? My best guess is that you aren't requesting enough and that it is doing a lot of I/O as a result.
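Before requesting more memory on the cluster, it can help to measure how much a job actually uses. A minimal sketch using Python's standard `resource` module (Unix only); `run_and_report_peak_rss` is a hypothetical helper, and cluster schedulers usually report the same figure in their own job accounting, which may be the more reliable source.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run `cmd`, then return (exit code, peak resident memory in MiB).

    getrusage(RUSAGE_CHILDREN).ru_maxrss is the RSS of the largest
    waited-for child so far, so measure one job per fresh Python process.
    ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    """
    completed = subprocess.run(cmd)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, maxrss / divisor


if __name__ == "__main__":
    # Stand-in job that allocates ~100 MiB; replace with the real command.
    job = [sys.executable, "-c", "x = bytearray(100 * 1024 * 1024)"]
    code, peak_mib = run_and_report_peak_rss(job)
    print(f"exit={code} peak={peak_mib:.1f} MiB")
```

If the measured peak is close to or above the memory requested from the scheduler, that would be consistent with the swapping and extra I/O suspected above.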

(Replying by email to Saideep Gona's comment of Wed, Nov 20, 2019 at 4:12 PM.)