dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

BED file not taken into account #279

Open ThatMatin opened 1 year ago

ThatMatin commented 1 year ago

Hi. Passing a BED file using --bed results in: [info] Beginning bulk load with no range filter. This also happened with the tutorial's ALDH2 dataset, using the BED file created by echo -e "chr12\t111760000\t111820000" > ALDH2.bed. What causes this issue?

Thank you.
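To rule out a formatting problem in the BED file itself (for example, a shell whose echo does not turn \t into real tabs), a quick sanity check can help. The sketch below is only an illustration: it assumes a plain three-column, tab-separated BED with 0-based, half-open coordinates, as in the tutorial command above. The function name parse_bed is hypothetical and not part of GLnexus.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples.

    Assumes tab-separated columns and 0-based, half-open coordinates.
    Raises ValueError on lines that do not look like valid BED records.
    """
    regions = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and the optional comment/track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {lineno}: expected at least 3 tab-separated fields, "
                f"got {len(fields)}"
            )
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])  # int() raises ValueError on junk
        if start < 0 or end <= start:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        regions.append((chrom, start, end))
    return regions


if __name__ == "__main__":
    # Same region as the tutorial's ALDH2.bed.
    print(parse_bed("chr12\t111760000\t111820000\n"))
```

If this raises on a file that was meant to be tab-separated, the file was probably written with spaces or literal backslash-t sequences. A clean parse does not explain the log message above; it only excludes one possible cause.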

aardes commented 1 year ago

Hi,

We have the same issue: adding the BED file to filter the regions does not work. The command takes all chromosomal positions into account, and since we have many samples it failed with a storage error. We do have enough storage, so we don't understand why it reported that there was not enough.

dennishendriksen commented 1 year ago

Ran into the same issue

lvclark commented 1 year ago

I am also getting [info] Beginning bulk load with no range filter despite providing a BED file. It seems to be processing the whole genome, because it runs for 24 hours on 16 threads and produces 450 Gb of intermediate files (for ~700 samples) without finishing. I am trying to merge two multisample GVCFs from GATK.

edg1983 commented 1 year ago

This needs to be clarified for me as well.

Apparently, GLnexus always loads the bulk of the data into the DB, so any BED region is ignored at this stage. The BED file is then applied in the second phase, when it discovers alleles and genotypes from the DB.

Indeed, when the loading phase is finished, I see this in the log.

[2606535] [2023-04-07 11:29:49.124] [GLnexus] [warning] Processing full length of 2580 contigs, as no --bed was provided. Providing a BED file with regions of interest, if applicable, can speed this up.

I don't know the logic here, or whether loading only part of the data is possible and would increase speed.

xiekunwhy commented 9 months ago

This needs to be clarified for me as well. Why don't the authors answer or address such problems?

xiekunwhy commented 7 months ago

Maybe it would be wise to split the g.vcf files according to the BED file before running GLnexus.
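As a rough illustration of that workaround, the sketch below keeps only the gVCF records that overlap a BED region, so that less data reaches the bulk load. It makes several assumptions: uncompressed, single-sample-style text input; reference blocks that carry an END= key in INFO; and whole records kept rather than trimmed to the region boundary. Whether GLnexus handles such pre-filtered files as desired has not been verified here. The names load_bed, record_span, and filter_gvcf are hypothetical helpers, not part of any tool.

```python
import re

# Matches the END key in a VCF INFO column (used by gVCF reference blocks).
END_RE = re.compile(r"(?:^|;)END=(\d+)(?:;|$)")


def load_bed(lines):
    """Return {chrom: [(start, end), ...]} from tab-separated BED lines."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def record_span(fields):
    """Return the record's span as 0-based, half-open (start, end)."""
    pos = int(fields[1])  # VCF POS is 1-based
    match = END_RE.search(fields[7])
    # Without END, assume the record covers the length of REF.
    end = int(match.group(1)) if match else pos + len(fields[3]) - 1
    return pos - 1, end


def filter_gvcf(gvcf_lines, regions):
    """Yield header lines plus records overlapping any BED region."""
    for line in gvcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = record_span(fields)
        for r_start, r_end in regions.get(fields[0], ()):
            if start < r_end and end > r_start:
                yield line
                break


if __name__ == "__main__":
    bed = load_bed(["chr12\t111760000\t111820000\n"])
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
        "chr12\t111759000\t.\tA\t<NON_REF>\t.\t.\tEND=111760500\tGT\t0/0\n",
        "chr12\t111900000\t.\tG\t<NON_REF>\t.\t.\tEND=111900100\tGT\t0/0\n",
    ]
    for kept in filter_gvcf(demo, bed):
        print(kept, end="")
```

For real data, bgzipped and indexed files are the norm, and an established tool is likely the better choice; for instance, bcftools view accepts a regions file via -R. The pure-Python version above is meant only to make the overlap logic explicit.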