Open ThatMatin opened 1 year ago
Hi,
We are having the same issue: adding a BED file to filter the regions does not work. The command takes all chromosomal positions into account, and because we have many samples it failed with a storage error. We actually have enough storage, so we do not understand why it reported insufficient space.
Ran into the same issue
I am also getting `[info] Beginning bulk load with no range filter` despite providing a BED file. It appears to be processing the whole genome: it runs for 24 hours on 16 threads and produces 450 GB of intermediate files (for ~700 samples) without finishing. I am trying to merge two multi-sample GVCFs from GATK.
This needs to be clarified for me as well.
Apparently, GLnexus always loads the bulk of the data into its DB first, so any BED region is ignored at that stage. The BED is then applied in the second phase, when alleles are discovered and genotypes are called from the DB.
Indeed, when the loading phase is finished, I see this in the log.
[2606535] [2023-04-07 11:29:49.124] [GLnexus] [warning] Processing full length of 2580 contigs, as no --bed was provided. Providing a BED file with regions of interest, if applicable, can speed this up.
I don't know about the logic here and if loading only part of the data is possible and would increase speed.
This needs to be clarified for me as well. Why don't the authors answer or address such problems?
Maybe it is wise to split the g.vcf files according to the BED file before running GLnexus.
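A rough sketch of that pre-filtering idea, assuming bcftools is installed and the per-sample GVCFs are bgzipped and indexed (the file names and the region coordinates are illustrative, the latter borrowed from the ALDH2 example mentioned below):

```shell
#!/bin/sh
set -eu

# Illustrative regions file (coordinates from the ALDH2 tutorial example).
printf 'chr12\t111760000\t111820000\n' > regions.bed

# Pre-filter each per-sample GVCF down to the BED regions before GLnexus,
# so the bulk-load phase only ever ingests the data of interest.
# Guarded so the sketch degrades gracefully if bcftools is not available.
if command -v bcftools >/dev/null 2>&1; then
    for gvcf in sample_*.g.vcf.gz; do
        [ -e "$gvcf" ] || continue   # skip if the glob matched nothing
        out="${gvcf%.g.vcf.gz}.subset.g.vcf.gz"
        bcftools view -R regions.bed -Oz -o "$out" "$gvcf"
        bcftools index -t "$out"
    done
fi
```

The subsetted GVCFs would then be passed to glnexus_cli as usual; it may still be worth passing `--bed` too, since that is what the genotyping phase honors.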
Hi, passing a BED file using `--bed` results in: `[info] Beginning bulk load with no range filter`. This also happened with the tutorial's ALDH2 dataset and the BED file created with `echo -e "chr12\t111760000\t111820000" > ALDH2.bed`. What causes this issue? Thank you.
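For reference, here is a sketch of the tutorial-style invocation I would expect (the `--config` and `--bed` flags are real glnexus_cli options; the GVCF directory and output names are illustrative):

```shell
#!/bin/sh
set -eu

# Same region as the tutorial's ALDH2.bed (printf is more portable than echo -e).
printf 'chr12\t111760000\t111820000\n' > ALDH2.bed

# Joint-call over the BED regions. Per the comments above, the
# "no range filter" log line during bulk load appears regardless;
# the BED is applied at the allele-discovery/genotyping phase.
# Guarded so the sketch does not fail where glnexus_cli is absent.
if command -v glnexus_cli >/dev/null 2>&1; then
    glnexus_cli --config DeepVariant --bed ALDH2.bed \
        gvcf_dir/*.g.vcf.gz > ALDH2_joint.bcf
fi
```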