Open lynnjo opened 5 months ago
Hi @lynnjo,
Please try adding the following --tiledb-config options to your export command. They increase sm.memory_budget to 10 GiB and sm.memory_budget_var to 20 GiB, and skip the memory budget check.
tiledbvcf export \
--uri tiledb_datasets/gvcf_dataset \
-m -b 65536 \
-o /workdir/lcj34/phg_v2/exportedHvcfs/mergedGvcf.vcf \
--tiledb-config sm.memory_budget=10737418240,sm.memory_budget_var=21474836480,sm.skip_unary_partitioning_budget_check=true
The export may be slow, as reported by the original error message, because we have not optimized the performance of exporting a merged VCF yet.
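For anyone tuning these values, here is a minimal Python sketch of how the byte counts in the --tiledb-config string above are derived from GiB. The helper name `tiledb_config` is hypothetical and not part of tiledbvcf; it only formats the three settings already shown in the command.

```python
# 1 GiB expressed in bytes
GIB = 1024 ** 3


def tiledb_config(budget_gib, budget_var_gib, skip_check=True):
    """Build a value for --tiledb-config (hypothetical helper, not a tiledbvcf API)."""
    settings = {
        "sm.memory_budget": budget_gib * GIB,
        "sm.memory_budget_var": budget_var_gib * GIB,
        # TileDB config booleans are written in lowercase: true / false
        "sm.skip_unary_partitioning_budget_check": str(skip_check).lower(),
    }
    # Settings are passed as comma-separated key=value pairs
    return ",".join(f"{key}={value}" for key, value in settings.items())


config = tiledb_config(10, 20)
print(config)
```

With 10 and 20 as inputs this reproduces the string used in the command above; passing other GiB values gives a starting point for tuning to the memory available on your machine.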
Thanks @gspowley - I will try the above.
Do I still keep the "-b 65536" flag while adding the last line you show?
One more question: We note that GATK can export a multi-sample VCF using "gatk GenotypeGVCFs -V gendb://", and that is relatively fast. I know tiledbvcf originated as genomicsDB. Is the reason this works from GATK due to GATK doing some of the work to merge the files?
Yes, keeping the -b 65536 option will improve the export performance, assuming your system has enough memory. The memory budget parameters may need some tuning based on your dataset and system resources.
Hello -
I am using tiledbvcf to create a dataset that I would later like to export as a merged VCF file. I can successfully load and export data from this dataset. What I would like to do is export to a multi-sample VCF file. It looks like export with the -m option should handle this, but it gives me memory errors. I added the -b flag to increase the memory budget, but still no luck. The command I am running:
The error I get:
Is there another trick to running the tiledbvcf export command to create a merged vcf? Thank you
I am running tiledbvcf version:
My machine runs Linux, with these specifics: