The default value of disk_size in tasks SeparateMultiallelics and SubsetToArrayVCF makes them crash on the Thousand Genomes hg38 joint VCF (260 GB). Setting disk_size=1000 fixes the issue, but only after a long wait for the first crash. Would it be possible to define a better default estimate of disk_size?
Task LDPruning has a hardcoded disk size, which makes it crash on the VCF above. Disk size should be exposed in the input with a better default.
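One possible way to get a better default is to derive disk_size from the size of the input VCF instead of using a fixed number. The sketch below is only an illustration and assumes the tasks are written in WDL 1.0 and run on a Cromwell-style backend: the input name original_vcf, the multiplier of 4, the 20 GB of headroom, and the placeholder command are all assumptions, not the pipeline's actual code. For a 260 GB input this heuristic would request roughly 1060 GB, in line with the disk_size=1000 that is known to work, but the multiplier would need tuning per task.

```wdl
version 1.0

# Hypothetical sketch: input names and the command are placeholders.
task SeparateMultiallelics {
  input {
    File original_vcf
    # Assumed heuristic: scale with the input size and add fixed headroom.
    # The caller can still override disk_size explicitly.
    Int disk_size = ceil(4 * size(original_vcf, "GB")) + 20
  }

  command <<<
    # Placeholder for the task's real command.
    echo "processing ~{original_vcf}"
  >>>

  runtime {
    # Cromwell-style disk request; other backends may use a different key.
    disks: "local-disk " + disk_size + " HDD"
  }
}
```

The same pattern could be applied to LDPruning by moving its hardcoded value into an input with a size-based default.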