zachary-foster opened 2 weeks ago
Hey, thanks for the suggestion. As a workaround, you could pass a file list to bcftools concat to concatenate the VCFs:
```bash
# Make vcf_filelist.txt: one VCF path per line, in genomic order
bcftools concat --naive -Oz -o all.vcf.gz --file-list vcf_filelist.txt
```
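Because --naive concatenation does not re-sort records, the file list has to be in genomic order. The following is a minimal sketch of one way to build it. It assumes (not confirmed in this thread) that graphtyper genotype wrote its per-region files as <results_dir>/<contig>/<start>-<end>.vcf.gz with zero-padded coordinates, so a plain lexical sort within a contig is also a positional sort. Contig order is taken from the reference's .fai index. The function name make_vcf_filelist is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write one VCF path per line to $out, in genomic order.
# ASSUMPTION: files are laid out as <results_dir>/<contig>/<start>-<end>.vcf.gz
# with zero-padded coordinates.
make_vcf_filelist() {
  local results_dir=$1 fai=$2 out=$3
  : > "$out"  # truncate/create the output list
  # Contig order follows the reference index (.fai), not alphabetical order,
  # so e.g. chr2 stays ahead of chr10.
  cut -f1 "$fai" | while IFS= read -r contig; do
    # Skip contigs that produced no output directory.
    [ -d "$results_dir/$contig" ] || continue
    # find prints matches one per line instead of expanding a glob into a long
    # argument list; the C-locale sort puts zero-padded regions in order.
    find "$results_dir/$contig" -maxdepth 1 -name '*.vcf.gz' | LC_ALL=C sort >> "$out"
  done
}
```

For example, make_vcf_filelist results ref.fa.fai vcf_filelist.txt would write the list that the bcftools concat command above reads. Building the list with find and sort avoids ever placing all paths on one command line.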
Best, Hannes
This is because graphtyper genotype outputs many small VCFs. For long references with lots of samples, this creates so many files that the command line to graphtyper vcf_concatenate is too long for the shell to run (it exceeds the system's argument-length limit, ARG_MAX). For the dataset that caused this error, the command to graphtyper vcf_concatenate was 3 million characters long. I know this is probably an unusual dataset and there are workarounds, like concatenating the files in batches and then concatenating the batch outputs, but we are using graphtyper in an automated pipeline that has to handle these cases, so it would be nice if this case were handled.

It would be nice if there was a way to make graphtyper genotype produce fewer but larger files, or to make graphtyper vcf_concatenate accept a file of filenames, like graphtyper genotype does.
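In an automated pipeline, one way to detect this case before it fails is to compare the size of the would-be argument list with the system limit reported by getconf ARG_MAX. The following is a rough sketch, assuming the paths are already in a file with one path per line; the function name exceeds_arg_max is hypothetical. It only counts the path arguments themselves, so it underestimates real usage.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed (exit 0) if passing every path in the list file
# as a separate command-line argument would exceed ARG_MAX.
exceeds_arg_max() {
  local list=$1 limit bytes
  limit=$(getconf ARG_MAX)
  # Each argument costs its length in bytes plus a terminating NUL byte.
  # The environment and the argv pointer array also count against the same
  # limit, so this is a lower bound on the real usage.
  bytes=$(LC_ALL=C awk '{ n += length($0) + 1 } END { print n + 0 }' "$list")
  [ "$bytes" -gt "$limit" ]
}
```

A pipeline could then branch on it, e.g. if exceeds_arg_max vcf_filelist.txt succeeds, fall back to a file-list or batched concatenation instead of one long command line.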