Open arine opened 1 week ago
We can convert TSV into VCF with bcftools convert: https://samtools.github.io/bcftools/bcftools.html#convert
# Convert 23andme results into VCF
bcftools convert -c ID,CHROM,POS,AA -s SampleName -f 23andme-ref.fa --tsv2vcf 23andme.txt -o out.vcf.gz
# Convert tab-delimited file into a sites-only VCF (no genotypes), in this example first column to be ignored
bcftools convert -c -,CHROM,POS,REF,ALT -f ref.fa --tsv2vcf calls.txt -o out.bcf
Instead of allowing a compromised VCF format into the pipeline, what do you think about adding a conversion step to the pipeline?
If so, Web UI will need to
I reviewed the example and found that the error occurs when all of the variants are blacklisted. So the data itself is valid, and the error happens in the feature engineering part. Does this seem related to #19? Or could we get another example with variants that are not filtered out by the blacklist regions?
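One way to check this before feature engineering is to count how many records survive the blacklist. The following is only a sketch: it assumes the blacklist is a tab-separated BED file (0-based start, half-open end, no track or header lines, at least one region) and that its chromosome names match the VCF. The function name count_unfiltered is made up for illustration and is not part of the pipeline.

```shell
# Hypothetical helper: count VCF records that fall outside every blacklist region.
# Usage: count_unfiltered in.vcf blacklist.bed
count_unfiltered() {
    vcf="$1"
    bed="$2"
    awk -F '\t' '
        # First file (BED): store each region per chromosome.
        NR == FNR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
        # Second file (VCF): skip header lines.
        /^#/ { next }
        {
            pos = $2 + 0
            hit = 0
            # VCF POS is 1-based, so it lies in a BED region when start < POS <= end.
            for (i = 1; i <= n[$1]; i++)
                if (pos > s[$1, i] && pos <= e[$1, i]) { hit = 1; break }
            if (!hit) kept++
        }
        END { print kept + 0 }
    ' "$bed" "$vcf"
}
```

A result of 0 would mean every variant was blacklisted, so the pipeline could stop with a clear message instead of failing later.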
Is your feature request related to a problem? Please describe.
VCF preprocessing, currently done by bcftools, requires ##FILTER, ##FORMAT, ##INFO, and ##contig, which is too strict.
Describe the solution you'd like
Ideally, preprocessing should be done only with CHROM, POS, REF, ALT (and optional FILTER). Here is the example VCF that should be able to pass through the pipeline without error: demo_sloppy.vcf.zip
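As a rough illustration of what a lenient preprocessing step might do, the sketch below prepends a minimal header to a VCF that has only the #CHROM line and records. It is an assumption about one possible approach, not how the pipeline currently works: the function name add_minimal_header is hypothetical, the contig list is derived from the CHROM column without lengths, and any existing ## lines are dropped for simplicity. FILTER values other than PASS or "." would still be undeclared.

```shell
# Hypothetical helper: prepend minimal header lines to a header-less VCF.
# Usage: add_minimal_header sloppy.vcf > fixed.vcf
add_minimal_header() {
    in="$1"
    # The file format line must come first.
    echo '##fileformat=VCFv4.2'
    # Declare PASS so the optional FILTER column has a defined value.
    echo '##FILTER=<ID=PASS,Description="All filters passed">'
    # One ##contig line per distinct chromosome, in order of first appearance.
    awk -F '\t' '!/^#/ && !seen[$1]++ { print "##contig=<ID=" $1 ">" }' "$in"
    # Keep the #CHROM column line and all records; drop any existing ## lines.
    grep -v '^##' "$in"
}
```

The output could then be handed to the existing bcftools-based preprocessing, though whether that is enough for every downstream step would need to be checked against the real pipeline.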
Describe alternatives you've considered
Additional context