oneillkza opened 4 years ago
So, the first major design decision is to create a new file type, `genome_longread`, for long-read genomic BAMs. This is distinct from `genome`, which is for short-read paired-end genomic BAMs. I'm probably going to be copying a lot of the code that handles the `genome` BAM type, but I think that'll be cleaner than having if statements everywhere.
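A minimal sketch of that design, assuming protocol names are plain strings: `genome_longread` becomes its own value next to `genome`, and each protocol's code path is picked once through a lookup table instead of through `if` checks repeated across modules. `Protocol`, `STATS_BY_PROTOCOL` and `compute_bam_stats` are hypothetical names, and the two stats functions are placeholders, not MAVIS's real implementations.

```python
from enum import Enum


class Protocol(str, Enum):
    """Hypothetical protocol values; illustrative, not MAVIS's actual API."""
    GENOME = "genome"                    # short-read, paired-end genomic BAMs
    GENOME_LONGREAD = "genome_longread"  # long-read genomic BAMs (new type)


def compute_genome_bam_stats(bam_path: str) -> str:
    # Placeholder for the existing short-read code path.
    return f"short-read stats for {bam_path}"


def compute_genome_longread_bam_stats(bam_path: str) -> str:
    # Placeholder for the copied-and-modified long-read code path.
    return f"long-read stats for {bam_path}"


# A single dispatch point, rather than `if protocol == ...` in every module.
STATS_BY_PROTOCOL = {
    Protocol.GENOME: compute_genome_bam_stats,
    Protocol.GENOME_LONGREAD: compute_genome_longread_bam_stats,
}


def compute_bam_stats(protocol: str, bam_path: str) -> str:
    # Protocol(...) raises ValueError for an unrecognized protocol string.
    handler = STATS_BY_PROTOCOL[Protocol(protocol)]
    return handler(bam_path)
```

The trade-off is the one described above: duplicated functions per protocol, but the branching lives in one table.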
e.g. in `stats` I've created `compute_genome_longread_bam_stats`, which is a modified copy of `compute_genome_bam_stats`.
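One likely reason the copy needs modifying: short-read genome stats presumably lean on mate pairs (e.g. fragment or insert size), and long reads are single-end, so a long-read version would summarize read lengths instead. Below is a minimal sketch of that summary. The `LongreadBamStats` fields and `summarize_read_lengths` are hypothetical, and it takes plain integers so it runs without a BAM file.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LongreadBamStats:
    """Hypothetical summary; field names are illustrative, not MAVIS's."""
    read_count: int
    median_read_length: float
    mean_read_length: float
    max_read_length: int


def summarize_read_lengths(read_lengths: Iterable[int]) -> LongreadBamStats:
    # Ignore zero/negative lengths (e.g. reads with no stored sequence).
    lengths = [length for length in read_lengths if length > 0]
    if not lengths:
        raise ValueError("no reads with a positive length")
    return LongreadBamStats(
        read_count=len(lengths),
        median_read_length=statistics.median(lengths),
        mean_read_length=statistics.fmean(lengths),
        max_read_length=max(lengths),
    )
```

In practice the lengths might come from pysam, e.g. `read.query_length` for each primary alignment in `pysam.AlignmentFile(path, "rb")`. That wiring is an assumption, not taken from MAVIS.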
OK, got it as far as being able to do config and setup. Clustering works, but it fails on validate.
ValueError: ('protocol error', 'genome_longread')
This is somewhat unsurprising. It looks like the next step is to create a class in `validate/evidence.py`, and a case in `validate/main.py` that maps the `genome_longread` protocol to it.
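The error text matches what Python prints for `raise ValueError('protocol error', protocol)`: with more than one argument, the exception's string form is the argument tuple. A minimal sketch of the class-plus-case change, assuming the protocol check is an `if`/`elif` chain; `Evidence`, `GenomeLongreadEvidence` and `evidence_class_for` are hypothetical stand-ins for whatever `validate/evidence.py` and `validate/main.py` actually define.

```python
class Evidence:
    """Hypothetical base class standing in for validate/evidence.py."""
    protocol = None
    paired_end = None

    def __init__(self, break1: int, break2: int):
        self.break1 = break1
        self.break2 = break2


class GenomeEvidence(Evidence):
    protocol = "genome"
    paired_end = True   # short-read, paired-end library


class GenomeLongreadEvidence(Evidence):
    protocol = "genome_longread"
    paired_end = False  # long reads are single-end: no mate pairs to use


def evidence_class_for(protocol: str) -> type:
    """Hypothetical version of the protocol case in validate/main.py."""
    if protocol == GenomeEvidence.protocol:
        return GenomeEvidence
    elif protocol == GenomeLongreadEvidence.protocol:  # the new case
        return GenomeLongreadEvidence
    # Without the branch above, 'genome_longread' falls through to here and
    # the traceback shows: ValueError: ('protocol error', 'genome_longread')
    raise ValueError("protocol error", protocol)
```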
Reading in VCFs from variant callers that run on long-read BAMs is only part of the problem: MAVIS still needs BAM files for most operations. Such BAMs differ from short-read ("NGS") BAMs in a few key ways:
This makes them very good for detecting large structural variants, especially since they can map through low-complexity regions, but less good for smaller variants.
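Because a long read can span an entire event, the aligner can record a large deletion or insertion directly in that read's CIGAR string, where short reads would only hint at it through discordant pairs or split reads. A minimal sketch of pulling such events out of one alignment follows. `large_indels` and `IndelEvent` are hypothetical names, and the 50 bp cutoff is an assumed SV size threshold. `ref_start` is taken to be 0-based, like pysam's `reference_start`.

```python
import re
from typing import List, NamedTuple

CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference


class IndelEvent(NamedTuple):
    kind: str     # "deletion" or "insertion"
    ref_pos: int  # 0-based reference position where the event occurs
    length: int


def large_indels(cigar: str, ref_start: int, min_size: int = 50) -> List[IndelEvent]:
    if cigar == "*":  # SAM uses '*' when no CIGAR is available
        return []
    if not CIGAR_FULL_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    events = []
    pos = ref_start
    for length_str, op in CIGAR_OP_RE.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_size:
            events.append(IndelEvent("deletion", pos, length))
        elif op == "I" and length >= min_size:
            events.append(IndelEvent("insertion", pos, length))
        if op in REF_CONSUMING:
            pos += length
    return events
```

Events too large for one alignment usually show up as supplementary (split) alignments instead, which this sketch does not handle.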
This ticket is to track work on reading in long-read genome BAMs.