Map to genomes, count by regions

cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C

https://cfe-lab.github.io/MiCall

GNU Affero General Public License v3.0


Map to genomes, count by regions #71

Closed by ArtPoon 10 years ago

ArtPoon commented 10 years ago

Priority is after the 6.3 launch. The motivation is to avoid drops in counts at region boundaries. This will also resolve the HLA problem with frame shifts due to the intron, because the two reading frames can be treated as separate regions. The mapping reference sequences need to be replaced by full genome sequences. csf2counts.py needs a new reference file that specifies region coordinates inside each genome, in base pair coordinates. Output count files should include two columns of coordinates: genome and region.
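A rough sketch of the coordinate bookkeeping this proposal implies, assuming 1-based, inclusive base pair coordinates. The region table, the genome and region names, and both helper functions are hypothetical illustrations, not the actual csf2counts.py code or reference file format.

```python
from collections import Counter, namedtuple

# Hypothetical region table: 1-based, inclusive base pair coordinates
# inside a full-genome reference. Names and numbers are made up.
Region = namedtuple("Region", "genome name start end")

REGIONS = [
    Region("genome-A", "region-1", 101, 400),
    Region("genome-A", "region-2", 350, 700),  # regions may overlap
]


def genome_to_region(genome, genome_pos, regions=REGIONS):
    """Yield (region name, 1-based position within that region)
    for every region that covers genome_pos."""
    for region in regions:
        if region.genome == genome and region.start <= genome_pos <= region.end:
            yield region.name, genome_pos - region.start + 1


def count_by_region(genome, bases_by_pos, regions=REGIONS):
    """Tally bases into rows keyed by (region, genome_pos, region_pos).

    bases_by_pos is an iterable of (genome_pos, base) pairs taken from
    reads that were mapped to the full genome.
    """
    counts = {}
    for genome_pos, base in bases_by_pos:
        for name, region_pos in genome_to_region(genome, genome_pos, regions):
            key = (name, genome_pos, region_pos)
            counts.setdefault(key, Counter())[base] += 1
    return counts


if __name__ == "__main__":
    observed = [(100, "A"), (101, "A"), (101, "C"), (360, "G")]
    for key, tally in sorted(count_by_region("genome-A", observed).items()):
        print(key, dict(tally))
```

Each output row carries both coordinates, so a position that falls in two overlapping regions (or in two reading frames treated as separate regions) is reported once per region, and positions outside every region are dropped.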

donkirkby commented 10 years ago

This strategy is on hold until we try issue #76 and extend the mapping regions off either end. We're worried that mapping to the entire genome would be much slower than mapping the individual regions.

ArtPoon commented 10 years ago

Closing; this issue is superseded by #76.