ArtPoon closed this issue 10 years ago
This strategy is on hold until we try issue #76 and extend the mapping regions off either end. We're worried that mapping to the entire genome would be much slower than mapping the individual regions.
Closing; this issue is superseded by #76.
Priority is after the 6.3 launch. The motivation is to avoid drops in counts at region boundaries. This will also resolve the HLA problem with frame shifts due to the intron: the two reading frames can be treated as separate regions. The mapping reference sequences need to be replaced by full genome sequences. csf2counts.py needs a new reference file that specifies the coordinates of each region inside its genome, in base pairs. Output count files will include two coordinate columns: genome and region.
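A rough sketch of the coordinate lookup this would need, in case it helps the discussion. The reference file layout (CSV with `genome,region,start,end`, 1-based inclusive), the genome and region names, and the helper names `load_regions` and `genome_to_region` are all assumptions for illustration, not anything that exists in csf2counts.py today. Regions are allowed to overlap, so one genome position can map to more than one region.

```python
import csv
import io

# Hypothetical reference file (assumed layout, not the project's real format):
# one row per region, with 1-based inclusive base pair coordinates in the genome.
EXAMPLE_REFERENCE = """genome,region,start,end
GENOME-A,region1,101,400
GENOME-A,region2,351,700
"""


def load_regions(handle):
    """Read region coordinates into {genome: [(region, start, end), ...]}."""
    regions = {}
    for row in csv.DictReader(handle):
        entry = (row["region"], int(row["start"]), int(row["end"]))
        regions.setdefault(row["genome"], []).append(entry)
    return regions


def genome_to_region(regions, genome, genome_pos):
    """Return (region, region_pos) for every region covering genome_pos.

    Both coordinates are 1-based. An empty list means the position lies
    outside every region defined for that genome.
    """
    hits = []
    for name, start, end in regions.get(genome, []):
        if start <= genome_pos <= end:
            hits.append((name, genome_pos - start + 1))
    return hits


regions = load_regions(io.StringIO(EXAMPLE_REFERENCE))
for pos in (50, 101, 360):
    print(pos, genome_to_region(regions, "GENOME-A", pos))
```

With a lookup like this, each row of a count file could carry both the genome coordinate and the region coordinate, and a position covered by two overlapping regions would simply produce two rows.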