NVIDIA-Genomics-Research / AtacWorks

Deep learning based processing of ATAC-seq data
https://clara-parabricks.github.io/AtacWorks/

Fix to include ends of chromosomes. #114

Open avantikalal opened 4 years ago

avantikalal commented 4 years ago

Currently, get_intervals.py drops any interval that extends beyond the chromosome length. As a result, the last few kilobases of each chromosome are not covered by any interval, and no prediction is made for them during inference. These bases should be included. One possible fix: pad the h5 with zeros beyond the chromosome size, then trim off the excess zeros when writing the bedGraph file.
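A minimal sketch of the pad-then-trim idea, not the actual get_intervals.py logic. The function names (`tile_intervals`, `pad_to_intervals`, `trim_to_chrom`) and the toy sizes are hypothetical, and plain Python lists stand in for the h5 datasets. It assumes fixed-size, non-overlapping intervals and a known chromosome length.

```python
def tile_intervals(chrom_len, interval_size):
    """Tile [0, chrom_len) with fixed-size intervals.

    The last interval is kept even if it overhangs the chromosome end,
    instead of being dropped.
    """
    return [(start, start + interval_size)
            for start in range(0, chrom_len, interval_size)]


def pad_to_intervals(signal, interval_size):
    """Right-pad a per-base signal with zeros to a multiple of interval_size."""
    pad = (-len(signal)) % interval_size  # 0 if already a multiple
    return signal + [0.0] * pad, pad


def trim_to_chrom(padded, chrom_len):
    """Drop the padded positions so the output covers exactly the chromosome."""
    return padded[:chrom_len]


# Toy example (hypothetical sizes): a 10-base chromosome, 4-base intervals.
chrom_len, interval_size = 10, 4
signal = [float(i) for i in range(1, chrom_len + 1)]

intervals = tile_intervals(chrom_len, interval_size)    # last interval overhangs
padded, pad = pad_to_intervals(signal, interval_size)   # zeros appended at the end
restored = trim_to_chrom(padded, chrom_len)             # same length as the input
```

Because the padding is only ever appended at the end of the chromosome, trimming needs nothing beyond the chromosome length itself, which would have to come from wherever the chromosome sizes are defined.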

ntadimeti commented 4 years ago

@avantikalal Where is the chromosome length defined? And how do we get the full length of each chromosome, so that we know how many zeros to append?

ntadimeti commented 4 years ago

De-prioritizing this for v0.3.0