HenrikBengtsson / aroma.seq

🔬 R package: aroma.seq: High-Throughput Sequence Analysis using the Aroma Framework
https://github.com/HenrikBengtsson/aroma.seq

GcBaseFile: Add getBinWidth() and report it with print() #35

Open HenrikBengtsson opened 8 years ago

HenrikBengtsson commented 8 years ago

print() for GcBaseFile should also report on the bin width ("span"), e.g. "50000 bp".

Currently, we have:

> gc
GcBaseFile:
Name: GRCh37
Tags: hg19
Full name: GRCh37,hg19
Pathname: annotationData/organisms/Homo_sapiens/GRCh37,hg19/UCSC/hg19.gc50Base.txt.gz
File size: 174.53 MiB (183007077 bytes)
RAM: 0.00 MB
Number of sequence contigs: 93
Sequence names: [93] chr1, chr10, chr11, ..., chrY
Ordering of sequence contigs (scores): 100% lexicographic, 100% canonical, 94.6% mixeddecimal, 92.5% mixedroman

but the file holds more information than the sequence names, e.g. the bin width given by span= in the header line:

> readLines(getPathname(gc), n=10)
 [1] "variableStep chrom=chr1 span=50" "10001\t48.0"
 [3] "10051\t50.0"                     "10101\t54.0"
 [5] "10151\t46.0"                     "10201\t50.0"
 [7] "10251\t54.0"                     "10301\t60.0"
 [9] "10351\t54.0"                     "10401\t50.0"

The getBinWidth() method can then be used for validation, assertions, etc.
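
A minimal sketch of how the bin width could be pulled from the wiggle declaration lines shown above. The helper name parseWigSpan() is hypothetical and not part of aroma.seq. It assumes the file follows the UCSC wiggle convention, where span defaults to 1 when omitted, and it treats differing spans across contigs as an error, which is a design assumption rather than something the format requires.

```r
# Hypothetical helper (not part of aroma.seq): extract the bin width
# ("span") from wiggle declaration lines.
parseWigSpan <- function(lines) {
  # Keep only declaration lines, e.g. "variableStep chrom=chr1 span=50"
  decl <- grep("^(variableStep|fixedStep)[[:space:]]", lines, value = TRUE)
  if (length(decl) == 0L) stop("No wiggle declaration line found")

  # The wiggle format defaults span to 1 when it is not given
  span <- rep(1L, length(decl))
  hasSpan <- grepl("span=[0-9]+", decl)
  span[hasSpan] <- as.integer(sub(".*span=([0-9]+).*", "\\1", decl[hasSpan]))

  # Assumption: a single bin width is expected for the whole file
  span <- unique(span)
  if (length(span) != 1L) {
    stop("Inconsistent bin widths: ", paste(span, collapse = ", "))
  }
  span
}

# Example using the first lines of the file shown above
lines <- c("variableStep chrom=chr1 span=50", "10001\t48.0", "10051\t50.0")
binWidth <- parseWigSpan(lines)
stopifnot(binWidth == 50L)
```

A getBinWidth() method could apply something like this to readLines(getPathname(gc), n=1) to check only the first contig. Scanning every declaration line of a large gzipped file would be slower but would also catch mixed bin widths.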