google-code-export / segtools

Automatically exported from code.google.com/p/segtools

Signal-distribution assumes signal values are integers #9

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The signal distribution analyses assume that binning the signal on integers is 
reasonable. This is not the case for signal data whose values vary by less than 
1.0, such as GC content. At least one offending section is in signal_distribution.py:

        max_bins = ceil(genome.maxs).astype(int)
        min_bins = floor(genome.mins).astype(int)
        # A dict from tracks to a range tuple
        track_ranges = dict(zip(tracks, zip(min_bins, max_bins)))
    ...
    bins=xrange(min_bin, max_bin + 1)

Original issue reported on code.google.com by orion.bu...@gmail.com on 31 Aug 2010 at 12:40
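
To illustrate the report above, here is a minimal sketch of what integer binning does to a signal confined to [0, 1]. The sample values are made up, the single-track variables mirror the names in the excerpt, and it assumes the bins are ultimately used as histogram edges (shown here with numpy.histogram, which may not be what segtools does internally). range() stands in for Python 2's xrange().

```python
import numpy as np

# Hypothetical GC-content-like values, all within [0, 1].
signal = np.array([0.31, 0.42, 0.45, 0.50, 0.58, 0.68])

# Same rounding as in the quoted excerpt, applied to a single track.
min_bin = int(np.floor(signal.min()))  # 0
max_bin = int(np.ceil(signal.max()))   # 1

# Integer bin edges, as in bins=xrange(min_bin, max_bin + 1).
edges = list(range(min_bin, max_bin + 1))

# Assumption: the edges feed a histogram. Every value lands in one bin.
counts, _ = np.histogram(signal, bins=edges)
print(edges, counts)
```

With only the edges 0 and 1 available, the whole distribution collapses into a single bin, so the histogram carries no information about the signal's shape.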

GoogleCodeExporter commented 9 years ago
There needs to be some sort of binning. This is already one of the most 
inefficient parts of segtools. Additionally, it is nice for the bin edges to be 
numbers that can be represented exactly and conveniently in both decimal and 
binary floating point. What should we do for other cases? One idea is to set 
max_bins and min_bins as here, so they are always integers, but choose the bin 
width so that there are always at least 1000 bins between them.

Original comment by hoffman...@gmail.com on 31 Aug 2010 at 9:58
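
One possible reading of the idea above, as a rough sketch rather than a proposed patch: keep the integer floor/ceil bounds, then subdivide each unit interval until the total reaches at least 1000 bins. The function name, the MIN_BINS constant, and the choice of a power-of-two subdivision are all assumptions made for illustration; none of them come from segtools.

```python
import math

import numpy as np

MIN_BINS = 1000  # lower bound on the bin count suggested in the comment


def integer_bounded_edges(track_min, track_max, min_bins=MIN_BINS):
    """Return bin edges spanning [floor(min), ceil(max)] with >= min_bins bins.

    Hypothetical helper, not part of segtools.
    """
    lo = math.floor(track_min)
    hi = math.ceil(track_max)
    if hi == lo:
        hi = lo + 1  # avoid a zero-width range for a constant integer signal
    span = hi - lo

    # Smallest power-of-two number of bins per unit that reaches min_bins.
    per_unit = 1
    while span * per_unit < min_bins:
        per_unit *= 2

    # Dividing an integer by a power of two is exact in binary floating
    # point for moderate magnitudes, so edges are not rounded.
    return lo + np.arange(span * per_unit + 1) / per_unit


gc_edges = integer_bounded_edges(0.31, 0.68)      # range [0, 1], width 1/1024
wide_edges = integer_bounded_edges(-3.2, 2500.0)  # range [-4, 2500], width 1
print(len(gc_edges) - 1, len(wide_edges) - 1)
```

The power-of-two width keeps every edge exact in binary, and tracks that already span 1000 or more integers keep plain integer edges. The cost is that edges such as 1/1024 are finite but long in decimal. A width of 0.001 would be more convenient in decimal but is not exactly representable in binary, so which property matters more is left open here.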