FRBs / sigpyproc3

Python3 version of Ewan Barr's sigpyproc library
https://sigpyproc3.readthedocs.io
MIT License

Implement FilReader.readDedispersedBlock() #1

Closed David-McKenna closed 4 years ago

David-McKenna commented 4 years ago

This function implements functionality similar to a chained FilReader.readBlock().dedisperse()[:, :valid_samples] call, saving memory whenever the DM is non-zero and reducing I/O in some cases.
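A minimal NumPy sketch of why the two approaches agree; it is not the code from this PR. The function names, the in-memory array layout (nchans, nsamps) and the toy parameters are assumptions made for illustration. The chained form shifts every channel of the full block and then trims, while the direct form only ever fills an output of the final size.

```python
import numpy as np

DM_CONST = 4.148808e3  # dispersion constant in MHz^2 pc^-1 cm^3 s


def channel_delays(dm, freqs_mhz, tsamp):
    """Delay of each channel in samples, relative to the highest frequency."""
    delays_s = DM_CONST * dm * (freqs_mhz ** -2.0 - freqs_mhz.max() ** -2.0)
    return np.round(delays_s / tsamp).astype(int)


def dedisperse_chained(data, delays, nsamps):
    """Shift each channel of the full block, then keep only the valid samples."""
    out = np.empty_like(data)
    for chan, delay in enumerate(delays):
        out[chan] = np.roll(data[chan], -delay)
    return out[:, :nsamps]


def dedisperse_direct(data, delays, nsamps):
    """Copy only the samples each channel contributes to the output."""
    out = np.empty((data.shape[0], nsamps), dtype=data.dtype)
    for chan, delay in enumerate(delays):
        out[chan] = data[chan, delay:delay + nsamps]
    return out


# Toy data: 8 channels from 200 MHz down to 100 MHz, 1 ms sampling (hypothetical values).
rng = np.random.default_rng(0)
freqs = np.linspace(200.0, 100.0, 8)
delays = channel_delays(dm=10.0, freqs_mhz=freqs, tsamp=1e-3)
nsamps = 50
block = rng.integers(0, 255, size=(freqs.size, nsamps + delays.max()))

chained = dedisperse_chained(block, delays, nsamps)
direct = dedisperse_direct(block, delays, nsamps)
```

In this sketch the input block is already in memory, so the saving is only in the intermediate shifted copy; reading from disk channel by channel avoids holding the padded block at all.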

It offers two reading modes, controlled by the "smallReads" kwarg. When it is True (and the dtype is longer than 8 bits), disk I/O is only performed for the specific frequency channels needed from a given time sample. When it is False, an entire time sample is read and the frequency channels of interest are extracted from it.
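The two modes can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the file is a headerless, time-major raw file (a real filterbank file has a header to skip), and read_small, read_whole and the toy delays are hypothetical names and values.

```python
import os
import tempfile

import numpy as np


def read_small(path, delays, nsamps, nchans, dtype, start=0):
    """Seek to and read only the single value needed for each (channel, sample)."""
    itemsize = np.dtype(dtype).itemsize
    out = np.empty((nchans, nsamps), dtype=dtype)
    with open(path, "rb") as f:
        for samp in range(nsamps):
            for chan in range(nchans):
                row = start + samp + int(delays[chan])
                f.seek((row * nchans + chan) * itemsize)
                out[chan, samp] = np.frombuffer(f.read(itemsize), dtype=dtype)[0]
    return out


def read_whole(path, delays, nsamps, nchans, dtype, start=0):
    """Read each time sample in full, then scatter its channels to their outputs."""
    itemsize = np.dtype(dtype).itemsize
    out = np.empty((nchans, nsamps), dtype=dtype)
    chans = np.arange(nchans)
    with open(path, "rb") as f:
        f.seek(start * nchans * itemsize)
        for t in range(nsamps + int(delays.max())):
            row = np.frombuffer(f.read(nchans * itemsize), dtype=dtype)
            dest = t - delays  # output sample this row feeds, per channel
            ok = (dest >= 0) & (dest < nsamps)
            out[chans[ok], dest[ok]] = row[ok]
    return out


# Toy file: 4 channels of 16-bit data with hand-picked delays (hypothetical values).
nchans, nsamps = 4, 10
delays = np.array([0, 1, 3, 6])
rng = np.random.default_rng(1)
raw = rng.integers(0, 2**16, size=(nsamps + delays.max(), nchans), dtype=np.uint16)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.raw")
    raw.tofile(path)
    small = read_small(path, delays, nsamps, nchans, np.uint16)
    whole = read_whole(path, delays, nsamps, nchans, np.uint16)

expected = np.stack([raw[d:d + nsamps, c] for c, d in enumerate(delays)])
```

The first variant issues many tiny reads and touches only the bytes it needs; the second issues one read per time sample and discards the channels that fall outside the output window.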

Possible caveats: I suspect that this will not work in the nbits < 8 case if the product of nbits and nchans is not a whole number of bytes (i.e. not divisible by 8).
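The alignment condition in that caveat can be written as a small check. This is only a sketch; sample_is_byte_aligned is a hypothetical helper, not part of sigpyproc.

```python
def sample_is_byte_aligned(nbits, nchans):
    """True if one time sample (nbits * nchans bits) occupies a whole number of bytes."""
    return (nbits * nchans) % 8 == 0


aligned_8bit = sample_is_byte_aligned(8, 3)  # 24 bits = 3 bytes per sample
aligned_4bit = sample_is_byte_aligned(4, 3)  # 12 bits = 1.5 bytes per sample
aligned_2bit = sample_is_byte_aligned(2, 4)  # 8 bits = 1 byte per sample
```

With 4-bit data and 3 channels, every second time sample starts in the middle of a byte, so byte-based seeks cannot land on a sample boundary.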

Testing this code on the I-LOFAR REALTA nodes (HDD based), I see a 60% reduction in execution time between readDedispersedBlock and the previous chained method for long reads with a relatively high DM:

In [22]: %timeit data1 = reader.readDedispersedBlock(195349812, abs(195350012 - 195353724) + 200, 56.751)
23.1 s ± 706 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [23]: %timeit data2 = reader.readDedispersedBlock(195349812, abs(195350012 - 195353724) + 200, 56.751, smallReads = False)
23.5 s ± 101 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [24]: %timeit rawdata1 = reader.readBlock(195349812, delays[-1] + abs(195350012 - 195353724) + 200, 56.751); data3 = rawdata1.dedisperse
    ...: (56.751)[:, :abs(195350012 - 195353724) + 200]
1min 6s ± 14.5 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

The data produced are identical in all three cases:

In [30]: np.sum(np.logical_not(np.logical_and(data3 == data2, data3 == data1)))
Out[30]: FilterbankBlock(0)

I'll say that this is actually slower than another method I'm preparing for a separate PR at the moment (readBlock + dedisperse -> only_valid_samples), though this function could still be useful for those working in memory-constrained environments.

By the way, cheers for the work in porting sigpyproc to Python 3.