UCBerkeleySETI / blimpy

Breakthrough Listen I/O Methods for Python
https://blimpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Read header without loading entire file? #4

Closed by umranhaji 8 years ago

umranhaji commented 8 years ago

Extracting header parameters with Filterbank takes noticeably longer on larger .fil files than on smaller ones. I'm using the Filterbank class to read header information from spliced 0002-resolution .fil files (all 8 files spliced together), and extracting fch1, for example, takes between 0.5 and 0.6 seconds per file. Extracting fch1 from a smaller, non-spliced file of the same frequency resolution takes about 0.1 seconds.

Am I correct that extracting header information without touching the data should take the same time regardless of how large the data part of the file is? If so, the difference in speed suggests the entire file is being read before the header information is returned. Is there any way around this?

telegraphic commented 8 years ago

Hi @umranhaji, the latest version should address this. The Filterbank class creates an array of frequency values, which takes more time if there are more channels. Isabel and I changed this behavior recently for extracting the 21cm data.

The quickest way would be to just use the read_header function and skip generation of the Filterbank object: https://github.com/UCBerkeleySETI/filterbank/blob/master/filterbank.py#L290
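For anyone curious what a header-only read can look like, here is a minimal sketch. It is not the read_header function linked above. It assumes the usual SIGPROC filterbank layout (length-prefixed keywords between HEADER_START and HEADER_END, little-endian values) and handles only a subset of keywords. The name read_header_only and the data_offset key are made up for illustration.

```python
import struct

# Subset of SIGPROC header keywords and their value types. This table is an
# assumption based on the SIGPROC filterbank format and is not exhaustive.
HEADER_TYPES = {
    "telescope_id": "i", "machine_id": "i", "data_type": "i",
    "nchans": "i", "nbits": "i", "nifs": "i", "ibeam": "i", "nbeams": "i",
    "fch1": "d", "foff": "d", "tstart": "d", "tsamp": "d",
    "src_raj": "d", "src_dej": "d", "az_start": "d", "za_start": "d",
    "source_name": "s", "rawdatafile": "s",
}


def _read_string(fh):
    """Read a length-prefixed ASCII string (int32 length, then the bytes)."""
    (n,) = struct.unpack("<i", fh.read(4))
    if not 0 < n < 256:
        raise ValueError("not a SIGPROC header string")
    return fh.read(n).decode("ascii")


def read_header_only(path):
    """Parse keywords up to HEADER_END; the data block is never read."""
    header = {}
    with open(path, "rb") as fh:
        if _read_string(fh) != "HEADER_START":
            raise ValueError("missing HEADER_START")
        while True:
            key = _read_string(fh)
            if key == "HEADER_END":
                break
            kind = HEADER_TYPES[key]  # KeyError for keywords outside the subset
            if kind == "s":
                header[key] = _read_string(fh)
            elif kind == "i":
                (header[key],) = struct.unpack("<i", fh.read(4))
            else:
                (header[key],) = struct.unpack("<d", fh.read(8))
        # Hypothetical extra key: byte offset where the data block begins.
        header["data_offset"] = fh.tell()
    return header
```

Because the loop stops at HEADER_END, the time to get fch1 should depend only on the header length, not on how many channels or spectra follow it. Usage would be something like read_header_only("spliced.fil")["fch1"], where the filename is a placeholder.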

umranhaji commented 8 years ago

Thank you @telegraphic!