Connect to a PSRdada ringbuffer and write out the data in filterbank format.
This program is part of the data handling pipeline for the AA-ALERT project. See dadatrigger for an introduction and dataflow schema.
Requirements:
Note that psrdada could add an additional dependency on CUDA.
Instructions:
$ mkdir build && cd build
$ cmake .. -DCMAKE_BUILD_TYPE=release
$ make
$ make install
$ dadafilterbank -k <hexadecimal key> -l <logfile> -n <filename prefix for dumps>
Command line arguments:
The program implements different modes:
Not supported modes:
The data rate is set per science case. Supported cases:
Metadata is read from the PSRdada header block. Note that some of the metadata available in the header block is ignored, due to code constraints and optimizations. For the values that should be present, see the table below.
header key | type | units | description | notes |
---|---|---|---|---|
MIN_FREQUENCY | double | MHz | Center of lowest frequency band | |
BW | double | MHz | Total bandwidth of the observation | |
RA | double | hhmmss.s | Right ascension | |
DEC | double | ddmmss.s | Declination | |
SOURCE | string | text | Source name | |
AZ_START | double | degrees | Azimuth angle of telescope | |
ZA_START | double | degrees | Zenith angle of telescope | |
MJD_START | double | days since epoch | Modified Julian Date | |
PADDED_SIZE | int | bytes | Length of the fastest dimension of the data array | |
SCIENCE_CASE | int | 1 | Mode of operation of ARTS, determines data rate | |
SCIENCE_MODE | int | 1 | Mode of operation of ARTS, determines data layout | |
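
As an illustration of how the keys in the table might be read and type-checked, here is a minimal Python sketch. It assumes the header block is plain ASCII text with one whitespace-separated `KEY value` pair per line. The names `parse_header`, `typed_header` and `HEADER_TYPES` are hypothetical, and the sample values are made up for the example; none of this is taken from the dadafilterbank source.

```python
# Expected Python type for each header key listed in the table.
HEADER_TYPES = {
    "MIN_FREQUENCY": float,
    "BW": float,
    "RA": float,
    "DEC": float,
    "SOURCE": str,
    "AZ_START": float,
    "ZA_START": float,
    "MJD_START": float,
    "PADDED_SIZE": int,
    "SCIENCE_CASE": int,
    "SCIENCE_MODE": int,
}


def parse_header(text):
    """Split 'KEY value' lines into a dict of strings (assumed header layout)."""
    header = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            header[parts[0]] = parts[1].strip()
    return header


def typed_header(text):
    """Return the required keys converted to their expected types.

    Keys that are not in HEADER_TYPES are ignored; missing keys raise KeyError.
    """
    raw = parse_header(text)
    missing = [key for key in HEADER_TYPES if key not in raw]
    if missing:
        raise KeyError("missing header keys: " + ", ".join(missing))
    return {key: convert(raw[key]) for key, convert in HEADER_TYPES.items()}


# Made-up sample values, for illustration only.
SAMPLE_HEADER = """\
MIN_FREQUENCY 1250.0
BW 300.0
RA 053431.9
DEC 220052.0
SOURCE EXAMPLE
AZ_START 10.0
ZA_START 20.0
MJD_START 58000.5
PADDED_SIZE 12500
SCIENCE_CASE 4
SCIENCE_MODE 0
UNUSED_KEY ignored
"""

header = typed_header(SAMPLE_HEADER)
```

Converting all values up front means a missing or malformed key fails at startup rather than partway through an observation.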
A ringbuffer page is interpreted as an array of Stokes I with shape [NTABS, NCHANNELS, padded_size]. The array is padded along the fastest dimension to facilitate memory copies.
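
To make the page layout concrete, the following Python sketch computes where a sample sits in a page, assuming a contiguous row-major array of shape [NTABS, NCHANNELS, padded_size]. Offsets are counted in elements, not bytes, since the sample size is not stated here. The function names are hypothetical.

```python
def page_offset(tab, channel, t, nchannels, padded_size):
    """Element offset of sample (tab, channel, t) in a row-major page.

    padded_size is the row length including padding, so t must be less than
    the number of valid time samples, which may be smaller than padded_size.
    """
    return (tab * nchannels + channel) * padded_size + t


def page_elements(ntabs, nchannels, padded_size):
    """Total number of elements in one page, padding included."""
    return ntabs * nchannels * padded_size
```

For example, with 4 channels and a padded row length of 8, sample (tab=1, channel=2, t=3) sits at element (1 * 4 + 2) * 8 + 3 = 51.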
Tied array beams are written to separate files, one per observation. Note that these files can become very big.
Filterbank file names are derived from the file name prefix (-n option).
To prevent issues with relative paths etc., please use fully resolved absolute paths (starting with a '/').
Although the program is relatively simple, the large arrays can cause performance issues with respect to caching. With a naive implementation on the ARTS cluster, the matrix transpose and inversion of the channel dimension take longer than real time.
The tune subdirectory contains several implementations that try out different loop orders and various levels of loop unrolling. They also add OpenMP, with the number of threads specified in the Makefile. As a final step, you should pin the executable to a specific core using taskset.
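
For reference, the transpose with channel inversion can be sketched naively in Python as below. This is only an illustration of the operation, not one of the tuned implementations. It assumes 8-bit samples, a row-major page of shape [NTABS, NCHANNELS, padded_size], and an output ordered as [time, channel] with the channel axis reversed. The function name and the `ntimes` parameter (number of valid samples per row) are hypothetical.

```python
def tab_to_filterbank(page, tab, nchannels, ntimes, padded_size):
    """Transpose one tied array beam to [time, channel], reversing channel order.

    page is a bytes-like object holding the whole ringbuffer page. Only the
    first ntimes samples of each padded row are copied.
    """
    out = bytearray(ntimes * nchannels)
    base = tab * nchannels * padded_size
    for channel in range(nchannels):
        row = base + channel * padded_size
        dst_channel = nchannels - 1 - channel  # invert the channel dimension
        for t in range(ntimes):
            # Reads are contiguous in t, but writes are nchannels apart.
            out[t * nchannels + dst_channel] = page[row + t]
    return bytes(out)
```

With this loop order the writes are strided by `nchannels`, and swapping the loops makes the reads strided instead. That trade-off is the kind of access pattern that loop reordering and unrolling aim to improve.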
To try them run:
cd tune
make all
make time
For science case 4 on the ARTS cluster, the loopct_r6 implementation was fastest (using 2 to 4 threads); this is the current implementation.
Jisk Attema, Netherlands eScience Center
Leon Oostrum, UvA
Gijs Molenaar, Pythonic.nl