unpack() function deals 1-bit data differently with ewanbarr/sigpyproc

FRBs / sigpyproc3

Python3 version of Ewan Barr's sigpyproc library

https://sigpyproc3.readthedocs.io

MIT License

14 stars 11 forks source link

unpack() function deals 1-bit data differently with ewanbarr/sigpyproc #26

Closed xxgt closed 4 months ago

xxgt commented 1 year ago

Hi, I found that this repository unpacks 1-bit data in big endian order, however ewanbarr/sigpyproc reacts differently in this case: https://github.com/ewanbarr/sigpyproc/blob/54a804200723d30601026be5bfa37ec90c8266c1/c_src/libSigPyProc.c#L23-L27 I wonder if this is meant to be?

ewanbarr commented 6 months ago

It is unclear if this is a bug or a convention. The implementation in the original sigpyproc was a direct port of the implementation in sigproc (see https://github.com/SixByNine/sigproc/blob/master/src/readchunk.c).

There the 1-bit unpacker explicitly unpacks in little-endian order.

    /* unpack the sample(s) if necessary */
    switch (nbits) {
    case 1:
      fread(&c,1,1,input);
      for (i=0;i<8;i++) {
    f[i]=c&1;
    c>>=1;
      }
      ns=8;
      break;

The sigproc documentation states:

Multi-byte precision data are written in different orders depending on your machine’s operating system. The original WAPP data, for example, was written on a PC (little endian format). The filterbank program knows about this and automatically does any byte swapping required while reading. When it comes to writing the data out, however, the program will always write data in the native order of the processing machine. To swap the bytes around before writing for use on other machines, use the-swapout option.

The endianness of the data depends on the recording instrument and it should probably be more explicitly set when reading the files.

pravirkr commented 6 months ago

This is a convention so that we have the same bitorder for low-bit data. Looks like this need not be true.

So, the sigproc codebase follow different bitorder convention for 1-bit data? There is no header keyword defined to infer the bitorder from the filterbank file. The -swapout option only controls the writing convention.

I think using endianness as a parameter flag (bitorder) for these read/write functions and having both kind of unpacking implementation can be a good strategy. The flag would then be irrelevant if nbits > 8.

FilReader.read_block(), FilReader.read_plan(), Header.prep_outfile()

Ideally, it should have been a header flag.

Need to reverse/make changes in https://github.com/telegraphic/numbits/issues/6 and https://github.com/telegraphic/numbits/pull/7

ewanbarr commented 6 months ago

bitorder is more correct than endianness here so it makes sense to use that terminology.

As a note, the previous discussion about unpack1_8 vs np.unpackbits is no longer valid after PR #34. The previous performance discrepancy was due to the intermediate buffer allocation.

In [72]: %timeit kernels.unpack1_8(in_ar, out_ar)
563 µs ± 2.26 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [73]: %timeit np.unpackbits(in_ar)
646 µs ± 9.54 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)