UCBerkeleySETI / rawspec

6 stars 7 forks source link

Floating point support #58

Open radonnachie opened 2 years ago

radonnachie commented 2 years ago

Triggered (for now) by DATATYPE == 'FLOAT', which is not the fallback value ('INTEGER' is).

41

texadactyl commented 2 years ago

rawspec_testing bash command success is contingent on both of these conditions:

Rule of thumb: Don't write to stderrr unless exit status is nonzero.

As an aside, Python and other languages logging provide other useful logging types (warning, debug) with good reason.

Motivation for checking stderr contents in regression testing: I previously found some places in the rawspec source files where there were errors but a zero exit status was being passed back. It is important to catch these cases.

Fixes applied in order to complete regression testing

rawspec_gpu.cu

  1. fprintf on line 837, fprintf(stderr, "Detected %s data.\n", ctx->integer_data ? "Integer" : "Float"); should be fprintf(stdout, "Detected %s data.\n", ctx->integer_data ? "Integer" : "Float");

Since this was an informational-only message, it should go to stdout. Warnings and errors are considered failures in regression testing.

  1. fprintf starting on line 503, fprintf( stderr, // <-----------------------should be stdout "Nbps cannot be %d for integer data, %s 8.\n", ctx->Nbps, NbpsIsExpanded ? "upscaling to" : "treating as" );

  2. Ditto for the next fprintf concerning floating-point data (line 512).

Changed them both to stdout.

rawspectest

Produces erroneous output in rawspec_testing which checks the output to baseline results.
Fix: initialise ctx.integer_data = 1 before calling rawspec_initialize(&ctx) on line 88 of rawspectest.c

====================================================================

Finally, all rawspec_testing succeeds given changes made that I indicated earlier. Note that these are 100% integer cases. We could expand rawspec_testing with the addition of a floating-point .raw file from ATA.

cc: @david-macmahon

radonnachie commented 2 years ago

Thanks Richard! I'll actually switch the integer_data flag to float_data as it will better align with memset(ctx, 0...) being the default. Also, then the only way to trigger floating-point ingest is to have DATATYPE==FLOAT, which further lends itself to maintaining the default integer ingest logic :+1:

texadactyl commented 2 years ago

Please make the stderr --> stdout changes too. Thanks.

texadactyl commented 2 years ago

Okay, success on all SIGPROC and FBH5 cases.

texadactyl commented 2 years ago

@radonnachie Fixing issue #61 (closed) caused this PR to fall slightly behind rawspec main. I suggest that you pull the changes.

Is this still a work in progress?

radonnachie commented 2 years ago

Thanks texadactyl. It's in production use at the ATA, no issues. Only thing left to do is add test cases for floating-point input data. I believe it last passed all the integer tests.

radonnachie commented 2 years ago

Oh sneaky thing is that this incorporates/relies #62. So I'll wait to merge that.

Context: we're writing beams to RAW files in floating point. #41

texadactyl commented 2 years ago

@wfarah @radonnachie Is writing beams to .raw files with elements in floating point format now the standard at ATA?

texadactyl commented 2 years ago

@wfarah @radonnachie Can you make available one .raw file with elements in floating point format? Suggest uploading to the usual place and pass me the URL.

radonnachie commented 2 years ago

It's fairly common place now correct. I believe it is primarily used for calibration.

I can. Thanks texadactyl

texadactyl commented 2 years ago

Using the provided rawspec parameters .....

rawspec ./ATA_guppi_59730_78085_225768981_J0332+5434_off_0001-beam0000 -f 1 -t 32 -d .

working stem: ./ATA_guppi_59730_78085_225768981_J0332+5434_off_0001-beam0000
opening file: ./ATA_guppi_59730_78085_225768981_J0332+5434_off_0001-beam0000.0000.raw
bad obsnchan/nants: 192 % 20 != 0
Segmentation fault

Header:

 INSTANCE=                       0
 DATADIR = '/mnt/buf0'
 BINDHOST= 'enp97s0f1'
 HPCONFIG= 'IBV->FTP->BLADE_B->RAW'
 PROJID  = 'blade   '
 BACKEND = 'GUPPI   '
 TELINFOP= '/opt/mnt/share/telinfo_ata.toml'
 OBSINFOP= '/opt/mnt/share/obsinfo.toml'
 CALWGHTP= '/opt/mnt/share/ant_weights_1+0j.bin'
 DIRECTIO=                       1
 IBVPKTSZ= '42,16,8208'
 CUDADEV =                       0
 BINDPORT=                   10000
 TELESCOP= 'ATA     '
 IBVSTAT = 'running '
 MAXFLOWS=                      16
 NETSTAT = 'receiving'
 DAQSTATE= 'armed   '
 BLOCSIZE=                12582912
 NBITS   =                      16
 NPOL    =                       2
 OBSBW   =                      96
 CHAN_BW =                     0.5
 OBSNCHAN=                     192
 OVERLAP =                       0
 PKTFMT  = 'ATASNAPV'
 TBIN    =  1.9999999999999999e-06
 OBS_MODE= 'RAW     '
 NDROP   =                       0
 PKTSTART=           1849499500000
 PKTSTOP =           1849529500000
 BEAMSTAT= 'beamforming'
 OBSSTAT = 'processing'
 STTVALID=                       1
 OBSDONE =                       0
 OBSNPKTS=                       0
 OBSNDROP=                       0
 OBSBLKPS=                      61
 OBSBLKMS=             0.026134999
 DAQPULSE= 'Tue May 31 21:41:24 2022'
 BLDBLKSZ=                25165824
 BLDBUFST= '0/12    '
 BLDBLKMS=             8.574291229
 NETBUFST= '2/12    '
 OBSINFO = 'VALID   '
 FENCHAN =                    2048
 NANTS   =                      20
 NCHAN   =                     192
 PKTNTIME=                      16
 PKTNCHAN=                      96
 SCHAN   =                    1312
 PIPERBLK=                    8192
 PKTSIZE =                    6160
 IBVBUFST= '1/12    '
 NETTHRDS=                       8
 NPKTS   =                11286012
 PHYSPKPS=       1249997.750000000
 PHYSGBPS=             7.699986935
 RUSHBLKS=                      60
 NETBLKPS=            42.122886658
 NETBLKMS=             4.773543835
 PKTIDX  =           1849499500000
 BLKIDX  =                       0
 BLKSTART=           1849499500000
 BLKSTOP =           1849499508192
 IBVGBPS =               62.019720
 IBVPPS  =             1249994.356
 IBVBLKMS=            12.006278992
 OBSFREQ =                 1591.75
 SYNCTIME=              1650334286
 SOURCE  = 'J0332+5434_off'
 RA      =      53.497306208292514
 DEC     =      54.828762538399999
 RA_STR  =      53.497306208292514
 DEC_STR =      54.828762538399999
 RA_HOURS=      3.5664870805528341
 AZ      =      312.05966186523438
 EL      =      60.700180053710938
 ANTNAMES= '1cB,1eB,1gB,1hB,1kB,2aB,2bB,2cB,2eB,2hB,2jB,2kB,2lB,2mB,4eB,3dB,3lB'
 XPCTGBPS= '7.500GBps 60.000Gbps'
 ANTNMS00= '1cB,1eB,1gB,1hB,1kB,2aB,2bB,2cB,2eB,2hB,2jB,2kB,2lB,2mB,4eB,3dB,3lB'
 ANTNMS01= '4jB,5bB,4gB'
 DUT1    =   -0.096340800000000004
 RA_OFF0 =      53.497306208292514
 DEC_OFF0=      54.828762538399999
 RA_OFF1 =      53.247299999999996
 DEC_OFF1=      54.578769999999999
 SRC_NAME= 'J0332+5434_off'
 STT_IMJD=                   59730
 STT_SMJD=                   78085
 STT_OFFS=                       0
 NPKT    =                   20480
 DROPSTAT= '0/20480 '
 NBEAMS  =                       1
 DATATYPE= 'FLOAT   '
 SMPLTYPE= 'CF16    '
radonnachie commented 2 years ago

Sure, the provided file needs to be processed by rawspec as it is in this branch. Then nbeams kicks in and there will be a printout as per https://github.com/UCBerkeleySETI/rawspec/blob/d702da5ef1273d2f009d6b9360f413813f5a650e/rawspec.c#L584

texadactyl commented 2 years ago

@radonnachie I installed the PR code from the right branch. Also, I thought the right field name is NBEAMS?

texadactyl commented 2 years ago

?????

rawspec -v
rawspec 2.3.1
rawspec: symbol lookup error: rawspec: undefined symbol: rawspec_version_string
radonnachie commented 2 years ago

It is NBEAMS, I've updated that printout. Thanks.

I appreciate that you would know to have installed the PR code, but the NBEAMS override of nants happens before that check: https://github.com/MydonSolutions/rawspec/blob/floating_point/rawspec.c#L583-L596

I have:

$ ./rawspec --version
rawspec 3.1.1+14@g139c340 using librawspec 3.1.1+14@g139c340 and cuFFT 10.5.1.100
HDF5 library version: 1.10.4
The HDF5 library plugin directory (default) is /usr/lib/x86_64-linux-gnu/hdf5/plugins.
WARNING: Plugin bitshuffle is NOT available so compression is DISABLED!
Please copy the bitshuffle plugin to the plugin directory.
texadactyl commented 2 years ago

I installed from here: URL='https://github.com/MydonSolutions/rawspec' BRANCH='floating_point'

After my last git pull + make:

rawspec -v
rawspec 2.3.1+226@g9dc34bf using librawspec 2.3.1+226@g9dc34bf and cuFFT 10.2.1.245
HDF5 library version: 1.12.1
The HDF5 library plugin directory (default) is /usr/local/hdf5/lib/plugin.
The bitshuffle plugin is available.

Better, but the version is off.

texadactyl commented 2 years ago

Last git pull:

git pull
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/MydonSolutions/rawspec
   2df4f80..9dc34bf  floating_point -> origin/floating_point
Updating 2df4f80..9dc34bf
Fast-forward
 rawspec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
texadactyl commented 2 years ago

Ok, last git pull still has wrong version BUT it ran to conclusion with the supplied .raw file.

rawspec -f 1 -t 32 -d . ATA_guppi_59730_78085_225768981_J0332+5434_off_0001-beam0000

rawspec 2.3.1+226@g139c340 using librawspec 2.3.1+226@g139c340 and cuFFT 10.2.1.245
writing output files in SIGPROC Filterbank format
working stem: ATA_guppi_59730_78085_225768981_J0332+5434_off_0001-beam0000
opening file: ATA_guppi_59730_78085_225768981_J0332+5434_off_0001-beam0000.0000.raw
Header has NBEAMS (1), which overrides NANTS (20)
Detected Float data.
opening file: ATA_guppi_59730_78085_225768981_J0332+5434_off_0001-beam0000.0001.raw [No such file or directory]
output product 0: 174592 spectra
texadactyl commented 2 years ago

rawspec_testing :: no errors detected.

texadactyl commented 2 years ago

Line 584, fprintf(stderr, "Header has NBEAM %d, which overrides NANTS (%d)\n", ..... Informational messages should go to stdout, yes?

radonnachie commented 2 years ago

@david-macmahon In order to try entice your reviewal here is the summary:

That's all. It hasn't affected existing tests on integer data.

radonnachie commented 2 years ago

P.S. This PR adds the ability for rawspec to ingest a RAW file that has floating point data. Such RAW files are the outputs of beamforming at the ATA currently, but floating point support opens the GUPPI specification up to other sources that will enable rawspec to be useful in more up-coming scenarios.

radonnachie commented 2 years ago

All in a +31 −5, backwards compatible change. The nbeams field name change means projects with rawspec as a dependency will need to update that...

david-macmahon commented 2 years ago

The addition of DATATYPE to support floating point inputs seems like a worthwhile addition. It would be good to specify which 16-bit floating point format(s) is(are) supported. This feature should probably be split out into its own PR since the other seemingly independent aspects of this PR have complications/controversy.

I know the original meanings of the GUPPI RAW headers are not well documented, but there is historical precedent for the header names that predate rawspec. I see no technical reason or benefit to change the (pseudo-)standard name NBEAM to NBEAMS. Specifically, NBEAM is used to indicate how many beams the receiver has. For example, GUPPI RAW files from Parkes multi-beam receiver have NBEAM=13. NBEAM also predates the addition of NANTS so if anything, NANTS should be changed to NANT (but I'd be opposed to that too). Even when NBEAM>1 all GUPPI RAW files to date (not counting any that inspired this PR) still only contain data from a single beam, indicated by BEAM_ID.

TBH, storing multiple beams in a single GUPPI RAW data block is not really that great of an idea since the headers only have a single values for RA, DEC, SRC_NAME, etc. I think keeping a single beam per GUPPI RAW file is cleanest in terms of backwards compatibility, but if one really wanted to add support for storing data from multiple beams in a GUPPI RAW file it would probably make more sense from a file format and metadata perspective to store a single beam per GUPPI RAW data block and allow the BEAM_ID field to vary from block to block. Thus, a file with say 2 beams would have two GUPPI RAW blocks per time step (assuming the two beams are recorded commensurately). This would allow each beam to have proper RA, DEC, SRC_NAME, etc, but it would break compatibility with all existing readers that expect time to advance from block to block. With this level of contortion, one might start to consider whether they are just fooling themselves that they are not inventing an entirely new file format and instead use something like HDF5, possibly with external "flat file" datasets.

radonnachie commented 2 years ago

Great points. Thanks for the context which supports understanding, @david-macmahon .

Keenly, you've highlighted a lapse in my explanation for how we're employing it. Currently, NBEAMS only ever gets a value of 1. I think your statement is a far better employment of the NBEAMS key.

I'll adjust the following, and hopefully this chimera is simple enough that I don't have to split it up:

radonnachie commented 2 years ago

Finally I guess that I could fully kill the NBEAM part of it, leaving only the floating point support, by just making my implementation set NANTS=1.

radonnachie commented 2 years ago

Et voila. Again, thanks for your patience.

texadactyl commented 2 years ago

@radonnachie Have you run the regression testing (rawspec_testing)? Just to make sure that no inadvertent side-effects occurred.

texadactyl commented 2 years ago

@radonnachie Regression testing errors found in one file:

09:48:35   reviewer  Compare baseline and trial for ATA_guppi_59811_38723_118314086_AzEl_0001-beam0001.rawspec.0000.tbldsel .....
09:48:35   compare_lists  ERROR: Row 0 baseline value1=32.831085205078125 but trial value=12518.349609375, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 1 baseline value1=32.36983871459961 but trial value=3827.9306640625, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 2 baseline value1=33.065658569335945 but trial value=15167.3359375, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 3 baseline value1=31.764314651489254 but trial value=3967.4091796875, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 4 baseline value1=32.03502655029297 but trial value=11089.015625, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 0 baseline value2=33.41324234008789 but trial value=14810.640625, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 1 baseline value2=32.41711807250977 but trial value=4670.775390625, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 2 baseline value2=32.349655151367195 but trial value=13485.21875, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 3 baseline value2=33.15689468383789 but trial value=4977.76318359375, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 4 baseline value2=31.716466903686523 but trial value=9324.912109375, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 0 baseline value3=33.456554412841804 but trial value=15664.447265625, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 1 baseline value3=31.23141098022461 but trial value=4228.11669921875, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 2 baseline value3=32.650131225585945 but trial value=14398.1416015625, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 3 baseline value3=32.49973678588867 but trial value=4307.337890625, using rtol=0.0005
09:48:35   compare_lists  ERROR: Row 4 baseline value3=32.62644577026367 but trial value=13721.771484375, using rtol=0.0005

The other files (GBT and ATA) had no issues.

Baseline: files/results from current rawspec on BL github. Trial: files/results from PR rawspec code.

File header:

     INSTANCE=                       1
     DATADIR = '/mnt/buf1'
     BINDHOST= 'enp225s0f1'
     HPCONFIG= 'DEV_IBV-FTP-BLADE_B-RAW'
     PROJID  = 'blade   '
     BACKEND = 'GUPPI   '
     TELINFOP= '/opt/mnt/share/telinfo_ata.toml'
     OBSINFOP= '/opt/mnt/share/obsinfo.toml'
     CALWGHTP= '/opt/mnt/share/ant_weights_1+0j.bin'
     DIRECTIO=                       1
     IBVPKTSZ= '42,16,8192'
     CUDADEV =                       1
     BINDPORT=                   50000
     TELESCOP= 'ATA     '
     IBVSTAT = 'running '
     MAXFLOWS=                      16
     NETSTAT = 'receiving'
     DAQSTATE= 'armed   '
     BLOCSIZE=                12582912
     NANTS   =                       1
     NBITS   =                      16
     NPOL    =                       2
     OBSBW   =                      96
     CHAN_BW =                     0.5
     OBSNCHAN=                     192
     OVERLAP =                       0
     PKTFMT  = 'ATASNAPV'
     TBIN    =  1.9999999999999999e-06
     OBS_MODE= 'RAW     '
     NDROP   =                       0
     PKTSTART=            969229000000
     PKTSTOP =            969241500000
     IBVSNIFF=                   50000
     BEAMSTAT= 'beamforming'
     OBSSTAT = 'processing'
     STTVALID=                       1
     OBSDONE =                       0
     OBSNPKTS=                       0
     OBSNDROP=                       0
     OBSBLKPS=                      61
     OBSBLKMS=             0.027564000
     DAQPULSE= 'Sat Aug 20 10:45:22 2022'
     BLDBLKSZ=                25165824
     BLDBUFST= '0/12    '
     BLDBLKMS=             8.167504311
     NETBUFST= '2/12    '
     OBSINFO = 'VALID   '
     FENCHAN =                    2048
     NCHAN   =                     192
     PKTNTIME=                      16
     PKTNCHAN=                      96
     SCHAN   =                    1504
     PIPERBLK=                    8192
     PKTSIZE =                    6160
     OBSFREQ =                 6287.75
     SYNCTIME=              1659053865
     SOURCE  = 'AzEl    '
     RA      =      188.28501637984286
     DEC     =      67.251958399732615
     RA_STR  =      12.552334425322856
     DEC_STR =      67.251958399732615
     AZ      =  -0.0018876204267144203
     EL      =      18.000028610229492
     ANTNAMES= '1cB,1eB,1gB,1hB,1kB,2aB,2bB,2cB,2eB,2hB,2jB,2kB,2lB,2mB,3cB,3dB,3lB'
     XPCTGBPS= '7.500GBps 60.000Gbps'
     ANTNMS00= '1cB,1eB,1gB,1hB,1kB,2aB,2bB,2cB,2eB,2hB,2jB,2kB,2lB,2mB,3cB,3dB,3lB'
     ANTNMS01= '4jB,5bB,4gB'
     DUT1    =              -0.0262592
     IBVBUFST= '1/12    '
     NETTHRDS=                       8
     IBVGBPS =               62.019534
     IBVPPS  =             1249990.600
     IBVBLKMS=            12.095994949
     NPKTS   =                 5186160
     PHYSPKPS=       1250044.000000000
     PHYSGBPS=             7.700270653
     RUSHBLKS=                      52
     NETBLKPS=            83.649871826
     NETBLKMS=             5.791654110
     PKTIDX  =            969229000000
     BLKIDX  =                       0
     BLKSTART=            969229000000
     BLKSTOP =            969229008192
     SRC_NAME= 'AzEl    '
     STT_IMJD=                   59811
     STT_SMJD=                   38723
     STT_OFFS=                       0
     NPKT    =                   20480
     DROPSTAT= '0/20480 '
     INCOBEAM=                       0
     NBEAMS  =                       2
     DATATYPE= 'FLOAT   '
     SMPLTYPE= 'CF16    '
     NBEAM   =                       2
     BEAM_ID =                       1
     END
radonnachie commented 2 years ago

I'm certain that this is because the baseline is interpreting the sample-data of the ATA_guppi_59811_38723_118314086_AzEl_0001-beam0001 file as as (NBITS=16) 16 bit integers, instead of the half-precision floating point data that it is (DATATYPE=FLOAT). The handling of the data as floating point is the new addition that this PR brings in.

That all of the other files pass indicates that nothing has changed regarding the integer sample-data processing, so I believe you have provided justification for this to undergo final review. I think @david-macmahon mentioned something about stipulating which kinds of floating point formats are supported. Asides from the categorical 16 and 32 bit limitations in the code, I'm not sure I see what else could be stated. Maybe that it only handles the in-built float type, IEEE754.

Thanks for testing @texadactyl, I feel pretty guilty for still not knowing how to 😶.

texadactyl commented 2 years ago

@radonnachie Yes, in the baseline rawspec, there is no datatype flag "float_data" in the raw_hdr definition, hence floating-point support does not yet exist.

I take it that the "trial" floating-point values reported in this issue stream by me are correct for the intended file contents. These will be used to update the rawspec_testing baseline directory once @david-macmahon has approved the merge.

texadactyl commented 2 years ago

@radonnachie

This statement confused me: "In the presence of NBEAM > 0, take there to be only 1 BEAM present and use 1 instead of NANTS. This ultimately means split-ant and ics modes are no-ops in the presence of NBEAM>0".

Are you saying that headers should have NBEAM or NANTS present but not both? If so, the presence of both should cause a stderr diagnostic message that clearly states which is in force. Already does?

If NBEAM is active, then this is the same as a non-antenna case (E.g. GBT). True?

If NBEAM > 0, rawspec ignores all but the first block? If so, a warning should be issued when blocks are ignored or the operations documentation should indicate this.

@david-macmahon

This was also puzzling: "It would be good to specify which 16-bit floating point format(s) is(are) supported.". What formats could there be? Could there be more than one? Sorry if my inexperience is showing.

@david-macmahon

rawspec needs an operations manual and an update to https://github.com/UCBerkeleySETI/breakthrough/blob/master/doc/RAW-File-Format.md

radonnachie commented 2 years ago

I think I had run my mouth a bit. Ive added no capability that uses the NBEAM value. So that hasn't changed. I agree with the state that rawspec was in before this PR: NBEAM isn't used other than to populate the filter bank header.

So it's expected that beam populated files have NANTS set to 1 if at all. The mouth running I did should fall into another PR, if it's worthwhile (I don't think it is), like Dave mentioned.

All that's outstanding is the floating point format curiosity...

texadactyl commented 2 years ago

@radonnachie If only we were in the same office area with access to a shared whiteboard and able to ask questions in real-time. Half (or more) of these electronic messages would be unnecessary. Don't beat yourself up! (-:

radonnachie commented 1 year ago

Change log entry would be something like:

- Added floating-point RAW data support (half and full precision) by accessing "DATATYPE" key and switching on "FLOAT" (fallback value is "INTEGER"). "NBITS" determines precision, 16 and 32 are only acceptable values. A print statement of the datatype is made: "Detected {Integer|Float} Data".