SuperDARN / pyDARNio

Python Library for read in SuperDARN data
GNU Lesser General Public License v3.0
8 stars 2 forks source link

borealis_convert increased complexity error triggered erroneously (I think) #27

Closed alexchartier closed 3 years ago

alexchartier commented 3 years ago

BUG

When trying to convert Borealis HDF5 rawacf.hdf5.site files, I get the following error:

The file cannot be converted to SDARN DMap rawacf due to the following error: Increased complexity: Borealis rawacf file record 1633013160047 blanked_samples [ 0 72 96 160 176 208 216] does not equate to pulses array converted to sample number [ 0 9 12 20 22 26 27] * 8

On inspection of those arrays, they should be the same (i.e. each number in the first list is 8x the number in the second list). Therefore, the error message is incorrect.

On further inspection, the cause of the error message is around line 424 of borealis_convert.py: ... sample_spacing = int(record['tau_spacing'] / record['tx_pulse_len']) normal_blanked_1 = record['pulses'] * sample_spacing normal_blanked_2 = normal_blanked_1 + 1 blanked = np.concatenate((normal_blanked_1, normal_blanked_2)) blanked = np.sort(blanked)

            if not np.array_equal(record['blanked_samples'], blanked):

...

I don't understand why "normal_blanked_2" exists, or why it is concatenated into "blanked" If we just did blanked = normal_blanked_1 instead, then it seems everything would work fine.

Priority

Example of the bug

from pydarnio import BorealisConvert filename= '20210930.1446.00.wal.0.rawacf.hdf5.site' borealis_filetype = 'rawacf' dmap_filename = '/borealis_nfs/borealis_data/daily/20210930.1446.00.wal.a.rawacf.site' slice_id = 0 borealis_converter = BorealisConvert(filename, borealis_filetype, dmap_filename, slice_id, borealis_file_structure='site')

pydarnio.exceptions.borealis_exceptions.BorealisConvert2RawacfError: The file cannot be converted to SDARN DMap rawacf due to the following error: Increased complexity: Borealis rawacf file record 1633013160047 blanked_samples [ 0 72 96 160 176 208 216] does not equate to pulses array converted to sample number [ 0 9 12 20 22 26 27] * 8

Attempts

What have you tried already to fix it?

Tried commenting out the normal_blanked_2 stuff, got a different error: pydarnio.exceptions.borealis_exceptions.BorealisConvert2RawacfError: The file cannot be converted to SDARN DMap rawacf due to the following error: 'agc_status_word'

What have you tried to get around it?

Data Location

to be uploaded here

Please provide other pertinent details about this feature: Ubuntu linux

alexchartier commented 3 years ago

git won't let me upload the file

alexchartier commented 3 years ago

I have uploaded a file that won't convert here: https://drive.google.com/file/d/1vPKuJlbmK_E8it-rG67K3o1EMLtyohB5/view?usp=sharing

mts299 commented 3 years ago

Actually, this issue came up with RKN Borealis update. Sorry for the delay our Engineers were away with getting Borealis on working on Rankin and Clyde. @RemingtonRohel @Lozzy-Bear Can you comment on knowledge on this issue?

I think we determined this might actually be a Borealis bug than a pyDARNio.

alexchartier commented 3 years ago

Thanks. What's the best way to set up automated testing of these conversion routines? Should we upload a set of "approved" files to convert on installation? If so, would they go in github or somewhere like Zenodo and get downloaded?

RemingtonRohel commented 3 years ago

Marina and I have talked about this issue when it came up during our installation at RKN. I confused myself a little bit at that time, but I have two thoughts now:

  1. The test is checking that adjacent samples are blanked, both the sample containing the pulse and the immediately following sample. I believe this is the desired behaviour for the system and the borealis code has been changed to do that in the develop branch. @kevinkrieger is on holidays until next week, but he should be able to confirm.
  2. The error code doesn't reflect what the test is actually doing. If we are checking that the blanked samples actually equates to pulses * sample number, as the error code says, then it should pass for your file.

I think that we probably need to change pyDARNio to fix this. Since all of our borealis sites are generating files which don't pass the test, but we're converting them anyways with no problems by using pyDARN v1.1.0, the test doesn't seem like it's useful in its current state. If I am remembering correctly and borealis v0.5 only blanks pulses sample_spacing, and v0.6 blanks [pulses, pulses+1] sample_spacing, maybe the test should behave differently based on the borealis version used to generate the files?

alexchartier commented 3 years ago

OK, I think we need two things going forward:

1 Conversion test routines in pydarnio so we can tell if the problem is bad files or bad pydarnio code

2 Post a bug report when you find a problem

I also thought of going back to an older pydarn, but it seems the conversion code itself was always in pydarnio (it just stopped being imported by pydarn). So I'm not sure that would really fix it. But maybe I didn't follow the commit history closely enough

mts299 commented 3 years ago

@alexchartier this is out of scope for this issue and actually something that has been discussed for other software. Unfortunately, when I was trying to make a data repository on FRDR or Zenodo (as SuperDARN data files are too large for GitHub) no one aided me in finding specific files to add to the test data set. I would strongly encourage you to discuss this with other PI's and maybe make a task force for it. Otherwise, we may be at a standstill as I alone cannot and probably shouldn't it alone. Please email if you have more questions. Also you or one of your colleagues are more than welcome to pick up some pyDARNio work.

See the PR's Angeline made for pyDARNio test suite.

alexchartier commented 3 years ago

@mts299 The trouble is I don't know if my Borealis files are good or not. Maybe @RemingtonRohel has some? I can certainly make a test code (I would have if I had a file I could trust) but the "official" files have to come from the Borealis developers. I don't think it's out of scope though - I can't tell whether this is a pydarnio bug or a borealis bug without a "known good" HDF5 rawACF file

RemingtonRohel commented 3 years ago

@alexchartier I have found a couple borealis files of different versions that I have been testing with, I will send attach them here for you to download.

From my testing, it looks like this is a pyDARNio bug, not borealis. I've made a branch with changes (https://github.com/SuperDARN/pyDARNio/tree/rawacf_dmap_conversion) that should fix the problem for borealis v0.5 files.

I get the same error as you did about the agc_status_word, which I think is a separate issue. It looks to me like pyDARNio was updated for Borealis v0.6 which has agc_status_word and lp_status_word signals, but the lack of those signals in v0.5 files was somehow missed. From my testing it looks like that's the only other problem, and fixing that should allow your files to convert without issue.

RemingtonRohel commented 3 years ago

Here are two files, one v0.5 and one v0.6, which I used for testing along with yours:

https://drive.google.com/drive/folders/1j37Ez6JY_IlNAYXx2kG6IGR3aWzhMNMi?usp=sharing

alexchartier commented 3 years ago

@RemingtonRohel

Following your suggestion, I installed your branch and was able to convert one of your files. However I don't have a way of testing the conversion. My rawacf file was very small - I don't think I achieved a full conversion. Can you upload the file you expect to obtain after conversion, and also upload the code you used to generate it? That will allow me to test whether I have a working system here.

Alex

P.S. here's what I did:

from pydarnio import BorealisConvert filename= '20210930.1446.00.wal.0.rawacf.hdf5.site' borealis_filetype = 'rawacf' dmap_filename = '/borealis_nfs/borealis_data/daily/20210930.1446.00.wal.a.rawacf.site' slice_id = 0 borealis_converter = BorealisConvert(filename, borealis_filetype, dmap_filename, slice_id, borealis_file_structure='site')

alexchartier commented 3 years ago

P.P.S. I'm not able to test this within Borealis until the master branch of pydarnio gets the fix. This is because I can't see how to install the fixed pydarnio such that the realtime environment sees that version of it

RemingtonRohel commented 3 years ago

@alexchartier

I've added two more files from 20211006 to the google drive link above, that I just tested. One is an hdf5 site file, and the other its dmap equivalent. I started with the hdf5 file, then converted it using pyDARN v1.1.0 since that's what we're using at site for all of our conversions, so I know if it converts with that then it has to be fine. It's good enough for what we upload, at any rate. I then converted it with pyDARNio on the branch I created, and then compared the two files, which were basically identical. The only difference was the 'combf' field in each record, because I gave two different paths to the hdf5 file when I converted two different ways.

With regards to "full conversion", BorealisConvert converts record-by-record, so I think that your conversion was probably just fine, even with a small data file. If it converts one record correctly, it should do the same for all records (ideally).

As for code that I used, I essentially used the exact same code as you just posted to generate my files. It was functionally identical. To generate the files with pyDARN v1.1.0, it was exactly the same thing but pydarn.BorealisConvert instead of pydarnio.BorealisConvert.

To compare DMap files, I used the following code:

import numpy as np
import pydarnio

files = ['/home/remington/sample_data/dmap_generated/pydarnio.dmap',
         '/home/remington/sample_data/dmap_generated/20211006.2132.00.sas.a.rawacf.dmap']

SDarn_read_1 = pydarnio.SDarnRead(files[0])
SDarn_read_2 = pydarnio.SDarnRead(files[1])
rawacf_data_1 = SDarn_read_1.read_rawacf()
rawacf_data_2 = SDarn_read_2.read_rawacf()

for i in range(len(rawacf_data_1)):
    record_1 = rawacf_data_1[i]
    record_2 = rawacf_data_2[i]

    for key in record_1.keys():
        val_1 = record_1[key]
        val_2 = record_2[key]

        if not np.array_equal(val_1, val_2):
            # if key == 'combf':
            #     continue
            print("Different data:"
                  "\ti = {}\n"
                  "\t\tkey = {}\n"
                  "\t\t\tval_1 = {}\n"
                  "\t\t\tval_2 = {}"
                  "".format(i, key, val_1, val_2))

P.S. You can install a package from a branch with pip using pip install git+git://github.com/SuperDARN/pydarnio.git@rawacf_dmap_conversion. Do this within borealisrt_env and your realtime conversion should work.

alexchartier commented 3 years ago

@RemingtonRohel Thanks I think it is working now so I will close the issue. I recommend your fix be merged into the main pydarnio, along with a test routine based on those example files you uploaded and those code snippets. I can help implement that testing code, but I'm not sure where the examples files should be kept. Note: I was not able to install the package from a branch as you suggested. I had to change the following line in borealisrt_env/pyvenv.cfg : include-system-site-packages = true