GitHub won't let me upload the file.
I have uploaded a file that won't convert here: https://drive.google.com/file/d/1vPKuJlbmK_E8it-rG67K3o1EMLtyohB5/view?usp=sharing
Actually, this issue came up with the RKN Borealis update. Sorry for the delay; our engineers were away getting Borealis working at Rankin and Clyde. @RemingtonRohel @Lozzy-Bear, can you comment on this issue?
I think we determined this might actually be a Borealis bug rather than a pyDARNio one.
Thanks. What's the best way to set up automated testing of these conversion routines? Should we upload a set of "approved" files to convert on installation? If so, would they go in GitHub or somewhere like Zenodo and get downloaded?
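Something like the following is what I have in mind - just a sketch, and the hosting URL and file name here are placeholders rather than real hosted data:

# Hypothetical pytest sketch: fetch an "approved" Borealis site file and check it converts.
import urllib.request
import pytest
from pydarnio import BorealisConvert

TEST_DATA_URL = "https://example.org/superdarn_test_data/"   # placeholder host (e.g. a Zenodo record)
TEST_FILE = "approved.rawacf.hdf5.site"                       # placeholder "approved" file name

@pytest.fixture(scope="module")
def approved_file(tmp_path_factory):
    # Download the approved file into a temporary directory once per test module.
    local = tmp_path_factory.mktemp("data") / TEST_FILE
    urllib.request.urlretrieve(TEST_DATA_URL + TEST_FILE, str(local))
    return str(local)

def test_rawacf_conversion(approved_file, tmp_path):
    # A known-good file should convert without raising BorealisConvert2RawacfError.
    dmap_out = str(tmp_path / "converted.rawacf.dmap")
    BorealisConvert(approved_file, 'rawacf', dmap_out, 0, borealis_file_structure='site')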
Marina and I have talked about this issue when it came up during our installation at RKN. I confused myself a little bit at that time, but I have two thoughts now:
I think that we probably need to change pyDARNio to fix this. Since all of our Borealis sites are generating files which don't pass the test, but we're converting them anyway with no problems using pyDARN v1.1.0, the test doesn't seem useful in its current state. If I am remembering correctly and Borealis v0.5 only blanks pulses * sample_spacing, while v0.6 blanks [pulses, pulses+1] * sample_spacing, maybe the test should behave differently based on the Borealis version used to generate the files?
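If that is right, a version-dependent check might look something like the sketch below (the field names follow the snippets later in this issue; how the Borealis version is obtained, e.g. parsed from a field like borealis_git_hash, is an assumption):

import numpy as np

def expected_blanked_samples(record, borealis_version):
    # Samples blanked around each transmit pulse, in units of samples.
    sample_spacing = int(record['tau_spacing'] / record['tx_pulse_len'])
    blanked = record['pulses'] * sample_spacing
    if borealis_version >= (0, 6):
        # v0.6 reportedly blanks the sample after each pulse as well.
        blanked = np.concatenate((blanked, blanked + 1))
    return np.sort(blanked)

# The converter's test could then be:
# np.array_equal(expected_blanked_samples(record, version), record['blanked_samples'])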
OK, I think we need two things going forward:
I also thought of going back to an older pyDARN, but it seems the conversion code itself was always in pyDARNio (it just stopped being imported by pyDARN), so I'm not sure that would really fix it. But maybe I didn't follow the commit history closely enough.
@alexchartier this is out of scope for this issue and actually something that has been discussed for other software. Unfortunately, when I was trying to make a data repository on FRDR or Zenodo (as SuperDARN data files are too large for GitHub), no one aided me in finding specific files to add to the test data set. I would strongly encourage you to discuss this with the other PIs and maybe make a task force for it. Otherwise, we may be at a standstill, as I cannot, and probably shouldn't, do it alone. Please email me if you have more questions. Also, you or one of your colleagues are more than welcome to pick up some pyDARNio work.
See the PRs Angeline made for the pyDARNio test suite.
@mts299 The trouble is I don't know if my Borealis files are good or not. Maybe @RemingtonRohel has some? I can certainly write the test code (I would have if I had a file I could trust), but the "official" files have to come from the Borealis developers. I don't think it's out of scope though - I can't tell whether this is a pyDARNio bug or a Borealis bug without a "known good" HDF5 rawACF file.
@alexchartier I have found a couple of Borealis files of different versions that I have been testing with; I will attach them here for you to download.
From my testing, it looks like this is a pyDARNio bug, not borealis. I've made a branch with changes (https://github.com/SuperDARN/pyDARNio/tree/rawacf_dmap_conversion) that should fix the problem for borealis v0.5 files.
I get the same error as you did about the agc_status_word, which I think is a separate issue. It looks to me like pyDARNio was updated for Borealis v0.6 which has agc_status_word and lp_status_word signals, but the lack of those signals in v0.5 files was somehow missed. From my testing it looks like that's the only other problem, and fixing that should allow your files to convert without issue.
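For illustration, making those fields optional could be as simple as the following - just a sketch of the idea, not necessarily how the branch handles it (it assumes the records behave like dicts and that 0 is an acceptable default):

# Fall back to a benign default when a v0.5 record lacks the v0.6-only fields.
agc_status_word = record.get('agc_status_word', 0)
lp_status_word = record.get('lp_status_word', 0)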
Here are two files, one v0.5 and one v0.6, which I used for testing along with yours:
https://drive.google.com/drive/folders/1j37Ez6JY_IlNAYXx2kG6IGR3aWzhMNMi?usp=sharing
@RemingtonRohel
Following your suggestion, I installed your branch and was able to convert one of your files. However I don't have a way of testing the conversion. My rawacf file was very small - I don't think I achieved a full conversion. Can you upload the file you expect to obtain after conversion, and also upload the code you used to generate it? That will allow me to test whether I have a working system here.
Alex
P.S. here's what I did:
from pydarnio import BorealisConvert
filename = '20210930.1446.00.wal.0.rawacf.hdf5.site'
borealis_filetype = 'rawacf'
dmap_filename = '/borealis_nfs/borealis_data/daily/20210930.1446.00.wal.a.rawacf.site'
slice_id = 0
borealis_converter = BorealisConvert(filename, borealis_filetype, dmap_filename, slice_id, borealis_file_structure='site')
P.P.S. I'm not able to test this within Borealis until the master branch of pydarnio gets the fix. This is because I can't see how to install the fixed pydarnio such that the realtime environment sees that version of it
@alexchartier
I've added two more files from 20211006 to the google drive link above, that I just tested. One is an hdf5 site file, and the other its dmap equivalent. I started with the hdf5 file, then converted it using pyDARN v1.1.0 since that's what we're using at site for all of our conversions, so I know if it converts with that then it has to be fine. It's good enough for what we upload, at any rate. I then converted it with pyDARNio on the branch I created, and then compared the two files, which were basically identical. The only difference was the 'combf' field in each record, because I gave two different paths to the hdf5 file when I converted two different ways.
With regards to "full conversion", BorealisConvert converts record-by-record, so I think that your conversion was probably just fine, even with a small data file. If it converts one record correctly, it should do the same for all records (ideally).
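If you want a quick sanity check anyway, one option is to compare record counts between the input site file and the converted DMap file - a sketch, which assumes the site file stores one HDF5 group per record:

import h5py
import pydarnio

hdf5_file = '20210930.1446.00.wal.0.rawacf.hdf5.site'   # input site file
dmap_file = '20210930.1446.00.wal.a.rawacf.site'        # converted output

with h5py.File(hdf5_file, 'r') as f:
    n_hdf5_records = len(f.keys())                      # one group per record (assumption)
n_dmap_records = len(pydarnio.SDarnRead(dmap_file).read_rawacf())
print(n_hdf5_records, n_dmap_records)                   # should match if nothing was dropped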
As for code that I used, I essentially used the exact same code as you just posted to generate my files. It was functionally identical. To generate the files with pyDARN v1.1.0, it was exactly the same thing but pydarn.BorealisConvert instead of pydarnio.BorealisConvert.
To compare DMap files, I used the following code:
import numpy as np
import pydarnio

# Read both DMap files and compare every field of every record.
files = ['/home/remington/sample_data/dmap_generated/pydarnio.dmap',
         '/home/remington/sample_data/dmap_generated/20211006.2132.00.sas.a.rawacf.dmap']

SDarn_read_1 = pydarnio.SDarnRead(files[0])
SDarn_read_2 = pydarnio.SDarnRead(files[1])
rawacf_data_1 = SDarn_read_1.read_rawacf()
rawacf_data_2 = SDarn_read_2.read_rawacf()

for i in range(len(rawacf_data_1)):
    record_1 = rawacf_data_1[i]
    record_2 = rawacf_data_2[i]
    for key in record_1.keys():
        val_1 = record_1[key]
        val_2 = record_2[key]
        if not np.array_equal(val_1, val_2):
            # if key == 'combf':
            #     continue
            print("Different data:\n"
                  "\ti = {}\n"
                  "\t\tkey = {}\n"
                  "\t\t\tval_1 = {}\n"
                  "\t\t\tval_2 = {}"
                  "".format(i, key, val_1, val_2))
P.S. You can install a package from a branch with pip, using pip install git+git://github.com/SuperDARN/pydarnio.git@rawacf_dmap_conversion. Do this within borealisrt_env and your realtime conversion should work.
@RemingtonRohel Thanks, I think it is working now, so I will close the issue. I recommend your fix be merged into the main pyDARNio, along with a test routine based on those example files you uploaded and those code snippets. I can help implement that testing code, but I'm not sure where the example files should be kept. Note: I was not able to install the package from a branch as you suggested. I had to change the following line in borealisrt_env/pyvenv.cfg: include-system-site-packages = true
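Something along these lines could serve as that test routine (a sketch only: the file names are the example files from the Google Drive link above, and where they should be hosted is still an open question):

# Regression-test sketch: convert a known-good site file and compare every field
# of every record against the reference DMap file.
import numpy as np
import pydarnio
from pydarnio import BorealisConvert

site_file = '20211006.2132.00.sas.0.rawacf.hdf5.site'   # known-good input (placeholder name)
reference_dmap = '20211006.2132.00.sas.a.rawacf.dmap'   # reference output from the Drive link
converted_dmap = 'converted.rawacf.dmap'

BorealisConvert(site_file, 'rawacf', converted_dmap, 0, borealis_file_structure='site')
ref = pydarnio.SDarnRead(reference_dmap).read_rawacf()
new = pydarnio.SDarnRead(converted_dmap).read_rawacf()
assert len(ref) == len(new)
for rec_ref, rec_new in zip(ref, new):
    for key in rec_ref:
        if key == 'combf':   # differs only because it embeds the path of the input file
            continue
        assert np.array_equal(rec_ref[key], rec_new[key]), key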
BUG
When trying to convert Borealis HDF5 rawacf.hdf5.site files, I get the following error:
The file cannot be converted to SDARN DMap rawacf due to the following error: Increased complexity: Borealis rawacf file record 1633013160047 blanked_samples [ 0 72 96 160 176 208 216] does not equate to pulses array converted to sample number [ 0 9 12 20 22 26 27] * 8
On inspection of those arrays, they do appear to be equivalent (i.e. each number in the first list is 8x the corresponding number in the second list). Therefore, the error message is incorrect.
On further inspection, the cause of the error message is around line 424 of borealis_convert.py:
...
sample_spacing = int(record['tau_spacing'] / record['tx_pulse_len'])
normal_blanked_1 = record['pulses'] * sample_spacing
normal_blanked_2 = normal_blanked_1 + 1
blanked = np.concatenate((normal_blanked_1, normal_blanked_2))
blanked = np.sort(blanked)
...
I don't understand why "normal_blanked_2" exists, or why it is concatenated into "blanked". If we just did blanked = normal_blanked_1 instead, then it seems everything would work fine.
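For concreteness, plugging the numbers from the error message into that code shows why the check fails:

import numpy as np

# Values taken from the error message above.
pulses = np.array([0, 9, 12, 20, 22, 26, 27])
blanked_samples = np.array([0, 72, 96, 160, 176, 208, 216])
sample_spacing = 8

normal_blanked_1 = pulses * sample_spacing    # [0 72 96 160 176 208 216] - matches the file
normal_blanked_2 = normal_blanked_1 + 1       # [1 73 97 161 177 209 217]
blanked = np.sort(np.concatenate((normal_blanked_1, normal_blanked_2)))
# blanked now has 14 elements, so it can never equal the 7-element blanked_samples array,
# even though blanked_samples does equal pulses * sample_spacing.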
Example of the bug
from pydarnio import BorealisConvert
filename = '20210930.1446.00.wal.0.rawacf.hdf5.site'
borealis_filetype = 'rawacf'
dmap_filename = '/borealis_nfs/borealis_data/daily/20210930.1446.00.wal.a.rawacf.site'
slice_id = 0
borealis_converter = BorealisConvert(filename, borealis_filetype, dmap_filename, slice_id, borealis_file_structure='site')
pydarnio.exceptions.borealis_exceptions.BorealisConvert2RawacfError: The file cannot be converted to SDARN DMap rawacf due to the following error: Increased complexity: Borealis rawacf file record 1633013160047 blanked_samples [ 0 72 96 160 176 208 216] does not equate to pulses array converted to sample number [ 0 9 12 20 22 26 27] * 8
Attempts
What have you tried already to fix it?
Tried commenting out the normal_blanked_2 lines and got a different error:
pydarnio.exceptions.borealis_exceptions.BorealisConvert2RawacfError: The file cannot be converted to SDARN DMap rawacf due to the following error: 'agc_status_word'
What have you tried to get around it?
Data Location
to be uploaded here
Please provide other pertinent details about this feature: Ubuntu Linux