mts299 closed this issue 6 years ago.
Hi Marina - The "Error mapping beams" message appears when the GridTableMap function called by make_grid returns -1 instead of 0:
https://github.com/SuperDARN/rst/blob/master/codebase/superdarn/src.bin/tk/tool/make_grid.2.0/make_grid.c#L869
https://github.com/SuperDARN/rst/blob/master/codebase/superdarn/src.lib/tk/gtable.2.0/src/gtable.c#L555
Unfortunately my memory is hazy, and the comments I added don't explain why this might happen. It is probably a failure in GridTableFindBeam or GridTableAddBeam. I totally agree that the error message should be more informative.
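One way to make the message more informative would be to give each failing step its own status code and print it together with the record time. The sketch below is hypothetical: RST's GridTableMap currently returns only 0 or -1, and the codes, map_status_message, and report_map_error are illustrative names, not existing RST functions. Returning a non-zero value would also address the point elsewhere in this thread that make_grid exits with 0 after this error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL status codes: in RST, GridTableMap() just returns -1 on any
   failure. Distinct codes would let make_grid report which step failed. */
enum map_status {
    MAP_OK = 0,
    MAP_ERR_GENERIC = -1,    /* the current catch-all failure value */
    MAP_ERR_FIND_BEAM = -2,  /* e.g. a GridTableFindBeam() failure */
    MAP_ERR_ADD_BEAM = -3    /* e.g. a GridTableAddBeam() failure */
};

/* Translate a status code into a message that names the failing step. */
const char *map_status_message(int status) {
    switch (status) {
    case MAP_OK:            return "ok";
    case MAP_ERR_FIND_BEAM: return "beam lookup in grid table failed";
    case MAP_ERR_ADD_BEAM:  return "adding beam to grid table failed";
    default:                return "unspecified failure";
    }
}

/* Print a diagnostic with the record time and the failing step, and return
   a non-zero value the caller can use as the program's exit status. */
int report_map_error(FILE *out, int status, const char *record_time) {
    assert(out != NULL && record_time != NULL);
    if (status == MAP_OK) return 0;
    fprintf(out, "Error mapping beams at %s: %s (status %d)\n",
            record_time, map_status_message(status), status);
    return 1;
}
```

A caller would then do something like `return report_map_error(stderr, status, timestr);` instead of printing a fixed string and exiting with 0.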
When using the -tl and -i flags with make_grid, I would recommend setting both to the same value (rather than 100 s for one and 120 s for the other). If that's not causing the segfault, could you try setting the -vb verbose flag to see which record the software is breaking on? That way we might be able to isolate whichever operating parameter or data value is causing the issue.
Hi Evan,
Thanks for the quick reply. I will take a look at the code and let you guys know if I find anything.
Sorry, that was a typo. Both options are set to 120, and I still get a segfault. With verbose output I get this:
Storing:2007-1-17 13:46:0 13:48:0 pnts=0 (10)
Storing:2007-1-17 13:48:0 13:50:0 pnts=0 (10)
Storing:2007-1-17 13:50:0 13:52:0 pnts=0 (10)
Storing:2007-1-17 13:52:0 13:54:0 pnts=0 (10)
Storing:2007-1-17 13:54:0 13:56:0 pnts=0 (10)
Segmentation fault (core dumped)
I looked through the file with dmapdump around 13:56, but nothing stood out except that at 13:58 the scan flag is -1; however, this also happens at earlier times.
When I tried the -ns option, which excludes data with a non-zero scan flag, I still get "Error reading file", and if I add -ns -tl 120 it segfaults.
I am curious why -tl allows the file to be read when other options stop it immediately with an error.
I don't see any segfaults for the HAN file on this day, but calling
make_grid -vb -cn A -xtd 20070117.0002.00.han.fitacf > han.grdmap
does give an "Error reading file" message originating here:
Setting the -tl 120 flag avoids this error check and lets make_grid continue as normal - not sure why the number of beams in a "scan" should affect anything.
If I do not include -tl, I get the "Error reading file" message, which I assume is what you found.
If you include -tl and -minrng, you might get the segfault. I also noticed that if I don't use the -minrng option I don't get the segfault, but I figure that is because this line is never reached:
https://github.com/SuperDARN/rst/blob/82b4ac03873b521b4de51f7ad20a961e92a544ff/codebase/superdarn/src.bin/tk/tool/make_grid.2.0/make_grid.c#L184
This means the NULL pointer stored in ptr->bm[bm].sct may cause other problems later in the process.
I am not sure why the number of "scans" should affect anything either. I am still trying to figure out why sct is set to NULL on one of the beams.
The "Error reading file" message actually occurs because this line is never reached:
https://github.com/SuperDARN/rst/blob/82b4ac03873b521b4de51f7ad20a961e92a544ff/codebase/superdarn/src.lib/tk/fit.1.35/src/fitscan.c#L218
Because the scan flag starts at -1, it never changes to 1, so the beam count goes past 1000 and causes an error. Should we not handle scan = -1 a bit more gracefully? If this commonly occurs in han (and potentially other radars') files, could we implement a simple check for when the scan flag changes, to detect that a full scan has finished?
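To illustrate the failure mode described above, here is a simplified, hypothetical model of the scan reader: it counts beams until the next record whose scan flag is exactly 1, with an assumed cap of 1000 beams. It is not the actual fitscan.c logic, only a sketch of why a file whose scans all start with -1 never terminates a scan and runs into the limit.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed beam cap, mirroring the ~1000-beam limit mentioned above. */
#define MAX_SCAN_BEAMS 1000

/* HYPOTHETICAL model: count the beams of the scan beginning at flags[0],
   treating only scan == 1 as the start of the next scan. Returns the beam
   count, or -1 if the cap is exceeded before another scan start is found. */
int count_scan_beams_strict(const int *flags, size_t n) {
    assert(flags != NULL || n == 0);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && flags[i] == 1)
            break;                              /* next scan starts here */
        if (++count > MAX_SCAN_BEAMS)
            return -1;                          /* never saw a +1 flag */
    }
    return count;
}
```

With 16-beam scans flagged 1, 0, 0, ... this returns 16; with the same scans flagged -1, 0, 0, ... and more than 1000 records, it returns -1, which corresponds to the "Error reading file" path.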
Also, I noticed that the -ns option is not used in the FitReadRadarScan function. Should it be incorporated into the function so that scans with scan = -1 can be skipped? That might also help avoid the above error.
I will keep digging to figure out the segfault issue when I use -tl and -minrng together.
Checking against the absolute value of prm->scan appears to solve the problem. I've created a fix_negative_scan_flags branch where this change has been made in fitscan.c, make_grid.c, and several other binaries/libraries. Hopefully that will solve this issue with negative scan flags - give that a try and let me know if it works.
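The absolute-value check can be sketched as follows. This is a hypothetical stand-alone illustration, not the code on the fix_negative_scan_flags branch: starts_scan and find_scan_starts are made-up names, and the flags are plain ints rather than RST's prm->scan field. Both +1 and -1 mark the first beam of a scan, so backward-scanning data is split into scans the same way as forward-scanning data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A beam starts a new scan if its scan flag is +1 or -1 (per the thread,
   -1 appears to mark a scan running in the opposite direction). */
int starts_scan(int scan_flag) {
    return abs(scan_flag) == 1;
}

/* Record the index of every scan start in flags[0..n-1], writing at most
   max_starts indices into starts. Returns the number of indices written. */
size_t find_scan_starts(const int *flags, size_t n,
                        size_t *starts, size_t max_starts) {
    assert((flags != NULL || n == 0) && (starts != NULL || max_starts == 0));
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_starts; i++) {
        if (starts_scan(flags[i]))
            starts[found++] = i;
    }
    return found;
}
```

For example, the flag sequence {-1, 0, 0, -1, 0, 1, 0} yields scan starts at indices 0, 3 and 5.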
I actually don't know why one would want to ignore data from beam soundings with a negative scan flag - I believe @ecbland explained this is simply used to indicate the scanning sequence is in the opposite direction?
Thanks Evan for making this branch. Now, when I don't use the -tl or -ns flags, I no longer get "Error reading file". However, if I use the -minrng option, I still get a segfault.
I know where the segfault is occurring and where bm[].sct is getting set to NULL; I just need to figure out why.
I found where the segfault is occurring. In the han file there is one small section of data (right after 14:00 UT) where all the scalar values are equal to 0 and there are only three arrays, so something may have gone wrong during data collection or when processing the fitacf file. The segfault is caused by a combination of unchecked data values and unchecked return values.
When nrang = 0, RadarAddBeam:
https://github.com/SuperDARN/rst/blob/f65f9192435979957aa392c349e70ac17b30c157/codebase/superdarn/src.lib/tk/scan.1.7/src/radarscan.c#L121-L148
sets bm->sct = NULL but then never changes it or returns an error value, because it only checks for nrang != 0 and for s == -1, which only happens if bm == NULL. Then, because nrang = 0, this piece of code never runs:
https://github.com/SuperDARN/rst/blob/f65f9192435979957aa392c349e70ac17b30c157/codebase/superdarn/src.lib/tk/fit.1.35/src/fitscan.c#L157-L172
So the next time bm->sct (a NULL pointer) is accessed is when -minrng is set:
https://github.com/SuperDARN/rst/blob/82b4ac03873b521b4de51f7ad20a961e92a544ff/codebase/superdarn/src.bin/tk/tool/make_grid.2.0/make_grid.c#L179
That causes my segfault.
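As a defensive measure at the dereference site, the -minrng step could skip any beam whose sct array is NULL. The sketch below uses a hypothetical, simplified beam struct and a made-up apply_minrng function; it only approximates what make_grid does around the linked line (clearing the scatter flags of gates below minrng), and is meant to show the guard rather than reproduce RST's code.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL simplified beam record: sct is NULL when the beam was added
   with nrang == 0, as described above. */
struct beam {
    int nrang;
    unsigned char *sct;   /* per-range-gate "has scatter" flags */
};

/* Clear sct[] for range gates below minrng, skipping beams whose sct is
   NULL instead of dereferencing it. Returns the number of skipped beams. */
int apply_minrng(struct beam *bm, int nbeam, int minrng) {
    assert(bm != NULL || nbeam == 0);
    int skipped = 0;
    for (int b = 0; b < nbeam; b++) {
        if (bm[b].sct == NULL) {          /* guard for the nrang == 0 case */
            skipped++;
            continue;
        }
        for (int r = 0; r < minrng && r < bm[b].nrang; r++)
            bm[b].sct[r] = 0;
    }
    return skipped;
}
```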
A simple software fix, then, is to check whether nrang == 0, for example at this line:
https://github.com/SuperDARN/rst/blob/82b4ac03873b521b4de51f7ad20a961e92a544ff/codebase/superdarn/src.lib/tk/scan.1.7/src/radarscan.c#L132
changing it to if (s==-1 || nrang == 0), just to make sure the memory is handled properly. We should also add a check on the returned error value around this area:
https://github.com/SuperDARN/rst/blob/82b4ac03873b521b4de51f7ad20a961e92a544ff/codebase/superdarn/src.bin/tk/tool/make_grid.2.0/make_grid.c#L887
to print an error message, or perhaps a warning, telling the user that this file contains some bad data.
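The proposed two-part fix can be sketched roughly as below. Everything here is a hypothetical stand-in: add_beam_sketch models only the allocation part of RadarAddBeam (with the extra nrang == 0 test), load_beam models the caller-side return-value check in make_grid, and the struct is simplified. The point is that a beam with nrang == 0 is reported as a failure instead of being returned with a NULL sct array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* HYPOTHETICAL simplified beam record (not the real RST struct). */
struct beam {
    int nrang;
    unsigned char *sct;
};

/* Stand-in for the allocation part of RadarAddBeam(): returns 0 on success,
   -1 on failure. The nrang == 0 test is the proposed extra check, so callers
   never receive a beam whose sct array was left NULL. */
int add_beam_sketch(struct beam *bm, int nrang) {
    int s = (bm == NULL) ? -1 : 0;
    if (s == -1 || nrang == 0)
        return -1;                        /* proposed: if (s==-1 || nrang==0) */
    bm->sct = calloc((size_t)nrang, sizeof *bm->sct);
    if (bm->sct == NULL)
        return -1;                        /* allocation failed */
    bm->nrang = nrang;
    return 0;
}

/* Caller side: check the return value and warn that the file may contain
   bad data, rather than carrying on with a half-initialised beam. */
int load_beam(struct beam *bm, int nrang, FILE *log) {
    if (add_beam_sketch(bm, nrang) != 0) {
        if (log != NULL)
            fprintf(log, "Warning: skipping beam with nrang=%d; "
                         "file may contain bad data\n", nrang);
        return -1;
    }
    return 0;
}
```

Whether a bad record should be skipped with a warning or abort the whole file is the open question raised in the next comment; the sketch chooses to skip.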
I am curious how we want to handle this. It seems there is a brief blackout in the data, yet the rest of the file contains data. Do we assume the rest is bad?
I still need to check whether the rawacf file has this blackout or whether it occurred when converting the rawacf file to fitacf.
@mts299 - the 20070117.1401.00.han.rawacf file I have is only 4.0K in size, and both dmapdump and make_fit fail for me when trying to process it. We may need to blacklist that particular rawacf file.
Hi Evan,
Looks like I only have data for han up to 14:00:09 and then more data at 16:02, so there is a gap between 14:00-16:02 in our data records.
Is 14:01 the only 4.0 K file in your records?
@mts299 - yes, 14:00:09 is the final record in our 20070117.1201.00.han.rawacf file. Short of putting extra debug statements in make_fit or dmapdump, I can't even see the record contents of the 1401 rawacf file.
Interestingly, the origin lines throughout the 1201 rawacf file say: "/home/dproc/bin/dattorawacf 2007011712f.dat" - which makes me wonder if a rawacf file was produced at the radar site or if there was only a dat file to begin with.
I'm leaning towards 20070117.1401.00.han.rawacf needing to be blacklisted - maybe @kevinkrieger or some other members of the data distribution WG can chime in.
@egthomas, I agree. I will talk to @kevinkrieger about it today.
For the "Error mapping beams" problem I was getting with the Rankin Inlet data, I found that the commissioning date in the radar.dat file is 20070501, so I was trying to process data from before the commissioning date. Should the error message say that?
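A clearer message could come from an explicit date check before gridding. This is a hypothetical sketch: predates_commissioning is not an RST function, and it assumes both dates are available as yyyymmdd integers (like the 20070501 commissioning date quoted above), which compare in chronological order as plain numbers.

```c
#include <assert.h>
#include <stdio.h>

/* HYPOTHETICAL check: dates are yyyymmdd integers, so numeric order matches
   chronological order. Returns 1 (and optionally logs a specific message)
   if the file predates the radar's commissioning date, otherwise 0. */
int predates_commissioning(long file_date, long commission_date,
                           const char *code, FILE *log) {
    assert(code != NULL);
    if (file_date >= commission_date)
        return 0;
    if (log != NULL)
        fprintf(log, "Error mapping beams: %s data from %ld predates the "
                     "commissioning date %ld in radar.dat\n",
                code, file_date, commission_date);
    return 1;
}
```

For the case in this thread, predates_commissioning(20070117, 20070501, "rkn", stderr) would return 1 and name the cause directly.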
Our records show that Rankin Inlet was operational as of May 16th, 2006. I am going to analyze the data from then until 20070601 to see if the data is good. If so, I will ask Kathryn what she wants to do; we may then need to remove those files or change the commissioning date in the radar.dat file.
@mts299 - I just checked the BAS and USask mirrors, and the 20070117.1401.00.han.rawacf file is not present on either. So it looks like that file was already removed but not officially blacklisted.
The 20070117.1001.00.han.rawacf file is also missing from both mirrors. We have a copy locally, but it just crashed my computer when I tried to view its contents with dmapdump. So I think both of those files need to be officially blacklisted for the benefit of anyone who might still have them lying around on their data servers.
Hi @egthomas and @mts299
The 20070117.1401.00.han.rawacf file is officially blacklisted. It was blacklisted on November 7th and an email was sent out for "Issue #26".
@egthomas - @kevinkrieger and I checked as well and saw that it was blacklisted last November; however, we have not updated that fitacf file since 2015.
It looks like the problem is in our own data management, which we are currently working on.
@kevinkrieger - Sorry, my mistake! Those files are also blacklisted on the BAS mirror (https://api.bas.ac.uk/superdarn/mirror/v3/files/blacklisted?contentType=checksumreport). Looks like we need to clean up our own rawacf archive here at Dartmouth.
@egthomas No problem, thank you for helping with our issues!
@kevinkrieger - reading through that DDWG email chain, did you perform a similar analysis on the dat files as well? It sounds like such a sweep might help solve the issue I raised in this repository in #116 (early dat files not processing correctly into fit files, even though they are handled correctly by dattorawacf, whose output can then be processed into fitacf files).
Edit - actually, that issue with old dat files had nothing to do with make_fit; it was instead caused by buffer overflows produced by dattorawacf and test_raw.
@egthomas We did not; the analysis was done using @kkotyk's dmap Python library. I'm not sure that it handles dat files, but I'm open to ideas (it's a smaller dataset, so it shouldn't take as long).
After just coming across this, I'm wondering if there is a way to use make_grid on a file with -1 scan flags without resorting to ignoring scan flags with -tl. I have found that when using -tl with non-standard integration times and a specified start time, the actual start time used by make_grid can vary substantially.
As a thought, are there any potential consequences to overwriting a fitacf file so that every '-1' scan flag becomes '1', other than losing the record of the scan direction used?
@billetd - I've created a pull request (#132) which checks against the absolute value of scan flags to address that issue in make_grid. However, having thought about this some more over the last few weeks, it seems there may be two separate cases where negative scan flags occur: 1) a backwards-scanning radar mode, or 2) a radar mode such as a camping beam on a Stereo radar that was never intended to be gridded. The first case is probably safe to grid, but you may want to be careful with the latter.
Thanks Evan, this also fixes it for me!
@egthomas - Sorry for the delayed response; I've been busy with other things. Is there another way to detect case 2? I don't want to introduce more bugs, even for a rare case.
Also, as a side note, maybe we can close this issue. The Rankin Inlet error I was getting was due to the fitacf file predating the commissioning date in the radar.dat file. I need to process the data from when the radar was up and running to see whether Kathryn wants to keep it; that would lead either to changing the radar.dat file or to blacklisting all the earlier rawacf files.
Either way, I can create a separate issue for that problem, since it has more to do with the radar.dat file than with make_grid.
@mts299 - I think we addressed the original issues raised here (scan flag error, blacklisted files). Do you think this issue is safe to close?
Did we want to keep this issue open until we discuss the scan flag at the conference? I could make another issue that is more focused on the scan flag.
I think that merits a separate issue, since it isn't strictly an error in the make_grid procedure.
Sounds good to me!
Hi,
I am a new programmer/ data analyst at SuperDARN Canada, and I am currently working with the RST code to generate convection maps.
I noticed that the data set 20070117.C0.rkn.fitacf returns the error message "Error mapping beams". Any insight into why it would return this message? dmapdump shows there is data in the file. Also, the return value for this error is 0; do we not want something more meaningful?
Another problem is reading in 20070117.C0.han.fitacf:
~> make_grid -xtd -cn A -i 120 -minrng 2 -vemax 1000000 20070117.C0.han.fitacf > 20070117.han.A.grid
Error reading file.
I assume this is because some scan flags are -1, so when I change the options to deal with it as instructed in issue #85, I get a segfault instead:
~> make_grid -xtd -cn A -tl 100 -i 120 -minrng 2 -vemax 1000000 20070117.C0.han.fitacf > 20070117.han.A.grid
Segmentation fault (core dumped)
Cheers, Marina