mccabete opened 1 month ago
Quick note- the screenshots from version 3 are colored by mergeid, not fireID.
Some of the small fire starts were missing from V3
Can someone unpack this a little more for me please about what you're seeing between v2 and v3? Are we expecting those perimeter polygons to be merged? Or are we expecting that v2 has more perimeter polygons and we're not seeing them?
We do expect them to be merged. In V2, they were merged in. In V3 they are not. Julia pointed out that we were plotting from the allfire object for the V3 stuff in the post above. Below is a screenshot from the largefire file where the little fires are missing.
This suggests that V3 is making different decisions about how stuff gets merged. They might be better decisions :shrug: but I want to know where they are coming from.
There are two places where merging happens:

1) While running Fire_Forward: any fires that intersect each other get assigned matching mergeids, and the larger fire gets its hull updated to include the smaller fire. This happens in Fire_merge_rtree.
2) In the postprocessing steps, fires that have the same mergeid are combined with each other. The bulk of that logic is in postprocessing.merge_rows.
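Conceptually, step 1 does something like the following. This is a simplified, self-contained sketch (bounding boxes instead of real hulls, brute-force pairwise checks instead of an R-tree index; all names are illustrative, not the actual fireatlas API):

```python
def boxes_intersect(a, b):
    """Do two axis-aligned bounding boxes (minx, miny, maxx, maxy) overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def box_union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def assign_mergeids(fires):
    """Give intersecting fires a shared mergeid; the larger fire's hull grows.

    Stand-in for the R-tree-backed lookup in Fire_merge_rtree. Each fire is
    a dict with 'id', 'mergeid', and a bounding-box 'hull'.
    """
    for i, a in enumerate(fires):
        for b in fires[i + 1:]:
            if boxes_intersect(a["hull"], b["hull"]):
                big, small = (a, b) if box_area(a["hull"]) >= box_area(b["hull"]) else (b, a)
                small["mergeid"] = big["mergeid"]
                big["hull"] = box_union(big["hull"], small["hull"])
    return fires

fires = [
    {"id": 1, "mergeid": 1, "hull": (0, 0, 4, 4)},
    {"id": 2, "mergeid": 2, "hull": (3, 3, 5, 5)},      # overlaps fire 1
    {"id": 3, "mergeid": 3, "hull": (10, 10, 11, 11)},  # isolated
]
print([f["mergeid"] for f in assign_mergeids(fires)])  # [1, 1, 3]
```

Step 2 would then collapse the rows sharing a mergeid into a single perimeter.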
So since you are seeing different mergeids, the difference must be coming from step 1.
> So since you are having different mergeids the difference is coming from within 1.
I almost understand what this means, but still don't quite. Are you saying that the issue is how ID 1 got assigned?
Sorry, that was in reference to the prior comment. The issue that you are seeing is from within Fire_merge_rtree.
Is this the fire you are looking at:
```python
tst = [2023, 6, 1, 'AM']
ted = [2023, 8, 12, 'PM']
bbox = [-76.510259, 52.624922, -73.801554, 53.743852]
```
Not exactly that, but approximately. It's based on the bounding box of the original V2 fire.
```python
tst = [2023, 6, 1, 'AM']
ted = [2023, 8, 13, 'AM']
region = ('Fire_8495_Quebec_v2_test', [-76.510259, 52.624922,
                                       -73.801554, 53.743852])
```
The v3 data those screenshots are from is available in my shared bucket if you want to take a peek without waiting to re-generate them, as that did take an hour or so: shared-buckets/zbecker/Fire_8495_Quebec. It was generated following the steps in notebooks 01 to 03.
If possible, I think we should get @zebbecker, @jsignell, and myself (@mccabete if she isn't too busy) together for an hour-long pairing session today or tomorrow. Then we can come up with a way to debug/answer these questions for future us. Thoughts? I think that would be fun.
Update: the v3 perimeters we were comparing against were generated with two incorrect settings flags: CONT_OPT and FTYPE_OPT were set to "CA", but they need to be "global" for the boreal region. PR 65 fixes a typo to allow this to work as intended.
With these two flags set correctly (i.e. the same way they were set in the v2 run we are comparing against), v3 merges the same way as v2.
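A cheap guard against this class of mistake would be asserting the flags before kicking off a long run. This is just a sketch: CONT_OPT and FTYPE_OPT are the flag names from above, but the dict and helper are hypothetical, and the real override mechanism depends on how the notebooks configure fireatlas settings:

```python
def check_boreal_settings(settings):
    """Fail fast if a boreal run is configured with region-specific options.

    `settings` is a plain dict standing in for however the run is actually
    configured; CONT_OPT / FTYPE_OPT must be "global" for boreal regions.
    """
    for flag in ("CONT_OPT", "FTYPE_OPT"):
        if settings.get(flag) != "global":
            raise ValueError(f"{flag}={settings.get(flag)!r}; boreal runs need 'global'")

# Hypothetical pre-flight check before an expensive regional run.
run_settings = {"CONT_OPT": "global", "FTYPE_OPT": "global"}
check_boreal_settings(run_settings)  # passes silently when configured correctly
```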
We can leave this open with the intention that the notebooks should be augmented to include more information about settings and how jobs are configured.
With the flag issue resolved, v2 and v3 perimeters look very, very close. However, there is still one small discrepancy that I have not been able to track down a source for yet. In this screenshot, you can see that there are a few places where the blue v3 perimeters peek out from below the yellow v2 perimeters. Look near the center of the image.
@mccabete thought that this might be caused by differences in the underlying data or by sensitivity to the bounding box, so I tried to control for as much as possible, running v2 and v3 with the same flags, parameters, and datasets. (To run on the same data, I processed the tmp files that v2 saves during its preprocessing steps into the input format that v3 expects, then ran v3 on those rather than the usual input locations.) This did not change anything about how the outputs compare, so either I missed something to control for (possible!) or it's not this.
I'm worried that it might have something to do with Shapely versioning, as the v2 perimeters were generated in an environment created from the `conus-dps` branch `env.yml` file, while the v3 perimeters were generated using the environment defined in the `primarykeyv2` branch. Respective Shapely versions were 1.8.4 and 2.0.1 for v2 and v3, so we did cross a major release. I am investigating this possibility further.
Other things that could result in mild differences:
1) Which pixels are in which clusters originally is determined by the sklearn `BallTree` algorithm, so the scikit-learn version could be a factor.
2) The shape of the hulls with more than 3 pixels is dependent on `settings.valpha`, so you would want to ensure that that is held constant.
3) The hulls are calculated using either `scipy.spatial.ConvexHull` or `scipy.spatial.Delaunay` depending on how many pixels they include, so the scipy version could be a factor.
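To make the hull point concrete: the construction method visibly changes a perimeter. Below is a self-contained illustration (a plain monotone-chain convex hull, not the scipy routines the code actually uses) showing that the convex hull of an L-shaped pixel cluster spans ground no pixel touched, which is exactly the kind of area an alpha-shape tuned by `settings.valpha` would carve away:

```python
def convex_hull(points):
    """Monotone-chain convex hull (counter-clockwise, collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# An L-shaped cluster of five "pixels": two arms, nothing in the interior corner.
pixels = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
print(convex_hull(pixels))  # [(0, 0), (2, 0), (0, 2)] -- a triangle spanning the empty corner
```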
I guess my point is that it is unlikely you would ever get exactly the same results, given the differences in library versions and potentially in the underlying data. Overall, the focus might be better spent on deciding an allowable difference in overlap and writing tests to ensure that the overlap difference does not exceed a certain threshold. If there is an external source of fire perimeters, that might be an even better way to test: instead of testing v2 vs v3, we could test v2 vs some canonical perimeters, and v3 vs some canonical perimeters.
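A minimal version of that thresholded-overlap test could look like this. The area numbers are made up; in a real test they would come from the intersection/union of the v2 and v3 perimeter geometries (e.g. via shapely):

```python
def overlap_ratio(area_a, area_b, area_intersection):
    """Intersection-over-union of two perimeter sets, given their areas."""
    area_union = area_a + area_b - area_intersection
    return area_intersection / area_union

# Hypothetical areas (km^2) for a v2 vs v3 comparison of the same region.
iou = overlap_ratio(120.0, 118.5, 117.0)
assert iou > 0.95, f"v2/v3 perimeter overlap too low: {iou:.3f}"
print(round(iou, 3))  # 0.963
```

Against canonical external perimeters, the same check would just be run twice, once per version.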
Thank you @jsignell!
As far as I can tell, `settings.valpha` is consistent across all runs.
Running v3 with the scipy and scikit-learn versions used in the v2 runs results in the same undesired behavior in the outputs.
That leaves Shapely version. I'm having a hard time confirming that this is or is not the issue, because Shapely made a few breaking changes when going from 1.x to 2.x. Our v2 relies on 1.x behavior, but v3 relies on 2.x behavior. Both in seemingly nontrivial ways. The upshot of that is that I can't just run v3 with an older Shapely to test this hypothesis, nor can I run v2 with a newer Shapely, without significant refactoring.
(The specific issue I'm running into is that multipart geometries are no longer sequences.)
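For reference, the breaking change is that Shapely 2.x removed sequence behavior from multipart geometries (`for part in multipolygon` raises a TypeError), while `.geoms` works in both 1.x and 2.x. A small shim like this (illustrative, not in the codebase; the Fake* classes are stand-ins so the sketch runs without Shapely installed) is one common way to bridge the gap:

```python
def iter_parts(geom):
    """Return the parts of a geometry under both Shapely 1.x and 2.x.

    Going through .geoms is version-safe: it exists in 1.x and 2.x,
    whereas iterating the multipart geometry directly only works in 1.x.
    """
    geoms = getattr(geom, "geoms", None)
    if geoms is not None:
        return list(geoms)
    return [geom]  # single-part geometry: treat it as its own only part

# Stand-in objects so this runs without Shapely installed.
class FakeMultiPolygon:
    geoms = ["poly_a", "poly_b"]

class FakePolygon:
    pass

print(iter_parts(FakeMultiPolygon()))  # ['poly_a', 'poly_b']
print(len(iter_parts(FakePolygon())))  # 1
```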
I was demoing the V3 code and comparing it to perimeters from V2, and to V2 perimeters that I had post-hoc corrected for merging issues. I noticed that there were some differences, specifically around little fires at the western edge of the fire. Some of the small fire starts were missing from V3.
Version 2 (With merge issue):
![image (5)](https://github.com/Earth-Information-System/fireatlas/assets/21266199/df80fe19-a79a-4bf3-9551-d45620f30791)
Version 3 (Generated from the notebooks):
![image (6)](https://github.com/Earth-Information-System/fireatlas/assets/21266199/4fdc475e-3d66-4e86-acb6-7b973020d693)
@jsignell and @ychenzgithub -- any thoughts on where this bug could be coming from?