PSU-CSAR / bagis-pro

BAGIS for ArcGIS Pro

Fire Disturbance Data Processing #50

Open jdduh opened 1 year ago

jdduh commented 1 year ago

See #49 for fire data sources. Please note the spec of the fire data might continue to evolve.

  1. Clip the NIFC ArcGIS Server feature services to the AOI and save the outputs in layers.gdb as firehistory and firecurrent
  2. It appears that the output feature classes contain multipart features, i.e., each attribute record represents a unique fire regardless of whether the clip tool divides the original fire polygon
  3. Delete all polygons in the firehistory layer that have a FIRE_YEAR prior to 1981
  4. Retrieve the fire year from attr_FireDiscoveryDateTime in the firecurrent layer
  5. Create a standard YEAR field in both firehistory and firecurrent and merge them into the nifcfire layer for subsequent processing
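Steps 3 to 5 can be sketched on plain attribute records before wiring them into geoprocessing tools. This is a minimal, hypothetical sketch: it assumes the clipped records have already been read into dicts, that FIRE_YEAR is an integer, and that attr_FireDiscoveryDateTime comes back from the feature service as epoch milliseconds (UTC). Geometry is ignored here.

```python
from datetime import datetime, timezone

MIN_FIRE_YEAR = 1981  # step 3: firehistory polygons before this year are dropped


def year_from_epoch_ms(epoch_ms):
    """Convert a feature-service date (assumed epoch milliseconds, UTC) to a calendar year."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).year


def build_nifcfire(firehistory, firecurrent):
    """Build nifcfire rows carrying a standard YEAR field from both source layers (steps 3-5)."""
    merged = []
    for rec in firehistory:
        # Step 3: keep only records with a FIRE_YEAR of 1981 or later
        if rec["FIRE_YEAR"] is not None and rec["FIRE_YEAR"] >= MIN_FIRE_YEAR:
            merged.append({"YEAR": rec["FIRE_YEAR"]})
    for rec in firecurrent:
        # Step 4: derive the year from the discovery date
        merged.append({"YEAR": year_from_epoch_ms(rec["attr_FireDiscoveryDateTime"])})
    return merged  # Step 5: one merged list with a common YEAR field
```

In Pro itself the same logic would map onto field calculations and a Merge; the dict version just makes the filtering rules easy to check.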
jdduh commented 12 months ago

A sample AOI (13309220_ID_USGS_Mf_Salmon_R_at_Mf_Lodge) is uploaded to the ftp server, under the BAGIS_aois\Fire_Disturbance_AOIs folder.

lbross commented 11 months ago

What other fields do we want in the new nifcfire layer? Just the year?

jdduh commented 11 months ago

Yes, just the year. There should be a step 4.5 to remove duplicate fires between firecurrent and firehistory (keep the duplicates in firehistory and remove them from firecurrent). firecurrent is a temporary/working dataset that is waiting for validation. It still contains fires that were validated and have already been moved into firehistory.

4.5. Find the most recent year in firehistory and delete fire records prior to that year from firecurrent before merging.

Notes: it's unclear whether the validation was done on a fixed timetable or case by case for individual fires. There should be logic to check whether a fire record to be removed from firecurrent actually exists in firehistory. Fire boundaries of the same fire in firecurrent and firehistory are very likely to be different (i.e., the geometry was altered during validation).
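A minimal sketch of step 4.5 combined with the existence check described in the notes. It assumes both layers have a YEAR field and a shared ID field already normalized to the same name (IRWINID is the key proposed in the thread; the function name and the choice to keep and flag unmatched old records are hypothetical, since the thread does not settle what to do with them). Matching is by ID only, because the geometries are expected to differ.

```python
def reconcile_current(firehistory, firecurrent, key="IRWINID"):
    """Drop firecurrent records that already exist in firehistory (matched on `key`).

    Returns (kept, unmatched_old): unmatched_old lists firecurrent records older than
    the newest firehistory year that have no firehistory match, so they can be reviewed
    instead of being deleted blindly by the year rule.
    """
    latest_history_year = max(rec["YEAR"] for rec in firehistory)
    history_ids = {rec[key] for rec in firehistory if rec.get(key)}
    kept, unmatched_old = [], []
    for rec in firecurrent:
        if rec.get(key) and rec[key] in history_ids:
            continue  # duplicate: keep the validated firehistory version only
        if rec["YEAR"] < latest_history_year:
            unmatched_old.append(rec)  # step 4.5 would delete it, but nothing matches in firehistory
        kept.append(rec)
    return kept, unmatched_old
```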

lbross commented 11 months ago

It appears we can join firecurrent and firehistory using firehistory.IRWINID and firecurrent.polyIRWINID. Let's use IRWINID {63FFB3A1-2BC3-4E08-9C97-B17F9A05F22B} from the 13309220_ID_USGS_Mf_Salmon_R_at_Mf_Lodge AOI as an example. There is a 2021 record for this fire in firehistory and a record in firecurrent. Should we exclude the record from firecurrent as a duplicate?

Edit: I recommend adding irwinid to the nifcfire layer. It will be helpful for debugging purposes so that we can find the record(s) in the source tables.

jdduh commented 11 months ago

Please use and keep IRWINID in the fire layers.

https://www.nwcg.gov/publications/pms936-1/data-preparation/incident-information

lbross commented 8 months ago

I am maintaining a Google Drive document with the geoprocessing steps.

lbross commented 8 months ago

Some details on the data I have found. It looks like they didn't start using IRWINID until 2016, so there are many records in firehistory with a null IRWINID. This shouldn't cause a problem with the merge, because the oldest records I have found in firecurrent are from 2020. However, I added INCIDENT to the fields in the new fire layers so we have a way to identify them. Is there another primary key we could use to identify the older records? Also, I have found multiple duplicates in firehistory where records have the same IRWINID. Is this a problem?

jdduh commented 8 months ago

The fire ID (primary key) is for reconciling the duplicate records in the historical and current fire datasets. I have to assume all the data in the historical dataset is correct, unless we spot obvious errors.

lbross commented 8 months ago

I don't see a field named fire ID in either layer. There is a field called UNQE_FIRE_ID in firehistory that appears to be truly unique, and a field called attr_UniqueFireIdentifier in firecurrent. These keys appear to match between the two layers, and this field is populated further back in firehistory than IRWINID: the oldest records with it populated are from 2006. Maybe we should use this instead of IRWINID to eliminate duplicates between firehistory and firecurrent?

The duplicates I find in the historical fire dataset are interesting. Take a look at this IRWINID on the firehistory layer in 13309220_ID_USGS_Mf_Salmon_R_at_Mf_Lodge: {30BCC069-947B-43F3-B139-0761FD79C4BD}. It is the same fire, because the name and area are the same. However, the records have different UNQE_FIRE_IDs and were generated by different agencies: USGS and BLM. I doubt that NWCC will want to double-count the area of these fires in the statistics.

We could add a step to de-dup them using the IRWINID, but as I said previously, IRWINID only goes back to 2016.

jdduh commented 8 months ago

I meant the IRWINID. We only need IRWINID for reconciling the duplicates between firehistory and firecurrent. I will verify with NWCC how they want to handle areas that were burned multiple times in the same time period (annual or 5-year). If they are fine with not counting the reburned areas multiple times, then we can just dissolve all the burned area polygons into one flat polygon. This should automatically take care of the duplicated records in the firehistory layer.

lbross commented 8 months ago

In this Google Slides deck there are references to USGS MTBS raster data. Are these steps we should be adding to Fire Disturbance Data Processing? If so, is there a spec beyond clipping the WMS to the AOI?

lbross commented 8 months ago

Do we want to fold these data sources into the data source architecture that we are using for the other BAGIS-Pro layers? If so, then I will need a heading and a description. Do we want to add them to the existing batch tool title page? Or have a separate title page for fire data sources?

jdduh commented 8 months ago

NWCC has confirmed that burned areas should not be counted more than once within the same reporting period (i.e., annually and for the last 5, 10, 15, etc. years). Dissolving all the merged polygons for the reporting period should take care of the duplicate polygon issue present in the firehistory data.

The MTBS (burn severity) data have different statistics and maps: basically just the % area of the AOI with high, moderate, and low burn severity, using the same summary periods as the fire disturbance (NIFC) maps.

I will provide the BAGIS-Pro data source information later. Yes, please use the same architecture. We probably won't merge the reports at this point. We might combine the original report and the fire disturbance report in a future release of the Batch tool.

lbross commented 8 months ago
  1. Dissolving the merged polygons: Does this mean we will have multiple versions of nifcfire? The original plus one for each reporting period (i.e., annually the last 5, 10, 15, etc. years) with the polygons dissolved? Does this also mean we no longer need to de-dup between firehistory and firecurrent since we are dissolving the polygons?
  2. So from a data processing perspective, we just need to clip the burn severity data to the AOI?
  3. Data source architecture: OK

I wonder if you have experience working with WMS data in ArcGIS Pro? I tried to add the WMS server with the URL provided, but Pro cannot connect: https://apps.fs.usda.gov/arcx/rest/services/RDW_Wildfire/MTBS_CONUS/MapServer/WMSServer/. I can add the MapServer, but the projection seems to be off, and Pro cannot clip a raster from a map service.

Do we want to consider a new FGDB for the fire data if there will be a large number of layers? The argument against this would be that ebagis cannot upload it but we're not using ebagis anymore.

jdduh commented 8 months ago
  1. The fire footprints in firecurrent and firehistory might be different. We ignore the firecurrent version if the footprint also exists in firehistory. We can keep one nifcfire or multiple versions of nifcfire for the various time periods. If there is only one nifcfire, then we need to select the features for each time period in the analysis and mapping (e.g., through a definition query). For a time period, we only need to calculate the "newly burned area". For example, the newly burned area for the last 10 years will include the fire footprints from the last 10 years. We will need to "dissolve" or "flatten" the polygons to get the footprint of the burned area. Dissolve might not be the correct tool.
  2. I haven't looked into the USGS burn severity data yet. As you mentioned, we might not be able to clip a raster from a WMS. If this is the case, then I need to download all the data and create our own image services. For the burn severity rasters, we need to "flatten" the rasters for the specified time periods. I will check with NWCC to find a rule (e.g., rank the severity from 1 to 3, then take the max or min for each pixel across all the rasters clipped for that time period).
  3. We can create a new fire gdb to store the fire layers. It might make it easier to keep track of whether a fire report has been generated for an AOI.
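The "rank then take the max per pixel" flattening rule floated in item 2, followed by the % AOI area per severity class, can be sketched with nested lists standing in for co-registered, equally sized rasters. Everything here is an assumption pending NWCC's rule: the 1 to 3 ranking, the choice of max rather than min, treating 0 as unburned/no data, and the mapping from MTBS classes to these ranks.

```python
SEVERITY_RANK = {"low": 1, "moderate": 2, "high": 3}  # hypothetical 1-3 ranking
RANK_NAME = {rank: name for name, rank in SEVERITY_RANK.items()}


def flatten_max(rasters):
    """Per-pixel maximum rank across co-registered rasters of the same shape (0 = unburned)."""
    return [[max(pixel_values) for pixel_values in zip(*rows)] for rows in zip(*rasters)]


def percent_by_severity(flat, aoi_pixel_count):
    """% of the AOI (given as a pixel count) in each severity class of the flattened raster."""
    counts = {name: 0 for name in SEVERITY_RANK}
    for row in flat:
        for value in row:
            if value in RANK_NAME:
                counts[RANK_NAME[value]] += 1
    return {name: 100.0 * n / aoi_pixel_count for name, n in counts.items()}
```

Swapping max for min would implement the other option mentioned; in Pro the per-pixel step would likely be a Cell Statistics style operation rather than Python loops.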
jdduh commented 8 months ago

See #21 for the spec of the fire report summary page and the data sources headings and descriptions of the fire data.

jdduh commented 8 months ago

Finding newly burned area from nifcfire:

  1. Add a new field (or use any existing field) and set it to the same value for every record (e.g., 1)
  2. DISSOLVE the nifcfire by that field.

The steps above will return one polygon depicting the combined area covered by all the polygon features in the nifcfire feature class (overlapping areas are counted only once).

Here is an alternative method that can count areas by the frequency of overlapping features (i.e., burned multiple times for the summary period).

  1. UNION the nifcfire
  2. COUNT OVERLAPPING FEATURES on the output of step 1 (the COUNT_ field shows the number of overlapping fires)
  3. SUMMARY STATISTICS on Shape_Area, using COUNT_ as the case field, returns the newly burned area broken down by the number of times each area burned.
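The overlap-count method above can be checked on a toy case. This sketch is only an illustration: it uses axis-aligned rectangles instead of real fire polygons, and a coordinate-compression grid stands in for Union plus Count Overlapping Features. Each grid cell is either fully inside or fully outside each rectangle, so counting the rectangles covering a cell gives its "times burned" value, and summing cell areas by that count mirrors the Summary Statistics step. The dissolved footprint is the sum over all counts of one or more.

```python
from collections import defaultdict


def area_by_overlap_count(rects):
    """rects: (xmin, ymin, xmax, ymax) tuples. Returns {times_burned: area}."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    areas = defaultdict(float)
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # number of rectangles ("fires") covering this grid cell
            count = sum(1 for r in rects if r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3])
            if count:
                areas[count] += (x1 - x0) * (y1 - y0)
    return dict(areas)
```

Usage: two 2 x 2 fires overlapping by one unit give `{1: 6.0, 2: 1.0}`; the flattened (dissolved) footprint is 7, not the naive 8.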
jdduh commented 8 months ago

Adjacent years rarely have overlapping fires. However, if the time span of the period is large, then the overlapping burned areas could exist.

lbross commented 8 months ago

Notes from March 2024 meeting

  1. Publishing the Fire Disturbance data in .csv format for AOI statistics is the primary objective, to be completed by September 2024. The data spec is available here on the Google Drive.
  2. We can manage questions about both the data preparation and the output on this issue
  3. We will add baseline and increment parameters to the batch tool settings to manage the 5-year increment statistics requested in the spec. For example, if the baseline is 1994, the increment statistics would be: 1994-1999, 1994-2004, 1994-2009, 1994-2014, 1994-2019, 1994-2023.
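A minimal sketch of how the baseline and increment settings could expand into the cumulative periods in the example. The function name is hypothetical, and `last_year` is assumed to be the most recent year with data (2023 in the example), which is why the final period is capped there instead of running to 2024.

```python
def cumulative_periods(baseline, increment, last_year):
    """Return (start, end) periods growing by `increment`, with the last one capped at `last_year`."""
    periods = []
    end = baseline + increment
    while True:
        periods.append((baseline, min(end, last_year)))
        if end >= last_year:
            break
        end += increment
    return periods
```

With this list in hand, a "most recent X periods" selection would simply be `periods[-x:]`.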
lbross commented 7 months ago
  1. For statistics where there is no data, for example no fires during a given time period, should the cell be populated with 0 or left null?
  2. What data layer will we use to determine the forested area in an AOI?
  3. I added some notes and screenshots to the Data Processing document, starting on p. 8. I have finished de-duping the history and current layers.
jdduh commented 7 months ago
  1. Record zero when there are no fire records during a given time period. For annual stats, NULL is recorded when the most recent years' NIFC fire data or USGS MTBS data layers are not available. For example, we only have MTBS up to 2022, so the 2023 stats should have the burn severity stats as NULL. We will need a way to record the fire data availability on the data source page (see #21), something like "Wildland Fires - Historic 1984 - 2022", "Wildland Fires - Current 2023", "Wildland Fires - Burn Severity 1981 - 2022".
  2. Use the nlcd forested layer that the batch tool created in the basin analysis report
  3. Based on a discussion with NWCC, we need to design an interface that allows users to skip recalculating some of the statistics. I suggest adding some radio buttons / checkboxes to the GUI (between the increment (in years) and output folder controls). The radio buttons are: 1) Calculate all time periods, 2) Calculate selected time periods. Under "Calculate selected time periods", add the following checkboxes: 1) annual (from YYYY to YYYY), 2) time periods (the most recent X periods). The default values are: calculate all time periods, all checkboxes checked, from 1984 to the current year (e.g., 2024), and the most recent MAX time periods.
  4. Record the % newly burned area estimated from both the NIFC and MTBS data. These two values could be different. This update is documented in https://docs.google.com/document/d/13ip8w16rglsIEex3yszv3FK60vv6Be00DpR9AXLDXJ0/edit?usp=sharing
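The zero-versus-NULL rule in item 1 can be expressed as a small helper. This is a hypothetical sketch: `fire_areas_by_year` and `last_available_year` are illustrative names, and Python's `None` stands in for NULL in the output.

```python
def annual_burned_area(year, fire_areas_by_year, last_available_year):
    """Annual statistic: None (NULL) if the source has no data for `year` yet, else total area.

    A year with data but no fires returns 0, per the rule above.
    """
    if year > last_available_year:
        return None  # e.g., MTBS only goes to 2022, so 2023 burn severity stays NULL
    return sum(fire_areas_by_year.get(year, []))
```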
lbross commented 7 months ago
  1. Fire data availability on the data source page: There are a couple of ways to go on this. The 'Description' is maintained on the server so we could include the year in that and then update it when appropriate. We also have the 'shortDescription' on the server that could include the years as you specified above. It would need to be updated manually each year.

item 3. I have an updated sample GUI on the data processing document (p. 8) and some descriptions/questions on following pages. Please review and let me know if I have captured the requirements.

lbross commented 7 months ago

Add a step to record the max data date from the NIFC service in analysis.xml when this layer is created.
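A minimal sketch of writing that date into an XML document with the standard library. The element name `NifcFireMaxDate` is hypothetical (the real analysis.xml schema isn't shown here), and the max date is assumed to have already been read from the NIFC service as epoch milliseconds. The function works on XML text so it can be tested without touching a file; repeated runs update the element instead of adding a second one.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def set_nifc_max_date(xml_text, max_epoch_ms, tag="NifcFireMaxDate"):
    """Return xml_text with `tag` under the root set to the max data date (YYYY-MM-DD, UTC)."""
    root = ET.fromstring(xml_text)
    elem = root.find(tag)
    if elem is None:
        elem = ET.SubElement(root, tag)  # first run: create the element
    elem.text = datetime.fromtimestamp(max_epoch_ms / 1000, tz=timezone.utc).date().isoformat()
    return ET.tostring(root, encoding="unicode")
```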

lbross commented 7 months ago

We must have talked about this, but I can't find the answer. In firehistory, there are multiple IRWINIDs with more than one record. Looking at the attribute table, it seems that these records have the same IRWINID but different sources. Do we need to resolve this as part of creating the nifcfire layer? I'm not sure what tool to use, because if I dissolve on IRWINID and year, it combines many unique fires with null IRWINIDs. For the annual reports, it will report two new fires instead of one if we don't handle this.
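One way to avoid merging the null-IRWINID fires is to de-duplicate only records that actually have an IRWINID. This is a hypothetical sketch, not the GP-tool approach in the design spec: it keeps the first record seen for each IRWINID (sort beforehand if one source should win), leaves every null-IRWINID record alone, and assumes case-insensitive matching of the GUID strings.

```python
def dedupe_by_irwinid(records):
    """Keep one record per non-null IRWINID; all records with a null IRWINID are kept."""
    seen = set()
    result = []
    for rec in records:
        irwinid = rec.get("IRWINID")
        if irwinid:
            key = irwinid.upper()  # assumption: GUIDs compare case-insensitively
            if key in seen:
                continue  # same fire reported by another source/agency
            seen.add(key)
        result.append(rec)
    return result
```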

lbross commented 6 months ago

I've come up with a set of GP tools to eliminate the duplicate records that come from firehistory with the same IRWINIDs but different sources. See p. 14 of our design spec. Is it okay to make these modifications to the nifcfire layer? Or would you prefer to retain the original layer with the duplicates and save these modifications to a separate file? We need to do this to get an accurate fire count for the statistics.

jdduh commented 6 months ago

We probably can reuse/overwrite the nifcfire layer and don't retain the original layer. There seems to be a separate historic wildfire data source (https://giscenter.isu.edu/research/Techpg/HFD/). I'm in communication with their PI to see what types of processes they did to clean up (e.g., remove duplicates) the fire perimeter data. HFD just released a new version containing 2023 fires. I will keep you posted if there is any benefit to use the HFD. If we decide to use HFD, then we probably need to publish the "static" dataset on NRCS AGOL.

jdduh commented 6 months ago

We have a potential new historic fire dataset: HFD Historic Fire Since 1950. This dataset seems to be updated more promptly than NIFC. As of today (April 2024), the 2023 fires are included. The dataset doesn't use IRWIN ID, but all the extra duplicate fires have been removed. If we decide to use this dataset, then we only need to extract the current year's fires from the NIFC Current Fire feature service in the analysis. If NWCC decides to switch to this fire dataset, then we can probably set a flag in the definition file to indicate the source of the historical fire data in the analysis. The flag could be "NIFC" or "HFD."

Feature Service: https://services1.arcgis.com/z5tlnpYHokW9isdE/arcgis/rest/services/HFD_HistoricFiresDatabase/FeatureServer Metadata: https://giscenter.isu.edu/pdf/PDF_NASA_RECOVER2/Metadata.pdf

lbross commented 6 months ago

I clipped the HFD layer to the AOI boundary that I have been working with. The perimeters of some of the fires are different. I'm not sure which one is more accurate. Let me know if you'd like me to post a geodatabase with the clipped layers so you can take a look. There is an unqe_fire_id in the HFD data which appears to be the same as the IRWIN ID for some of the more recent fires, but it isn't used consistently. For merging with the NIFC Current Fire data, we would have to query the HFD layer to get the latest year and then only extract the current fire data from the following years. We are assuming that the data is added a complete year at a time. I would still like to include a unique ID in the merged file if we can. It makes it easier to identify data issues.

lbross commented 6 months ago

Some thoughts about adding HFD as an option:

  1. We need a way to automatically determine when HFD rolls forward to the next year, assuming that we want to get everything more current than HFD from the NIFC current data. When does the new year start?
  2. We should revisit the schema of the nifcfire file to make sure it is compatible with HFD data. The field names in nifcfire are derived from NIFC data.

My recommendation is to continue with NIFC for this initial release and revisit adding HFD, if they still want it, this fall.
lbross commented 6 months ago

This question is regarding the fire_analysis.json file. Please confirm that lines 1 - 4 should be recorded when the data retrieval occurs. I am referring to the Fire Statistics Time Period Calculator.

jdduh commented 6 months ago

I believe we need the values of the first four (and the others in the spreadsheet) to recreate the analysis (Note: the tool currently doesn't have the option to recreate an "old" report). So, yes, we need to put them in the fire_analysis.json file. Given that there is another parameter that we use to calculate the data_retrieve_start_year, we can record either the data_retrieve_start_year or a "retrieve_duration" value. It is probably more straightforward to just record the data_retrieve_start_year, but either one is fine.

lbross commented 6 months ago

Because the data retrieval is decoupled from the report generation, I need to update the values in the spreadsheet at different times. The four items specified in the comment immediately above will be recorded/updated when the data is retrieved. The remaining three items will be updated each time the reports are generated. I wanted to call this out because those first four values reflect the values from the local data layers instead of what might currently be available from the server. Especially if the reports are run later than the data is retrieved.
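Because retrieval and report generation write to fire_analysis.json at different times, each step should merge its own keys into the existing content rather than rewrite the whole file. A minimal sketch on JSON text: only `data_retrieve_start_year` comes from the thread; `report_end_year` in the usage is a hypothetical report-time key.

```python
import json


def update_fire_analysis(json_text, updates):
    """Merge `updates` into existing fire_analysis.json content, leaving other keys untouched."""
    data = json.loads(json_text) if json_text else {}
    data.update(updates)
    return json.dumps(data, indent=2, sort_keys=True)
```

For example, the retrieval step would call `update_fire_analysis(text, {"data_retrieve_start_year": 1984})`, and a later report run adding `{"report_end_year": 2023}` leaves the retrieval values as they were recorded from the local data layers.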