Plots and outputs for "Data_Capture_Validation"

MatthewMeier commented 5 years ago

@mhallerud, @banderson1618, and I have been working on validating BRAT using data capture events (beaver dam locations collected in the field or from imagery). We use these data to validate the BRAT outputs for reports that @wally-mac writes up. We would like to add some pieces to the outputs, such as plots of observed dam density vs. modeled dam density. The goal is to get as close to a 1:1 ratio as possible with as few underestimates as possible. For an example of what we have done in the past for validation, refer to pgs 34-38 of the Yakama report on Box (I am not including the link because the report is contract confidential, but you can find it on Box). Here is the documentation about the outputs from the tool. *Note: some of the outputs originally created by this tool, such as dam counts, have been moved to the BRAT Table portion of the tool, so they can now be referenced from there or eliminated from the "Data_Capture_Validation" script. references to this data include.

mhallerud commented 5 years ago

Hey @MatthewMeier, I'm not entirely sure what you're asking for here but let me know if I'm on the right track: Are you looking to automate the table on page 36 of the Yakama report? What about also adding an observed vs. modeled plot (see below) with each point representing values from an individual stream segment? My reasoning for the plot is two-fold: First, observed vs. predicted plots are standard statistical visualizations of model validation. Second, I find a single plot that summarizes model performance a bit simpler and more intuitive. This obviously wouldn't replace the maps, it just might be a nice addition to summarize what is shown in greater detail and complexity on the maps.

mhallerud commented 5 years ago

@MatthewMeier Do you also want the symbology added to the relevant fields?

MatthewMeier commented 5 years ago

@mhallerud the plot you illustrated is what we would like, with some minor changes. You are right that we want a line, but it needs to represent a 1:1 ratio rather than the trend line fitted to the data. This line represents the model matching exactly what is actually observed. However, since this is a capacity model, we would like all of the dots to fall below or right on the line, with minimal points above it. It would be beneficial to change the symbology so that points above the line are red and points below the line are blue. An in-plot legend would be helpful as well. Refer to the following plots that @joewheaton created for the GCD project. Notice that he has a pie chart off to the side. This might be something the cartographers end up doing, but in the meantime please get the plot changes and the legend done. As for the table, I would say have the tool just output an Excel document that holds all of the data in the attribute table for now, and we can revisit the specifics later. @wally-mac chime in with anything else you would like, but I think this covers some of the checklist.
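
The red/blue symbology described above could be sketched roughly as follows. This is only an illustration, not the pyBRAT implementation: the function name `classify_segments` is hypothetical, and it assumes observed dam density is on the y-axis and modeled capacity on the x-axis, so a point "above the line" is a segment where observed exceeds modeled.

```python
def classify_segments(observed, predicted):
    """Assign a plot color to each stream segment relative to the 1:1 line.

    Assumption (not from pyBRAT): observed is plotted on y and predicted
    (modeled capacity) on x, so observed > predicted means the point sits
    above the 1:1 line, i.e. the model underestimated capacity.
    """
    colors = []
    for obs, pred in zip(observed, predicted):
        if obs > pred:
            colors.append("red")   # above the line: model underestimate
        else:
            colors.append("blue")  # on or below the line: within capacity
    return colors


# Example with three hypothetical segments (dams/km)
observed = [1.0, 5.0, 3.0]
predicted = [2.0, 4.0, 3.0]
colors = classify_segments(observed, predicted)
n_above = colors.count("red")  # number of segments the model underestimated
```

The resulting list can be passed as the point colors to whatever plotting library draws the scatter, and `n_above` gives a quick count of underestimated segments for the legend or report.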

MatthewMeier commented 5 years ago

@mhallerud, @braden should have created a new folder in the inputs folder for data capture events (DCE). We will want this optional analysis under a new folder in the outputsAnalysis"Inter-Comparison" folder.

wally-mac commented 5 years ago

@mhallerud, @MatthewMeier and @banderson1618 here is a draft manuscript that has a good example of what we are looking for. See Figure 9.

wally-mac commented 5 years ago

Try the link now... it should work.

MatthewMeier commented 5 years ago

Yes that adds some good insight into what they should look like. I think that I would need some personal insight into the figure on the left to know what to look for and why it isn't a linear relationship. I think this would be something we can discuss in person. Thanks for the direction @wally-mac

mhallerud commented 5 years ago

@wally-mac Quick question, would you be able to send me the R code for the models these plots are based on so I can better understand exactly how the data is being subset in the cross-validation? It's a little bit unclear in the writing.

MatthewMeier commented 5 years ago

@mhallerud we are getting pretty close to needing this for the North Fork Burnt project. Are you ready to do a preliminary run through of this plot with some data for the report?

mhallerud commented 5 years ago

@MatthewMeier Yes, it's all coded up; I just haven't had a chance to do a thorough check before I push it to the master branch.

MatthewMeier commented 5 years ago

@mhallerud I will send the data that I would like you to run it on asap. I need to look over some past entries into the DCE first, and then I will send it to you so you can test on it. The run is fairly recent, but it has been a mad dash to get the improvements done for BRAT, so I wouldn't be surprised if you have to work through some kinks to get this one through. Keep posted.

mhallerud commented 5 years ago

Here's the latest rendition of the plot:

predicted_vs_expected_plot

The "missing" points were actually just points that were covered by other points with equivalent values. I added a (small) element of randomness to the points so that they don't cover each other up like that, similar to what geom_jitter does in R's ggplot package. Let me know if this is ok and if you want any other changes made.
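
A minimal sketch of this kind of jitter, for illustration only: the function name `jitter` and the `amount` and `seed` parameters are hypothetical and are not taken from the pyBRAT code. Each value is shifted by a small uniform random offset so that points with identical values no longer sit exactly on top of each other.

```python
import random


def jitter(values, amount=0.1, seed=None):
    """Return a copy of values with uniform noise in [-amount, amount] added.

    Hypothetical helper: only the plotted positions change, so the
    underlying observed/predicted data should be kept unmodified elsewhere.
    """
    rng = random.Random(seed)  # a seed makes the plot reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Three segments with identical predicted values get slightly different x positions
predicted = [2.0, 2.0, 2.0]
jittered = jitter(predicted, amount=0.05, seed=1)
```

Keeping `amount` small relative to the axis range matters here, since a large offset could move a point across the 1:1 line and change how it reads on the plot.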

wally-mac commented 5 years ago

@mhallerud, your plot looks good to me. @bangen and @joewheaton do you have any suggestions for improving the above plot?

mhallerud commented 5 years ago

@wally-mac @MatthewMeier Do you think the 1:1 ratio line should be renamed something more useful/understandable to managers?

wally-mac commented 5 years ago

Sure. What’s your idea?

On Tue, Dec 18, 2018 at 5:14 PM Margaret Hallerud notifications@github.com wrote:

@wally-mac @MatthewMeier Do you think the 1:1 ratio line should be renamed something more useful/understandable to managers?

MatthewMeier commented 5 years ago

Other ideas could include:

- ratio hedge
- Golden ratio
- Correlation line

mhallerud commented 5 years ago

Maybe something like "Perfect Prediction" or "Ideal Correlation". A quick Google search shows a lot of figures using "perfect line" or "line of perfect agreement".

wally-mac commented 5 years ago

I like "Line of perfect agreement".

On Tue, Dec 18, 2018 at 5:35 PM Margaret Hallerud notifications@github.com wrote:

Maybe something like "Perfect Prediction" or "Ideal Correlation". A quick google search shows a lot of figures using "perfect line" or "line of perfect agreement".

mhallerud commented 5 years ago

@MatthewMeier Working on automating the electivity ratios today. If there is anything of higher priority for me to be working on, feel free to give me a call.

MatthewMeier commented 5 years ago

@mhallerud Sounds good. Use the following Excel sheet for guidance.

Riverscapes/pyBRAT issue #245