Closed jds485 closed 1 year ago
Here are example results from Urban and Forest. There are clear differences in feature importance rank, and also in the direction of effect for some attributes (for example, TOT Clay shows an inverse pattern between the two).
Urban (TOT > 75%)
Forest (TOT > 75%)
I'm curious what reaches are in each of the spatial data splits (land cover, physio region). I can make a map target for these splits. Would you prefer as a separate issue?
Unless you have a preference, I think that'd be helpful so that I can focus on the code changes in this PR.
Okay, I'll make a separate issue to map these.
Thanks, Lauren! Good call on not running the target computations. I forgot to add time requirements in my initial post. The 3 full-dataset SHAP computations take a total of about 15 hrs to run + another 4 to make plots and summarize results.
Why are the intermediate credentials targets omitted here? Because you discovered generating these targets is not the time-limiting step?
The credentials targets that I deleted were all redundant because they were dependent on targets whose functions have a generate_credentials() call at the end.
This PR provides functions that split our dataset into:
1_fetch additions:
2_process additions:
p2_TOT_lc_physio_attrs that contains only the TOT land cover classes and physiographic identifiers (binary for each region) for each SC observation.
4_predict additions:
setup_shap_data that completes the dataset splits and returns a list with one element per split. The target that uses this function is p4_shap_data_splits in 4_predict_plots.R.
get_shap_subset that filters the full dataframe of SHAP values for only the rows in each data split.
get_shap_dir that gets the output file directory based on the data split name.
generate_credentials
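As a rough illustration of the split-then-subset idea behind setup_shap_data and get_shap_subset, here is a minimal sketch in base R. It is not the code from this PR: the function names split_rows_by_threshold and subset_shap, the column names TOT_urban and TOT_forest, and the toy values are all hypothetical. Only the 75% threshold comes from the discussion above.

```r
# Hypothetical helper: for each land cover column, return the row indices
# where the fraction exceeds the threshold. The result is a named list with
# one element per split, loosely mirroring what setup_shap_data is described
# as returning.
split_rows_by_threshold <- function(attrs, cols, threshold = 0.75) {
  splits <- lapply(cols, function(col) which(attrs[[col]] > threshold))
  names(splits) <- cols
  splits
}

# Hypothetical helper: keep only the SHAP rows that belong to one split.
subset_shap <- function(shap_values, rows) {
  shap_values[rows, , drop = FALSE]
}

# Toy data (made up for illustration): three observations.
attrs <- data.frame(
  TOT_urban  = c(0.90, 0.10, 0.80),
  TOT_forest = c(0.05, 0.85, 0.10)
)
shap_values <- data.frame(
  clay = c(0.20, -0.10, 0.30),
  elev = c(-0.05, 0.40, 0.00)
)

splits <- split_rows_by_threshold(attrs, c("TOT_urban", "TOT_forest"))
urban_shap <- subset_shap(shap_values, splits$TOT_urban)
```

Returning row indices rather than copies of the data keeps each split small, and the same indices can be reused to subset both the attribute table and the SHAP table.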
_targets.R additions:
I also addressed #216, but do not have a solution to the problem. I added a function in generate_credentials.R that checks if the most recent credentials file was generated in the last 10 minutes. As noted in #216, the majority of the time is spent loading the dummy_var target that is used to trigger when the aws credentials should be built. As part of this process, I realized that many of the aws credentials targets could be deleted because the last line in the dependency target's function calls generate_credentials(). That helped to reduce the overall time spent on these aws credentials steps. I recommend that in the future we regenerate credentials as part of functions instead of as separate targets.
Closes #220, #216
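For readers unfamiliar with the freshness check mentioned above, one way such a check could look in base R is sketched below. This is an assumption about the approach, not the function added to generate_credentials.R: the name credentials_are_fresh, the cred_dir argument, and the idea that credentials live as files in a single directory are all hypothetical.

```r
# Hypothetical helper: TRUE if the most recently modified file in `cred_dir`
# was written within the last `max_age_mins` minutes, FALSE otherwise
# (including when the directory holds no files).
credentials_are_fresh <- function(cred_dir, max_age_mins = 10, now = Sys.time()) {
  files <- list.files(cred_dir, full.names = TRUE)
  if (length(files) == 0) {
    return(FALSE)
  }
  newest <- max(file.mtime(files))
  age_mins <- as.numeric(difftime(now, newest, units = "mins"))
  age_mins <= max_age_mins
}
```

A function that ends with generate_credentials() could call a check like this first and skip regeneration when it returns TRUE, which fits the recommendation to refresh credentials inside functions rather than through separate targets.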