External soil data: Gridded global data v5

CaitlinA commented 6 years ago

Convert current SSURGO implementation from geodatabase to this gridded datasource.

This will make both point and cell implementation much easier

@CaitlinA

CaitlinA commented 6 years ago

Data available here and should be downloaded to harddrive in separate and aptly named folder (i.e. BigData/Soils/GlobalSoilMap_v05)
Implementation should be similar to function/workflow "do_ExtractSoilDataFromCONUSSOILFromSTATSGO_USA"

-Look at SOILWAT2/testing/Input/soils.in -> Need to grab bulk density (<2mm), sand, clay, gravel

nip5 commented 6 years ago

The collection here: https://scholarsphere.psu.edu/collections/jw827b80n, which is our most current attempt does not inlcude gravel, is that fine?

nip5 commented 6 years ago

Also, the depths provided using that collection differ from those in the example 'soil.in' file, is this fine as well?

CaitlinA commented 6 years ago

Yes. It is not unusual for each soils dataset to have unique depths.

On Mon, Jun 18, 2018 at 11:25 AM, nip5 notifications@github.com wrote:

Also, the depths provided using that collection differ from those in the example 'soil.in' file, is this fine as well?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/DrylandEcology/rSFSW2/issues/305#issuecomment-398149978, or mute the thread https://github.com/notifications/unsubscribe-auth/AM0e_9pzzhArwVb6x7_Ox8b33-w7iAcvks5t9_CqgaJpZM4TkHRR .

-- Caitlin Andrews Ecologist Southwest Biological Science Center U.S. Geological Survey Mobile: 802-922-3494 Office: 928-556-7036

nip5 commented 6 years ago

Through turning verbose mode on, I can see that the simulations are slow because they are ran many times; it is recognized in the code that each run takes up to 60 seconds. For example, the prj4Test bundled in feature_DecadelAggregations runs the simulation 12 times, I think this is by design, and thus takes up to 12 * 60 seconds to complete. The obvious way to confirm this would be to run CONUS on the test and see how long it takes but CONUS doesn't seem to work on feature_DecadelAggregations branch. Could you check if you have time if CONUS works for you with that test project on feature_DecadelAggregations? We should see an output near the end of the run similar to this (with verbose on in the settings): @CaitlinA @dschlaep

[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyColdDays'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyCoolDays'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyPrecipitationEventSizeDistribution'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'yearlyPET'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'monthlySeasonalityIndices'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'yearlymonthlyTemperateDrylandIndices'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'yearlyDryWetPeriods'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyWeatherGeneratorCharacteristics'"

CaitlinA commented 6 years ago

Yes, this is by design. The 'runX' represents a site, so lat/long with soils and veg data. The sc is 'scenario' and represents the climate data associated with the site. Different scenarios represent different time periods (1970 - 2010 vs. 2020 - 2060) AND when the data is in the future, different models of predicted climate (RCP/GCMs).

For development purposes, you can toggle whether future scenarios are simulated or not. In the XXXX_descriptions.R file search for ' models = c('. You should see a list of RCP.GCM combos. Replace this with models = NULL. Make sure that this change is not committed to the repo. It would need to be turned back on to assure your branch passes all unit tests, etc.

You also don't NEED to work in the feature_Aggregation branch yet, do you? Why not develop on the master? Are you sure you set up the project correctly to get CONUS data? You could send me your descriptions.R file or send me a snapshot of the error.

On Tue, Jul 17, 2018 at 11:56 AM, nip5 notifications@github.com wrote:

Through turning verbose mode on, I can see that the simulations are slow because they are ran many times; it is recognized in the code that each run takes up to 60 seconds. For example, the prj4Test bundled in feature_DecadelAggregations runs the simulation 12 times, I think this is by design, and thus takes up to 12 * 60 seconds to complete. The obvious way to confirm this would be to run CONUS on the test and see how long it takes but CONUS doesn't seem to work on feature_DecadelAggregations branch. Could you check if you have time if CONUS works for you with that test project on feature_DecadelAggregations? We should see an output near the end of the run similar to this (with verbose on in the settings):

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/DrylandEcology/rSFSW2/issues/305#issuecomment-405690404, or mute the thread https://github.com/notifications/unsubscribe-auth/AM0e_wf07UVmZDulGsWUZo-FUXpSzeIiks5uHjNMgaJpZM4TkHRR .

-- Caitlin Andrews Ecologist Southwest Biological Science Center U.S. Geological Survey Mobile: 802-922-3494 Office: 928-556-7036

nip5 commented 6 years ago

Yes, I merged feature_DecadelAggregation into my branch so that I could test the cell extracting integration since I believe you said I needed to be on the decadel branch for it. You also mentioned that the decadel branch will be merged into master soon so I figured I should get ahead of that. The problem with CONUS strangely enough is that it doesn't get matricd properly, so it is able to extract the data but breaks when the data goes through validation. There are changes to CONUS that are made in decadel from master but I don't think they should cause it to fail.

dschlaep commented 6 years ago

See https://github.com/DrylandEcology/rSFSW2/issues/3: CONUS soil/STATSGO contains unrealistic soil density values. These are caught in the data checking step. Simply turn that one off during development, i.e., actions[["check_inputs"]] <- FALSE!

nip5 commented 6 years ago

When running CONUS with prj4Test I get:

"[run1/work0]: data missing for 1st layer -> no data to impute: simulation will fail"
         depth_cm           matricd    gravel_content EvapBareSoil_frac  transpGrass_frac  transpShrub_frac   transpTree_frac   transpForb_frac              sand              clay 
        5.0000000                NA         0.4900000         0.9078000         0.2163371         0.1769007         0.2250000         0.1976278         0.6200000         0.1100000 
           imperm 
        0.0000000

So it just doesn't extract matricd for the 1st layer, I have checked and it does extract it for some other layers though, I imagine this only occurs for me since the decadel branch is passing all tests on the CI.

dschlaep commented 6 years ago

TestPrj4/ does not extract any data. I don't understand why you need the code to extract soil density data from CONUS for your developing code to extract data from GlobalSoilMap_v05.

nip5 commented 6 years ago

Yes I was just trying to run CONUS as a reference for speed, I don't actually "need" to do it.

dschlaep commented 6 years ago

Then you only want to get the time stamp information from the verbose output of function prepare_ExtractData_Soils. No need to actually run the simulations.

nip5 commented 6 years ago

TestPrj4 generates files dbOutput.sqlite3 and dbOutput_TestPrj_v2.7.4.sqlite3 is included before running the project, the design of these two sql files need to match. Which one is the standard for which to match to, dbOutput_TestPrj_v2.7.4.sqlite3? My branch currently fails a test because they do not match and I need to know which one I should be modifying.

dschlaep commented 6 years ago

The (integration) tests formulated in test_projects.R that run the simulation experiment in TestPrj4/ do not do any extractions of external data (because these tests must be self-contained). Also, if you change any meaningful inputs, then the outputs will -- of course -- differ.

As you can see, we currently do not have any unit tests for the function prepare_ExtractData_Soils or for any of the specific soil dataset extraction functions such as extract_soil_CONUSSOIL for soil data from STATSGO or, similarly, as extract_soil_ISRICWISE for soil data from ISRIC-WISE.

That means that you will write a new function, e.g., extract_soil_GlobalSoilMap(..., dataset = "v05", ...), for which you cannot use any existing tests.

@CaitlinA Can you please talk to @nip5 -- it seems that he doesn't get something fundamental here -- I have no clue why he is so fixed on getting tests working that cannot work; he is supposed to use a copy of TestPrj4/ as an example simulation project to work and meddle with and not as a test case.

nip5 commented 6 years ago

Yes I know the tests don't do any extraction. Are you saying that I shouldn't be concerned with getting tests to pass but instead I should be modifying a copy of TestPrj4 to work with my function? If that's the case then it is much different than my understanding, which is to make my function pass all current tests.

nip5 commented 6 years ago

I'm talking about getting it to pass all the unit tests, since we don't have any soil extraction tests, the function itself obviously can't be tested directly but the unit tests (and projects) expect a certain outcome after extracting soil data so I think I still need to pass those unit tests.

dschlaep commented 6 years ago

Why do you want the existing tests to pass if you know that they don't test any extraction? Your new function will not be tested by the current tests because your function is about extracting data and no current test tests extracting data.

If you change current tests, e.g., by changing the inputs for tests that use TestPrj4/, then of course those tests will fail, but this is completely meaningless for your new function because they were designed to test different expectations.

If you want to test your new function (which would be ideal), then write a new dedicated unit test(s) for it -- conditioned on the availability of the external data set (i.e., it will be run at most locally, but never on CIs) -- and, in any case, don't change the current set of unit tests. For instance, add a new file tests/testthat/test_ExtractData_Soils.R with

context("Soil data extraction")
...

test_that("Extract soil data from GlobalSoilMap", {
  skip_if_not(dir.exists(`PATH TO GlobalSoilMap version`))
  ...
})

So, yes, indeed, "use a copy of TestPrj4/ as an example simulation project to work and meddle with and not as a test case". Since this is so different from your understanding, I asked @CaitlinA to talk with you.

dschlaep commented 6 years ago

but the unit tests (and projects) expect a certain outcome after extracting soil data so I think I still need to pass those unit tests.

--> The (integration) tests that use TestPrj4/ expect that the given inputs produce the given reference output. If you change the inputs, then you get different outputs and the tests fail. Thus, your changing the inputs of TestPrj4/ and hoping that running devtools::test() will succeed is useless.

nip5 commented 6 years ago

I'm saying that the inputs should be the same between CONUS and my function because they're extracting the same information just from different sources. So I'm not trying to change the inputs to TestPrj4/ and expect the same output (that would be insanity), I'm trying to give it the same input for the same output. It makes sense to me that only the specific soil data numbers should be different.

dschlaep commented 6 years ago

We want to add the ability to extract soil data from GlobalSoilMap exactly because it results in different (and better) soil data values than those extracted from CONUSoil. So, your new function is supposed to produce different soil values; and if you run these on the existing tests then they are supposed to fail because the inputs are different.

nip5 commented 6 years ago

When its all said and done I just don't want to try and merge in something that doesn't pass everything is all, and if TestPrj4/ doesn't pass locally, it's not going to pass on the CI. Oh yes my function does give different layer soil values but I didn't think that would effect the tests. So you're saying that the difference between Sand_L1 = .22 and Sand_L1 = .34 could cause the tests to fail? I didn't know they were testing for what the values actually were.

nip5 commented 6 years ago

I see where in dbOutput_TestPrj4_v2.7.4.sqlite3 it expects certain values now. So this test is specifically tailored to another extraction method so I don't need to worry about it correct? This test won't be there in master when the time comes to merge my branch into it? If it is then my method would obviously still cause it to fail, that's why I've been hung up on trying to get it to pass.

DrylandEcology / rSFSW2

External soil data: Gridded global data v5 #305