Open CaitlinA opened 6 years ago
Data is available here and should be downloaded to the hard drive in a separate, aptly named folder (e.g., `BigData/Soils/GlobalSoilMap_v05`).
The implementation should be similar to the function/workflow `do_ExtractSoilDataFromCONUSSOILFromSTATSGO_USA`.
- Look at `SOILWAT2/testing/Input/soils.in` -> we need to grab bulk density (<2mm), sand, clay, and gravel.
The collection here: https://scholarsphere.psu.edu/collections/jw827b80n, which is our most current attempt, does not include gravel. Is that fine?
Also, the depths provided in that collection differ from those in the example `soils.in` file. Is this fine as well?
Yes. It is not unusual for each soils dataset to have unique depths.
By turning verbose mode on, I can see that the simulations are slow because they are run many times; the code itself notes that each run can take up to 60 seconds. For example, the `prj4Test` bundled in `feature_DecadelAggregations` runs the simulation 12 times (I think this is by design) and thus takes up to 12 * 60 seconds to complete. The obvious way to confirm this would be to run CONUS on the test and see how long it takes, but CONUS doesn't seem to work on the `feature_DecadelAggregations` branch. Could you check, if you have time, whether CONUS works for you with that test project on `feature_DecadelAggregations`? We should see output near the end of the run similar to this (with verbose on in the settings): @CaitlinA @dschlaep
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyColdDays'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyCoolDays'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyPrecipitationEventSizeDistribution'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'yearlyPET'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'monthlySeasonalityIndices'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'yearlymonthlyTemperateDrylandIndices'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'yearlyDryWetPeriods'"
[1] "2018-07-17 11:55:16 [run1/PID3/sc3/work0]: aggregating 'dailyWeatherGeneratorCharacteristics'"
Yes, this is by design. The 'runX' represents a site, i.e., a lat/long with soils and veg data. The 'sc' is 'scenario' and represents the climate data associated with the site. Different scenarios represent different time periods (1970-2010 vs. 2020-2060) AND, when the data are in the future, different models of predicted climate (RCPs/GCMs).
For development purposes, you can toggle whether future scenarios are simulated. In the XXXX_descriptions.R file, search for ' models = c('. You should see a list of RCP.GCM combos; replace this with models = NULL. Make sure that this change is not committed to the repo; it would need to be turned back on to ensure your branch passes all unit tests, etc.
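For concreteness, the edit described above might look like this in the descriptions file (the RCP.GCM combo names shown here are illustrative placeholders, not the actual list in the file):

```r
## In XXXX_descriptions.R -- a development-only change; do not commit it.

## Before (placeholder combo names, not the real list in the file):
# models = c("RCP45.SomeGCM", "RCP85.SomeGCM")

## After: skip all future RCP/GCM scenarios during development
models = NULL
```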
You also don't NEED to work in the feature_Aggregation branch yet, do you? Why not develop on master? Are you sure you set up the project correctly to get the CONUS data? You could send me your descriptions.R file or a snapshot of the error.
Yes, I merged feature_DecadelAggregation into my branch so that I could test the cell-extracting integration, since I believe you said I needed to be on the decadel branch for it. You also mentioned that the decadel branch will be merged into master soon, so I figured I should get ahead of that. The problem with CONUS, strangely enough, is that it doesn't get `matricd` properly: it is able to extract the data but breaks when the data goes through validation. There are changes to CONUS made in decadel relative to master, but I don't think they should cause it to fail.
See https://github.com/DrylandEcology/rSFSW2/issues/3: CONUS soil/STATSGO contains unrealistic soil density values. These are caught in the data-checking step. Simply turn that one off during development, i.e., `actions[["check_inputs"]] <- FALSE`!
When running CONUS with `prj4Test` I get:

"[run1/work0]: data missing for 1st layer -> no data to impute: simulation will fail"
```
depth_cm           5.0000000
matricd                   NA
gravel_content     0.4900000
EvapBareSoil_frac  0.9078000
transpGrass_frac   0.2163371
transpShrub_frac   0.1769007
transpTree_frac    0.2250000
transpForb_frac    0.1976278
sand               0.6200000
clay               0.1100000
imperm             0.0000000
```
So it just doesn't extract `matricd` for the 1st layer. I have checked, and it does extract it for some other layers, though. I imagine this only occurs for me, since the decadel branch is passing all tests on the CI.
`TestPrj4/` does not extract any data. I don't understand why you need the code that extracts soil density data from CONUS in order to develop your code for extracting data from `GlobalSoilMap_v05`.
Yes, I was just trying to run CONUS as a reference for speed; I don't actually "need" to do it.
Then you only want to get the timestamp information from the verbose output of the function `prepare_ExtractData_Soils`. No need to actually run the simulations.
`TestPrj4` generates the file `dbOutput.sqlite3`, and `dbOutput_TestPrj_v2.7.4.sqlite3` is included before running the project; the design of these two SQL files needs to match. Which one is the standard to match against, `dbOutput_TestPrj_v2.7.4.sqlite3`? My branch currently fails a test because they do not match, and I need to know which one I should be modifying.
The (integration) tests formulated in `test_projects.R` that run the simulation experiment in `TestPrj4/` do not do any extractions of external data (because these tests must be self-contained). Also, if you change any meaningful inputs, then the outputs will, of course, differ.
As you can see, we currently do not have any unit tests for the function `prepare_ExtractData_Soils` or for any of the specific soil-dataset extraction functions, such as `extract_soil_CONUSSOIL` for soil data from STATSGO or, similarly, `extract_soil_ISRICWISE` for soil data from ISRIC-WISE.
That means that you will write a new function, e.g., `extract_soil_GlobalSoilMap(..., dataset = "v05", ...)`, for which you cannot use any existing tests.
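To make the shape of such a function concrete, here is a minimal, hypothetical skeleton. Only the function name and the `dataset` argument come from the comment above; every other argument, helper, and path is an illustrative assumption, not the package's actual API:

```r
## Hypothetical sketch -- argument names, helpers, and paths are
## illustrative assumptions, not the actual rSFSW2 interface.
extract_soil_GlobalSoilMap <- function(in_soils, site_locations,
  dataset = "v05", dir_ex_soil = ".", verbose = FALSE, ...) {

  ## Locate the locally downloaded GlobalSoilMap data
  ## (e.g., BigData/Soils/GlobalSoilMap_v05, per the top of this issue)
  dir_data <- file.path(dir_ex_soil, paste0("GlobalSoilMap_", dataset))
  if (!dir.exists(dir_data)) {
    stop("GlobalSoilMap data not found in: ", dir_data)
  }

  if (verbose) {
    message(Sys.time(), ": extracting soils from GlobalSoilMap ", dataset)
  }

  ## For each variable required by soils.in -- bulk density ('matricd'),
  ## sand, clay, gravel -- extract per-site, per-layer values from the
  ## gridded data, then return them in the structure expected downstream.
  ## (Extraction logic goes here.)

  in_soils
}
```

Gating on the presence of the local data directory mirrors the idea behind the conditional unit test discussed below: the extraction can only run on machines where the external dataset has been downloaded.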
@CaitlinA Can you please talk to @nip5? It seems that he doesn't get something fundamental here. I have no clue why he is so fixated on getting tests to pass that cannot work; he is supposed to use a copy of `TestPrj4/` as an example simulation project to work and meddle with, not as a test case.
Yes, I know the tests don't do any extraction. Are you saying that I shouldn't be concerned with getting tests to pass, but instead should be modifying a copy of `TestPrj4` to work with my function? If that's the case, then it is much different from my understanding, which was to make my function pass all current tests.
I'm talking about getting it to pass all the unit tests. Since we don't have any soil-extraction tests, the function itself obviously can't be tested directly, but the unit tests (and projects) expect a certain outcome after extracting soil data, so I think I still need to pass those unit tests.
Why do you want the existing tests to pass if you know that they don't test any extraction? Your new function will not be covered by the current tests, because your function is about extracting data and none of the current tests exercise data extraction.
If you change the current tests, e.g., by changing the inputs for tests that use `TestPrj4/`, then of course those tests will fail, but this is completely meaningless for your new function because they were designed to test different expectations.
If you want to test your new function (which would be ideal), then write new, dedicated unit test(s) for it, conditioned on the availability of the external data set (i.e., they will run at most locally, but never on CIs), and, in any case, don't change the current set of unit tests. For instance, add a new file `tests/testthat/test_ExtractData_Soils.R` with:
```r
context("Soil data extraction")
...
test_that("Extract soil data from GlobalSoilMap", {
  skip_if_not(dir.exists(`PATH TO GlobalSoilMap version`))
  ...
})
```
So, yes, indeed, "use a copy of TestPrj4/ as an example simulation project to work and meddle with and not as a test case". Since this is so different from your understanding, I asked @CaitlinA to talk with you.
> but the unit tests (and projects) expect a certain outcome after extracting soil data so I think I still need to pass those unit tests.

The (integration) tests that use `TestPrj4/` expect that the given inputs produce the given reference output. If you change the inputs, then you get different outputs and the tests fail. Thus, changing the inputs of `TestPrj4/` and hoping that running `devtools::test()` will succeed is useless.
I'm saying that the inputs should be the same between CONUS and my function because they're extracting the same information, just from different sources. So I'm not trying to change the inputs to `TestPrj4/` and expecting the same output (that would be insanity); I'm trying to give it the same input for the same output. It makes sense to me that only the specific soil-data numbers should be different.
We want to add the ability to extract soil data from `GlobalSoilMap` exactly because it results in different (and better) soil data values than those extracted from `CONUSoil`. So, your new function is supposed to produce different soil values; and if you run these through the existing tests, then the tests are supposed to fail because the inputs are different.
When it's all said and done, I just don't want to try to merge in something that doesn't pass everything, and if `TestPrj4/` doesn't pass locally, it's not going to pass on the CI. Oh yes, my function does give different layer soil values, but I didn't think that would affect the tests. So you're saying that the difference between `Sand_L1 = .22` and `Sand_L1 = .34` could cause the tests to fail? I didn't know they were testing for the actual values.
I see where `dbOutput_TestPrj4_v2.7.4.sqlite3` expects certain values now. So this test is specifically tailored to another extraction method, and I don't need to worry about it, correct? This test won't be there in `master` when the time comes to merge my branch into it? If it is, then my method would obviously still cause it to fail; that's why I've been hung up on trying to get it to pass.
Convert the current SSURGO implementation from a geodatabase to this gridded data source.
This will make both the point and cell implementations much easier.
@CaitlinA