Module: Summarize rasters by polygons

mateuszwyszynski commented 7 years ago

The result of this issue should be functionality that allows users to:

- upload their own sets of polygons
- summarize a raster by this set of polygons

mateuszwyszynski commented 7 years ago

We currently see two modules in this issue:

- One for uploading the set of polygons to summarize by.
- One for summarizing a raster by a known set of polygons, i.e. input: raster, set of polygons, summary function. Output: a list containing the results of applying the summary function to each polygon in the raster.
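
As a rough illustration of the second module's contract (raster, set of polygons and summary function in, one result per polygon out), here is a minimal base-R sketch. It assumes the polygons have already been rasterized onto the raster's grid as a matrix of polygon IDs; `summarizeByPolygon`, `ras` and `zones` are hypothetical names, not part of any existing module.

```r
# Hypothetical helper: apply a summary function to the raster cells of each polygon.
# ras   : numeric matrix of raster values
# zones : integer matrix of the same shape; each cell holds a polygon ID
#         (NA = cell lies outside every polygon). This pre-rasterization is an assumption.
summarizeByPolygon <- function(ras, zones, summaryFn = mean) {
  stopifnot(identical(dim(ras), dim(zones)))
  inside <- !is.na(zones)
  # split() groups the cell values by polygon ID; lapply() summarizes each group
  lapply(split(ras[inside], zones[inside]), summaryFn)
}

ras   <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
zones <- matrix(c(1L, 1L, 2L, 2L, NA, 2L), nrow = 2)
res   <- summarizeByPolygon(ras, zones, summaryFn = sum)
```

The result is a list named by polygon ID. With real spatial objects, `raster::extract()` with its `fun` argument plays a similar role.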

@eliotmcintire @achubaty - do you agree with this approach?

Could you also confirm that the area shown in the picture, namely the cells which fall within it, is a single polygon?

Also, which parts of this functionality are already available in the current demo app (not as a module per se, but as an example of usage of this module):

- When I click on a polygon (as above) I can see a popup with some information
- There is a histogram next to the map of Time since fire
- There are histograms of so called Large patches

Are these all functionalities we want to get or is there anything missing?

achubaty commented 7 years ago

I think the two-module approach makes sense.

I don't believe the current apps we shared have this implemented, but we have sample code elsewhere we can share (but it may be a bit messy).

eliotmcintire commented 7 years ago

Yes, that picture is a single polygon.

The new mapedit package gets some of the points you list: https://goo.gl/fzPSZ4

We are ok with using the sf package (used by mapedit). We are just starting to use it. It is MUCH faster than sp, and converting objects back and forth using as(obj, "sf") or as(obj, "Spatial") is fast-ish. So, use mapedit if it helps.

"When I click on a polygon (as above) I can see a popup with some information": YES.
"There is a histogram next to the map of Time since fire": That is an example, yes. The module would be more generic ... any polygon ... any raster.
"There are histograms of so called Large patches": This is a bit of a complicated one... I am not sure exactly what the generic approach to this is. The steps to build these histograms are defined in this function:

largePatchesFn <- function(timeSinceFireFiles, vegTypeMapFiles, polygonToSummarizeBy, cl, ageCutoffs = ageClassCutOffs, countNumPatches = countNumPatches, ageClasses, notOlderThan = Sys.time() - 1e7)

Take 2 rasters (actually filenames to raster files: timeSinceFireFiles & vegTypeMapFiles) and 1 polygon
- extract one age slice from timeSinceFireFiles (e.g., 0-40 years) --> ras1
- extract one veg type from vegTypeMapFiles (e.g., Pine) --> ras2
- make a single raster from these two (i.e., ras1 & ras2) --> singleTypeAgeRas
- run raster::clump(singleTypeAgeRas) to identify contiguous patches within each polygon --> patchSizes
- repeat for each age slice, veg type, and time slice
- This creates a data.table with 5 dimensions (polygonID, PatchSize, Year, age slice, veg type)
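
The steps above can be sketched for one age slice and one veg type as follows. This is only an illustration under stated assumptions: rasters and polygons are plain matrices rather than raster files, and `raster::clump()` is replaced by a hand-rolled 4-neighbour flood fill (`labelPatches`) to keep the example dependency-free. `labelPatches` and `patchSizesFor` are hypothetical names and are not the code inside `largePatchesFn`.

```r
# Stand-in for raster::clump(): label 4-connected patches of TRUE cells
labelPatches <- function(mask) {
  nr <- nrow(mask)
  labels <- matrix(0L, nr, ncol(mask))
  nextId <- 0L
  for (start in which(mask)) {
    if (labels[start] != 0L) next          # already part of a labelled patch
    nextId <- nextId + 1L
    labels[start] <- nextId
    queue <- start
    while (length(queue) > 0) {
      cell <- queue[1]; queue <- queue[-1]
      ri <- (cell - 1L) %% nr + 1L         # row of the current cell
      ci <- (cell - 1L) %/% nr + 1L        # column of the current cell
      nbrs <- rbind(c(ri - 1L, ci), c(ri + 1L, ci), c(ri, ci - 1L), c(ri, ci + 1L))
      ok <- nbrs[, 1] >= 1 & nbrs[, 1] <= nr & nbrs[, 2] >= 1 & nbrs[, 2] <= ncol(mask)
      for (i in which(ok)) {
        idx <- (nbrs[i, 2] - 1L) * nr + nbrs[i, 1]
        if (isTRUE(mask[idx]) && labels[idx] == 0L) {
          labels[idx] <- nextId
          queue <- c(queue, idx)
        }
      }
    }
  }
  labels
}

# Patch sizes per polygon for one age slice and one veg type
patchSizesFor <- function(tsf, veg, polys, ageRange, vegType) {
  ras1 <- tsf >= ageRange[1] & tsf < ageRange[2]   # one age slice
  ras2 <- veg == vegType                           # one veg type
  singleTypeAgeRas <- ras1 & ras2
  rows <- lapply(sort(unique(polys[!is.na(polys)])), function(p) {
    labels <- labelPatches(singleTypeAgeRas & polys == p)
    sizes <- as.vector(table(labels[labels > 0L]))  # cells per patch
    data.frame(polygonID = rep(p, length(sizes)), patchSize = sizes)
  })
  do.call(rbind, rows)
}

tsf   <- matrix(c(10, 20, 90, 15, 30, 95, 80, 85, 5), nrow = 3)  # years since fire
veg   <- matrix("Pine", 3, 3); veg[2, 2] <- "Spruce"
polys <- matrix(rep(c(1L, 2L), c(6, 3)), nrow = 3)               # columns 1-2 vs column 3
out   <- patchSizesFor(tsf, veg, polys, ageRange = c(0, 40), vegType = "Pine")
```

Repeating the call over every age slice, veg type and time slice, and binding the results with those values added as columns, would give the long table described in the last step.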

The generic visualization should be able to handle 5 dimensions. In the example shown in the app there is actually a hidden 6th dimension, Number of patches (the Patch Size is converted to Number of patches per year). The dimensions are shown as follows:

- Left Menu --> Define the patch size limit; allows us to count the number of patches above this size.
- Left Menu --> Age slice (Young, Mature etc.)
- Tabs -- PolygonID
- Plot device -- Veg type
- Histogram -- Year * Number of Patches (they are confounded here, i.e., histograms are counts of the number of patches larger than X in years 500, 510, 520 etc.)

So, we can work with: left menu, tabs, several plotting devices, plots. But, these need to be able to be switched around ... e.g., In principle, Patch size could be tabs, etc.

Also, a user might want Time to be a slider

Yes, these are the functionalities. But, generically.

mateuszwyszynski commented 7 years ago

@achubaty @eliotmcintire Does "allow user to change the polygon layer used for mapping / calculation" mean that the user can change the polygons reactively while the app is running, or do you mean that the user can select the polygons to use when bootstrapping the app?

eliotmcintire commented 7 years ago

Should be changeable reactively

mateuszwyszynski commented 7 years ago

@achubaty @eliotmcintire,

Our current understanding of how the module with the large patches function works is the following:

A. At first, the user gets to choose the size of patches which should be considered large. This is then passed to the large patches function.
B. The result of the large patches function is a data table, and this is exactly the information we work with further.
C. This information is then successively subsetted by user choices or by module calls:

First by choosing age (user)
Then by the choice of polygon (user)
In the end by the vegetation type (plot function)

The abstraction is that there is n-dimensional data and we would like to have an ability to slice that data in a nested manner (i.e. having n dimensional data make one dimension constant and pass n-1 dimensional data for selecting next dimension until we reach m dimensional data that we'd like to visualize).

We can slice data using tabs / slider / text input / menu items / dropdown and others. We can display n dimensional data on a grid of n-1 dimensional visualizations or just display n-dimensional data (histogram / plot / table / etc.).
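
The nested slicing idea can be illustrated with a small base-R sketch: each level fixes one dimension and passes the remaining columns on, until only the data to visualize is left. The helper `sliceNested`, the example table and its column names are hypothetical and exist only for this illustration; this is not the slicer module itself.

```r
# Hypothetical helper: recursively split a data.frame by the columns in `by`
# (outermost first) and apply `leafFn` to each innermost sub-table.
sliceNested <- function(data, by, leafFn = identity) {
  if (length(by) == 0) return(leafFn(data))
  pieces <- split(data, data[[by[1]]], drop = TRUE)
  lapply(pieces, function(piece) {
    # Drop the dimension that is now fixed, then slice by the next one
    rest <- piece[, setdiff(names(piece), by[1]), drop = FALSE]
    sliceNested(rest, by[-1], leafFn)
  })
}

patches <- data.frame(
  age            = c("Young", "Young", "Young", "Mature"),
  polygonId      = c(1, 1, 2, 1),
  vegetationType = c("Pine", "Spruce", "Pine", "Pine"),
  nPatches       = c(4, 2, 7, 1)
)

sliced <- sliceNested(patches,
                      by     = c("age", "polygonId", "vegetationType"),
                      leafFn = function(d) sum(d$nPatches))
youngPoly1Pine <- sliced$Young$`1`$Pine
```

In an app, each level of the returned nested list would correspond to one UI element (tab, slider, grid cell), and `leafFn` would be replaced by the visualization.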

Given our approach to generating apps (one menu item = one module), we can't use menu items for slicing data (which is how it is currently implemented in LandWeb). We'll have to use a second set of tabs to achieve the same effect. This means that:

- the menu subitems corresponding to the “age” choice have to become tabs within the menu item “Large Patches” (like in the Overview Diagrams tab)
- the choice of patch size also has to be made outside of the dashboard menu

There are 5 generic UI modules that we can use:

- Tabs
- Grid
- Slider (implemented)
- Object visualization module (currently we have only histogram implemented, but it'll be generalized)
- Text input (doesn't need a separate module as it's just an input)

Examples of such behaviour in the large patches tab would be:

1. “Polygon” tabs. A data table with known age of trees (e.g. “Young”) is received by the tab module. The data table is divided into subtables based on polygon IDs. For each subtable, a tab is created. Subtables are passed to a grid module enclosed in each tab.
2. “Vegetation type”. The user chooses a polygon. A data table with known age and polygon id is received by a grid module. The data table is divided into subtables based on vegetation type. For each subtable, a grid window is created. Subtables are passed to the histogram module enclosed in each grid window.

We'd like to implement one generic module that slices the data. The UI method expects just an id for the namespace, as it will only create the placeholder. The server method expects:

- an id for the namespace
- a data table
- a data.frame that describes by which dimension to slice and which method to use
- a UI function to use for displaying the final data
- a function to use in the server for the final data

In order to build the Large Patches module you'd have to write the following code:

# UI:
# data is a reactive value that changes when patch size is altered in text input
slicerUI(ns("largePatches"))

# Server:
callModule(
  slicer,
  "largePatches",
  data,
  data.frame(
    by = c("age", "polygonId", "vegetationType"),
    type = c("tab", "tab", "grid")
  ),
  function(id, category, data) {
    histogramUI(id, title = category)
  },
  function(id, category, data) {
    # plus some other histogram parameters chosen by user or based on data
    callModule(histogram, id, data)
  }
)

Now if you'd like to change the way you slice data, you just change "tab" to "slider" where necessary and the rendered app will just use that perspective to look through the data.

The higher level abstraction is that there is n-dimensional data available but we are able to see only m-dimensional data, where n > m. In order to do so we need 'glasses' to look at this n-dimensional data and see the m-dimensional visualisation. These 'glasses' have to help us get through n-m dimensions. They consist of n-m filters, and it is up to the user to select which filters to use. These filters could be just tabs, just sliders, just inputs, or any mix at any level. It is also up to the user to choose the method of visualisation.

Such an approach gives us great flexibility, as we can keep adding different methods of slicing the dimensions and it will all just work once the way data is sliced is implemented in this one method. By taking advantage of this pattern, we can easily swap the histogram module for some other plot module (or, in general, summary module). There is no need to specify the dimensions of a data table; this is determined only by the needs of the plot (summary) module.

What do you think about this approach? Do you have any additional insights?

Best

mateuszwyszynski commented 7 years ago

@achubaty @eliotmcintire

Could you also explain what is the meaning and purpose of maxNumClusters value inside ClumpMod module from LandWeb app?

achubaty commented 7 years ago

... [snip] ...

What do you think about this approach? Do you have any additional insights?

I like it -- it's flexible enough yet simple to specify how the data are sliced.

mateuszwyszynski commented 7 years ago

@achubaty @eliotmcintire

In order to create the current version of the LandWeb app, we would now like to work on including the histogram module at the end of the slicer module. To do this, we need to better understand how the histogram works.

Right now the histogram module receives the entire data table created by the large patches function. This data table is then subsetted and further modified inside the histogram module in order to create the desired histogram plot.

Since we have together decided to follow the approach described by the slicer module (#57) this unavoidably has to change. Could you please explain in more detail what data and in what form should be used by histogram module?

Subsetting will be done by the slicer module. From what I see, at least the “age” and “polygonID” dimensions should have a fixed value. What about “vegetation type”? If we use the slicer module, this dimension/category should also be fixed, e.g. the histogram module receives only a data table containing information about “Pine”. Is it possible to create the histograms from the large patches tab based on a subtable for fixed “age”, “polygonID” and “vegetation type”?

What should be then done with the data received by histogram module?

Best, Mateusz

eliotmcintire commented 7 years ago

First, your proposal seems exactly like what we were imagining. YES! Great flexibility, switching between dimensions... bingo! YES. My only caveat is that, as you say, the entire data table is passed into the module right now, and internally, the module pulls out the elements it needs, based on the id being passed into the module. See below for more details.

To answer questions:

... snip ... Could you also explain what is the meaning and purpose of maxNumClusters value inside ClumpMod module from LandWeb app?

maxNumClusters is used to attempt to keep the number of bars in each histogram consistent across the several histograms visible at any one moment. Because R will automatically identify the number of bars independently for every histogram, we wanted to override this behaviour and always show at least 6 bars, but not more than some number of histogram bars, X, which is related to the data. The exact algorithm is in front of me, but I don't recall my justification. It was "good enough".
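
To illustrate the general idea of keeping bar counts consistent (not the actual LandWeb algorithm, which is not reproduced in this thread), here is a minimal sketch that computes one shared set of breaks for several histograms and clamps the number of bars between a minimum and a maximum. The helper name `commonBreaks`, the default of 10 for `maxBars`, and the use of the number of distinct values as the data-driven bar count are all assumptions.

```r
# Hypothetical helper: one shared set of breaks for several histograms.
# The bar count is clamped to [minBars, maxBars]; how maxBars should relate
# to the data is an assumption made for this sketch.
commonBreaks <- function(valuesList, minBars = 6, maxBars = 10) {
  allValues <- unlist(valuesList)
  # Stand-in for a bar count "related to the data": number of distinct values
  nBars <- min(max(length(unique(allValues)), minBars), maxBars)
  seq(min(allValues), max(allValues), length.out = nBars + 1)
}

patchCounts <- list(Pine = c(1, 2, 2, 3), Spruce = c(0, 1, 5))
breaks <- commonBreaks(patchCounts)

# Every histogram uses the same breaks, so all have the same number of bars
hists <- lapply(patchCounts, hist, breaks = breaks, plot = FALSE)
```

Passing explicit `breaks` to `hist()` is what overrides R's per-histogram choice of bins.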

... [snip] ... Could you please explain in more detail what data and in what form should be used by histogram module?

This is based on the id being passed into the module. The id was something like 2_32, which was parsed and then used to extract only the elements from the data.table that were to be used within that single histogram. So, determining which elements to extract happens at the higher level, in server.R: the id is composed of 3 components, ageClass_polygon_vegetation type, which are then parsed inside the clumpMod histogram module:
lapply(seq_along(ageClasses), function(ageClassIndex) { # ageClassIndex is age
  lapply(polygonsWithData[ageClass == ageClasses[ageClassIndex]]$V1, function(j) { # j is polygon index
    lapply(seq_along(vegLeadingTypesWithAllSpecies), function(k) { # k is Veg type
      callModule(clumpMod, paste0(ageClassIndex, "_", j, "_", k, "_clumps"),
                 Clumps = reactive({ClumpsReturn()$Clumps}),
                 id = paste0(ageClassIndex, "_", j, "_", k, "_clumps"),
                 ageClasses = ageClasses,
                 vegLeadingTypes = vegLeadingTypesWithAllSpecies,
                 numReps = lenTSF
      )
    })
  })
})
So, if the histogram module (or the generic version) should just accept the subsetted data.table, then the subsetting needs to happen outside of the module. This should be fine, no?
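
A minimal sketch of what "subsetting outside of the module" could look like with the id scheme used in the `callModule()` call above: the id is composed with `paste0()`, parsed back into its three indices, and those indices select the rows to pass on. `makeClumpId`, `parseClumpId`, and the example table with its column names are hypothetical.

```r
# Compose the module id the same way as in the callModule() call above
makeClumpId <- function(ageClassIndex, polygonIndex, vegTypeIndex) {
  paste0(ageClassIndex, "_", polygonIndex, "_", vegTypeIndex, "_clumps")
}

# Hypothetical inverse: recover the three indices from an id
parseClumpId <- function(id) {
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 4, parts[4] == "clumps")
  idx <- as.integer(parts[1:3])
  list(ageClassIndex = idx[1], polygonIndex = idx[2], vegTypeIndex = idx[3])
}

# Example table; the column names are assumptions made for this sketch
clumps <- data.frame(
  ageClassIndex = c(1, 2, 2),
  polygonIndex  = c(3, 3, 3),
  vegTypeIndex  = c(2, 2, 1),
  patchSize     = c(50, 120, 80)
)

id  <- makeClumpId(2, 3, 2)
key <- parseClumpId(id)

# Subset at the higher level; only this sub-table would go to the histogram module
subTable <- clumps[clumps$ageClassIndex == key$ageClassIndex &
                   clumps$polygonIndex  == key$polygonIndex &
                   clumps$vegTypeIndex  == key$vegTypeIndex, ]
```

With this arrangement the histogram module no longer needs to know how the id is built; it only sees the rows it should plot.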

When you say that "age" and "polygonID" are fixed, I think the point is that the higher level module will slice the whole data.table into many sub data.tables, and each individual subdata.table will be passed into the histogram module. So, all m dimensions are "fixed" from the perspective of the histogram module. But from the perspective of the slicer module, it is one big table, with lots of potential ways to slice it up.

Does that answer your questions?

PredictiveEcology / SpaDES.shiny

Module: Summarize rasters by polygons #15