GeoNode / geonode

GeoNode is an open source platform that facilitates the creation, sharing, and collaborative use of geospatial data.
https://geonode.org/

Data summary table #182

Closed jj0hns0n closed 11 years ago

jj0hns0n commented 14 years ago

Currently, on the layer information page there is an entry for the data's Attributes--the names of the columns of the data.

Better would be a table which shows statistical summary information of the data (average, max, min, ... what else?)

sbenthall commented 14 years ago

Assigned to Rollie for design. This is brought up in the most recent design PDF ( http://projects.opengeo.org/CAPRA/attachment/wiki/Design/GeoNode_20100630.pdf ), but I don't know how to mark up a table so that it (a) is consistent with the current GeoNode theme, and (b) doesn't just seem like an arbitrary additional element of the already busy info page.

ingenieroariel commented 14 years ago

That idea is cribbed from GeoCommons, which offers a table of attributes as well as range, median, mean, and standard deviation. I suspect that's the set of summary statistics that makes sense, but I have no specific knowledge of what else would be useful there.

Example: http://finder.geocommons.com/overlays/5782

ingenieroariel commented 14 years ago

Note that not all column types are describable by statistics (geometry, category (represented as integer or string values), street names...) so this sort of table may not be that useful (you could install the CSS community module in the live site and check its attribute table to see what I'm talking about, since that does provide the kind of statistics you're talking about).

Actually calculating the statistics is nontrivial too since they may require a full scan of a large database table.
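To illustrate the cost concern: min, max, mean, and standard deviation can at least be gathered in a single scan, so the table only has to be read once. The sketch below uses Welford's online algorithm. This is my own illustration, not anything GeoNode or GeoServer is known to do, and `running_summary` is a hypothetical name. The median is left out because it generally needs sorted data or an approximation.

```python
import math


def running_summary(values):
    """Return count, min, max, mean and sample standard deviation in one pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    lo = math.inf
    hi = -math.inf
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        lo = min(lo, x)
        hi = max(hi, x)
    if count == 0:
        return None  # nothing to summarize
    stddev = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return {"count": count, "min": lo, "max": hi, "mean": mean, "stddev": stddev}
```

Because it accepts any iterable, the same function could be fed from a database cursor without loading the whole column into memory.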

ingenieroariel commented 14 years ago

There should still be an area for the user to enter information about the attributes. Just listing the data fields can be problematic, as their names are frequently not descriptive of the actual use of the attribute column. Giving the user a space to describe each column in the attribute table would increase the usability of datasets.

ingenieroariel commented 14 years ago

Also, the best way to display the data table would be a GeoExt inset (maybe expandable) that shows the full rows and columns of the data. This is what would be most useful from a GIS analyst's perspective for inspecting the data. However, it would require its own level of permissions, since it exposes the actual data values via the data summary page.

ingenieroariel commented 13 years ago

David, as you can see from the GeoCommons example above, they just colspan the invalid rows and throw in a "Text column (no statistics available)" when statistics don't make sense; I figured we'd do the same. As to the technical hurdles of actually calculating the statistics... I'm afraid I don't have any ideas there.
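A minimal sketch of that fallback, assuming each column arrives as a plain Python list: `summarize_column` and `NO_STATS` are hypothetical names, and treating any column with a non-numeric value as a text column is my simplification, not how GeoCommons is known to decide.

```python
import statistics

NO_STATS = "Text column (no statistics available)"


def summarize_column(values):
    """Return summary statistics for a numeric column, or a placeholder otherwise."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if not values or len(numeric) != len(values):
        return NO_STATS
    return {
        "range": (min(numeric), max(numeric)),
        "median": statistics.median(numeric),
        "mean": statistics.mean(numeric),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(numeric) if len(numeric) > 1 else 0.0,
    }
```

A template could then span the whole row whenever the result is the placeholder string, and fill in the four statistic cells otherwise.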

Galen, I agree that the attributes are often meaningless without metadata, and I've often noticed descriptions included in the abstract or supplemental info (and we should encourage that). Maybe it'd make sense to allow titles on attribute names; we should definitely consider it further down the line. As to displaying the full attribute table, that's probably not entirely feasible through the web for datasets of any substantive size, and requiring a whole other set of permissions to manage is probably overcomplicating the problem, no?

ingenieroariel commented 13 years ago

Here is a proposal for how the data summary table could work: https://img.skitch.com/20101215-e4ruabwsp76hn1k1t119d2nhnb.jpg

Marker 1.0 indicates the default list view of attributes with a "View statistics" link floated off to the right. Clicking the link would swap the list for the table shown in Marker 2.0, which features range, median, mean, and standard deviation for all rows for which that would be meaningful and a "Hide statistics" link at the bottom that would revert to the list from Marker 1.0.

Galen, the table at the bottom denoted by Marker 3.0 is an idea for how we could display attribute descriptions inline. What are your thoughts? If you like it we'd have to figure out some way of adding or editing them—likely an "Edit attribute descriptions" link at the bottom alongside the "Hide statistics" link. Are there appropriate fields defined by ISO 19139 (or whatever) in which to dump attribute descriptions? If so, we should take advantage of them.

For reference, the HTML for generating the example table at Marker 3.0 is below:

{{{

Attributes   Range              Median    Mean       Standard Deviation
the_geom     Text column (no statistics available)
AREA         0 – 13382528       1819.00   59889.11   416416.27

The lines of equal hazard, which are the lines between the polygons, were determined by interpolating from a grid of equally spaced points in latitude and longitude. Each point was weighted based on the seismic hazard at that location. The grid spacing is 0.1 degrees for Alaska and the conterminous United States, and 0.02 degrees for Hawaii.

PERIMETER    0 – 9529           4.00      62.35      333.71
AREA         -3344 – 1639406    300.00    9179.30    59407.53

This map layer shows peak horizontal ground acceleration (the fastest measured change in speed, for a particle at ground level that is moving horizontally due to an earthquake) with a 10% probability of exceedance in 50 years. Values are given in %g, where g is acceleration due to gravity, or 9.8 meters/second^2.

SEIHAZM020   Text column (no statistics available)
VALLEY       Text column (no statistics available)

}}}

sbenthall commented 13 years ago

I think Matt Bertrand may have implemented something like this already for WorldMap

ingenieroariel commented 12 years ago

A lot of the comments that appear to be mine above this one are not (this is my first message to this thread)

This ticket is in scope for 2.x and already part of the design mockups. Leaving it open and with the 2.x milestone.

Nathan-Wang commented 11 years ago

The newly released GeoServer 2.2.x includes a WPS extension that can be used for this statistical calculation. This new GeoServer is not currently connected to GeoNode, but connecting it is easy. With the WPS functions enabled on the GeoServer side, clients can query a range of statistics about the attributes of layers stored on the GeoServer, for example max, min, sum, mean, and median. The most convenient way to make this query is through an XML file, for example https://gist.github.com/4251907

Below I explain the structure of the input file and the client-side code to make this kind of query.

In the query XML, a WPS identifier must be specified first. All the basic statistics are provided by 'gs:Aggregate', so gs:Aggregate is called first. After that, the layer name needs to be indicated; in our example, it is 'medford:wetlands'. The next important thing is to choose the attribute that the statistics are derived from. So far, only one attribute can be chosen at a time; for multiple attributes, you have to repeat this process multiple times. It is not great, but that is the situation now. In our example, I choose 'area_1', which stores the area of each polygon. The last thing that must be specified is the statistic function. Here I use 'Average', but you can also choose max, min, sum, and so on. After this part, just close out the XML file and you are done.
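For readers who cannot open the gist, here is a rough sketch of what such a request document might look like, generated from Python. It is not a copy of the gist. The input identifiers `features`, `aggregationAttribute`, and `function`, the `result` output, and the inline WFS `GetFeature` reference are my recollection of GeoServer's gs:Aggregate process and should be checked against the server's DescribeProcess response; a real request may need further inputs. `build_aggregate_request` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Assumed layout of a WPS 1.0.0 Execute request for gs:Aggregate.
REQUEST_TEMPLATE = """<wps:Execute version="1.0.0" service="WPS"
    xmlns:wps="http://www.opengis.net/wps/1.0.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <ows:Identifier>gs:Aggregate</ows:Identifier>
  <wps:DataInputs>
    <wps:Input>
      <ows:Identifier>features</ows:Identifier>
      <wps:Reference mimeType="text/xml" xlink:href="http://geoserver/wfs" method="POST">
        <wps:Body>
          <wfs:GetFeature service="WFS" version="1.0.0" outputFormat="GML2">
            <wfs:Query typeName={layer}/>
          </wfs:GetFeature>
        </wps:Body>
      </wps:Reference>
    </wps:Input>
    <wps:Input>
      <ows:Identifier>aggregationAttribute</ows:Identifier>
      <wps:Data><wps:LiteralData>{attribute}</wps:LiteralData></wps:Data>
    </wps:Input>
    <wps:Input>
      <ows:Identifier>function</ows:Identifier>
      <wps:Data><wps:LiteralData>{function}</wps:LiteralData></wps:Data>
    </wps:Input>
  </wps:DataInputs>
  <wps:ResponseForm>
    <wps:RawDataOutput mimeType="text/xml">
      <ows:Identifier>result</ows:Identifier>
    </wps:RawDataOutput>
  </wps:ResponseForm>
</wps:Execute>
"""


def build_aggregate_request(layer, attribute, function):
    """Fill in the layer, attribute and statistic function, escaping XML characters."""
    return REQUEST_TEMPLATE.format(
        layer=quoteattr(layer),        # quoteattr adds the surrounding quotes
        attribute=escape(attribute),
        function=escape(function),
    )


request = build_aggregate_request("medford:wetlands", "area_1", "Average")
ET.fromstring(request)  # raises ParseError if the document is not well-formed
```

Generating the document from a template makes the "one attribute, one function per request" limitation explicit: each combination needs its own call to `build_aggregate_request`.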

In addition to the XML, we also need some code on the client side to facilitate this communication. Here we rely on owslib to do this job. After installing the owslib library, we just need a few lines of code like those below to finish this task.

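from owslib.wps import WebProcessingService

# Connect without fetching capabilities up front, then fetch them explicitly.
wps = WebProcessingService('http://localhost:8000/geoserver/wps', verbose=True, skip_caps=True)
wps.getcapabilities()

# Send the prepared Execute request document as-is.
with open('Aggregate.xml', 'r') as f:
    request = f.read()
execution = wps.execute(None, [], request=request)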

Here we use the query XML mentioned above, and the server responds with a number, which is the average area of the wetlands.

This is how a single statistical value is obtained through WPS. For the GeoNode layer detail page, we need to repeat this process for every suitable attribute and every statistic, which is a decent amount of work.
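The repetition itself is simple to sketch. In the snippet below, `query` is a hypothetical callable standing in for "send one WPS request for one attribute and one function and return the number"; nothing here talks to GeoServer, and `collect_statistics` is my own name for the loop.

```python
def collect_statistics(attributes, functions, query):
    """Build {attribute: {function: value}} by issuing one query per pair."""
    table = {}
    for attribute in attributes:
        row = {}
        for function in functions:
            # One round trip per (attribute, function) combination.
            row[function] = query(attribute, function)
        table[attribute] = row
    return table


# Usage with a stand-in query that records calls instead of contacting a server.
calls = []


def fake_query(attribute, function):
    calls.append((attribute, function))
    return "{}/{}".format(attribute, function)


stats = collect_statistics(["area_1", "perimeter"], ["Average", "Max", "Min"], fake_query)
```

With two attributes and three functions this makes six requests, which shows why the per-request overhead adds up quickly on the layer detail page.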

Two things could be improved in the current service. First, the response from GeoServer is not in a format compatible with owslib, so owslib cannot immediately extract the number. Second, instead of querying one statistic for one attribute at a time, we need to compute multiple statistics on multiple attributes in a single request, which would greatly improve the speed. It should not be too difficult to have these two things implemented.

@jj0hns0n and @ingenieroariel, thanks in advance for any comments or follow-up reports on progress with this function.

jj0hns0n commented 11 years ago

Thanks nathan!
