Contingency tables - Githubissues

martyclark commented 2 years ago

Really loving how easy to use Sankee is - great work sir!

I was previously hacking around by producing a frequencyHistogram() of pixel counts per land use class to describe the changes between two images, then using the R package OpenLand to produce a Sankey diagram from the resulting FeatureCollection. Needless to say, this was many more lines of code than I now need with sankee!

If I had to request a feature, though, it would be the ability to output a contingency table (data.frame) of the actual aggregate values, i.e. the sum of pixel counts and/or areas (km²) of the transitions from one land use class to another between images. Would that be an easy addition?

aazuspan commented 2 years ago

Thanks for the suggestion, @martyclark! That's not available in sankee, but I think it could be a useful feature. I'll do some experimenting to see how best to implement it and keep you posted.

In the meantime, there are a few (currently undocumented) attributes that you could use to calculate that table. When you run sankify, it returns a SankeyPlot object. Its data attribute holds a dataframe of the class values at each sampled point. For example, this code...

import ee
import sankee

ee.Initialize()

plot = sankee.datasets.MODIS_LC_TYPE1.sankify(
    years=[2001, 2019],
    region=ee.Geometry.Point([-134.774492, 57.240839]).buffer(1000)
)

plot.data

will return this table...

     2001  2019
0      17    17
1      17    17
2      17    17
3      17    17
4      17    17
...   ...   ...

where each row shows the class ID (in this case 17) for a sample point at each time step. It shouldn't be too much of a lift to turn that into something like what you're describing.

For reference, you can also access the sampled ee.FeatureCollection directly with plot.samples, in case that's handy.
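
As a rough sketch of how the sample table could be turned into a contingency table, pandas' crosstab will count the transitions between the first and last time step. The dataframe below is a hypothetical stand-in for plot.data with made-up class IDs, and the column labels are an assumption (they may be integers rather than strings in the real object), so the code picks the columns by position instead of by name.

```python
import pandas as pd

# Hypothetical stand-in for plot.data: one row per sample point,
# one column per time step, values are class IDs (made up here).
data = pd.DataFrame({
    "2001": [17, 17, 17, 1, 1, 8],
    "2019": [17, 17, 17, 1, 8, 8],
})

# Pick the first and last time steps by position so the column dtype doesn't matter.
start, end = data.columns[0], data.columns[-1]

# Rows: class at the first time step. Columns: class at the last time step.
# Each cell is the number of sample points that made that transition.
counts = pd.crosstab(data[start], data[end])

# Same table as row-wise proportions (each row sums to 1).
proportions = pd.crosstab(data[start], data[end], normalize="index")

print(counts)
print(proportions)
```

These are counts of sample points, not pixels, so they only estimate the true transition totals.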

aazuspan commented 1 year ago

You can access the total samples and proportional changes between classes using SankeyPlot.df. That's used internally to build the plots so there are some other columns that aren't relevant, but you should be able to get everything you need.

Here's a quick example:

plot = sankee.datasets.LCMS_LC.sankify(
    years=[1985, 2019],
    region=ee.Geometry.Point([-115.33931274414063, 36.17475203935]).buffer(1000)
)

# Grab the dataframe from the plot
df = plot.df.copy()

# Get stats for areas that started as shrubs
df[df.source_label.eq("Shrubs")]

Will give you:

   source_year  target_year  source  target  changed  total  proportion  source_label  target_label
0  1985         2019         7       4       1        155    0.00645161  Shrubs        Grass/Forb/Herb & Trees Mix
1  1985         2019         7       10      3        155    0.0193548   Shrubs        Grass/Forb/Herb
2  1985         2019         7       12      151      155    0.974194    Shrubs        Barren or Impervious

Based on the sample, 97.4% of shrubs became barren or impervious between 1985 and 2019. To get more accurate estimates, you can increase the number of samples when you call sankify using the argument n.
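
If you want that long-format table in a contingency (wide) layout, a pandas pivot is one way to get there. This is only a sketch: the dataframe below is retyped by hand from the three Shrubs rows above rather than pulled from a real SankeyPlot, and it assumes the changed column holds the sample count for each source/target pair, as the output above suggests.

```python
import pandas as pd

# The three "Shrubs" rows from the output above, retyped by hand for illustration.
df = pd.DataFrame({
    "source_label": ["Shrubs", "Shrubs", "Shrubs"],
    "target_label": [
        "Grass/Forb/Herb & Trees Mix",
        "Grass/Forb/Herb",
        "Barren or Impervious",
    ],
    "changed": [1, 3, 151],
    "total": [155, 155, 155],
})

# One row per starting class, one column per ending class,
# cells hold the number of samples that made that transition.
table = df.pivot_table(
    index="source_label",
    columns="target_label",
    values="changed",
    aggfunc="sum",
    fill_value=0,
)

# Row-wise proportions, which should match the "proportion" column.
proportions = table.div(table.sum(axis=1), axis=0)

print(table)
print(proportions.round(4))
```

If the samples are spread evenly over the region, multiplying each cell's share of all samples by the region's area would give a rough area estimate in km², but that is an approximation from sample points, not a pixel count.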

aazuspan / sankee

Contingency tables #31