aazuspan / sankee

Visualize classified time series data with interactive Sankey plots in Google Earth Engine
https://sankee.readthedocs.io/en/latest/index.html
MIT License

Contingency tables #31

Closed: martyclark closed this issue 1 year ago

martyclark commented 1 year ago

Really loving how easy to use Sankee is - great work sir!

I was previously hacking around producing a frequencyHistogram() of pixel counts per land use class to describe the changes between two images, then using the R package OpenLand to produce a Sankey diagram from the resulting FeatureCollection. Needless to say, this took many more lines of code than I now need with sankee!

If I had to request a feature, though, it would be the ability to output a contingency table (data.frame) of the actual aggregate values, i.e. the sum of pixel counts and/or the areas (km²) of the transitions from one land use class to another between images. Would that be an easy addition?

aazuspan commented 1 year ago

Thanks for the suggestion, @martyclark! That's not available in sankee, but I think it could be a useful feature. I'll do some experimenting to see how best to implement it and keep you posted.

In the meantime, there are a few (currently undocumented) attributes that you could use to calculate that table. When you run sankify, it returns a SankeyPlot object. You can access a dataframe of the point values that were sampled through the data attribute. For example, this code...

import ee
import sankee

# Assumes Earth Engine has already been authenticated and initialized
plot = sankee.datasets.MODIS_LC_TYPE1.sankify(
    years=[2001, 2019],
    region=ee.Geometry.Point([-134.774492, 57.240839]).buffer(1000)
)

plot.data

will return this table...

     2001  2019
0      17    17
1      17    17
2      17    17
3      17    17
4      17    17
...   ...   ...

where each row shows the class ID (in this case 17) for a sample point at each time step. It shouldn't be too much of a lift to turn that into something like what you're describing.

For reference, you can also access the sampled ee.FeatureCollection directly with plot.samples, in case that's handy.
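As a rough sketch of that last step, a table shaped like plot.data can be cross-tabulated with pandas. The class IDs below are made up for illustration (they are not real sankee output), and the year column names are assumed to be integers; adjust the keys if plot.data labels them differently.

```python
import pandas as pd

# Stand-in for plot.data: one row per sample point, one column per year.
# These class IDs are invented for illustration, not real sankee output.
data = pd.DataFrame({
    2001: [17, 17, 17, 10, 10, 1],
    2019: [17, 17, 10, 10, 1, 1],
})

# Count the sample points for every (2001 class, 2019 class) pair.
contingency = pd.crosstab(
    data[2001], data[2019], rownames=["2001"], colnames=["2019"]
)

# Same table as row proportions: the share of each starting class
# that ended up in each class.
proportions = pd.crosstab(data[2001], data[2019], normalize="index")

print(contingency)
print(proportions)
```

These are counts of sampled points, not pixel counts or areas, so they only approximate the true transitions within the region.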

aazuspan commented 1 year ago

You can access the total samples and proportional changes between classes using SankeyPlot.df. That's used internally to build the plots so there are some other columns that aren't relevant, but you should be able to get everything you need.

Here's a quick example:

plot = sankee.datasets.LCMS_LC.sankify(
    years=[1985, 2019],
    region=ee.Geometry.Point([-115.33931274414063, 36.17475203935]).buffer(1000)
)

# Grab the dataframe from the plot
df = plot.df.copy()

# Get stats for areas that started as shrubs
df[df.source_label.eq("Shrubs")]

Will give you:

   source_year  target_year  source  target  changed  total  proportion  source_label  target_label
0         1985         2019       7       4        1    155  0.00645161  Shrubs        Grass/Forb/Herb & Trees Mix
1         1985         2019       7      10        3    155  0.0193548   Shrubs        Grass/Forb/Herb
2         1985         2019       7      12      151    155  0.974194    Shrubs        Barren or Impervious

Based on the sample, 97.4% of shrubs became barren or impervious between 1985 and 2019. To get more accurate estimates, you can increase the number of samples when you call sankify using the argument n.
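If a wide contingency table is easier to read than the long format above, one possible approach is to pivot it with pandas. This is only a sketch: the rows are copied from the Shrubs output above rather than pulled from a live plot, and it assumes the changed column holds the number of sample points for each transition.

```python
import pandas as pd

# Rows copied from the Shrubs output above, trimmed to the columns used here.
df = pd.DataFrame({
    "source_label": ["Shrubs", "Shrubs", "Shrubs"],
    "target_label": [
        "Grass/Forb/Herb & Trees Mix",
        "Grass/Forb/Herb",
        "Barren or Impervious",
    ],
    "changed": [1, 3, 151],
    "total": [155, 155, 155],
})

# Source classes as rows, target classes as columns, sample counts as values.
counts = df.pivot_table(
    index="source_label",
    columns="target_label",
    values="changed",
    aggfunc="sum",
    fill_value=0,
)

# Divide each row by its own sum to get the proportion of each source class.
proportions = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(proportions.round(3))
```

With the full plot.df instead of the filtered rows, the same pivot should give one row per starting class.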