Closed m-mohr closed 3 years ago
In the search for a good visual representation, here are some first ideas:
I like the way things are displayed in the R stars package:
It also holds a good representation of what vector cubes are:
(as mentioned before, images taken from https://r-spatial.github.io/stars/)
I have another idea that, in my view, is able to explain the sort of data that is held in DataCubes and therefore can show that DataCubes are n-dimensional (here: time, 3 bands and x, y). A first sketch (ignoring the structure on the bottom right): Obviously this needs some improvement, e.g. the raster could be displayed as shown above, and the earth's surface could be depicted in more detail.
Another possibility: Display as an actual cube, have z = time dimension and indicate different bands. However people might take the "Cube" too literally (as DataCubes can also contain 3, 5, etc. bands). With this it might be easier to graphically represent the cube operations.
Great sketches! That last one suggests that B2, B3, B4 and B8 are distributed over two dimensions, which is not very intuitive IMO, but showing that dimensions can be exchanged makes some sense. I put the R scripts that generated the above figures at https://gist.github.com/edzer/5f1b0faa3e93073784e01d5a4bb60eca
Yeah, I think your first sketch works very well with some more details. Spatial are x and y, z is the bands and could be visualized with different colors (e.g. different shades of the color per pixel, one band red, one band green, one band blue) and then have each timestamp be part of your timeline.
Vector cubes in openEO are not really a thing at the moment so we could skip that part for now, but if you have good ideas, feel free to write them down anyway and we can have them in a separate markdown file for now.
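To make the dimension layout discussed above (x, y, bands, time) concrete, here is a minimal sketch in plain Python. It is only an illustration, not how openEO or stars store data: the dimension labels, the coordinate values and the `select` helper are all made up for this example.

```python
from itertools import product

# Hypothetical dimension labels; a real cube would carry far more values.
dimensions = {
    "t": ["2020-10-01", "2020-10-13"],
    "bands": ["red", "green", "blue"],
    "y": [50.0, 50.1],
    "x": [7.0, 7.1, 7.2],
}

# One value per combination of labels: this is what makes it a "cube".
cube = {labels: 0.0 for labels in product(*dimensions.values())}

# The shape is simply the number of labels along each dimension.
shape = tuple(len(labels) for labels in dimensions.values())


def select(cube, dimensions, **fixed):
    """Keep only the cells whose labels match the fixed dimensions."""
    names = list(dimensions)
    return {
        key: value
        for key, value in cube.items()
        if all(key[names.index(name)] == label for name, label in fixed.items())
    }


# Fixing time and band leaves a plain 2 x 3 raster (y, x).
red_first = select(cube, dimensions, t="2020-10-01", bands="red")
```

Fixing two of the four dimensions leaves an ordinary raster, which is roughly what a single coloured layer in the sketches stands for.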
Thank you for the feedback @m-mohr and also for the code @edzer, here's a first implementation of the idea:
Still missing a representation of the surface (also not 100% sure if needed).
Please let me know any feedback. Sketches and/or graphs on the processes will follow.
I like that a lot, well done! Could you change the pink color to yellow or so? I find it hard to distinguish from the red above... or change the order of the colors to not have red and pink directly after each other.
This is a figure representing temporal resampling. I decided to not represent the resampling process itself (calculation of new time steps). Let me know if you disagree. I have a question regarding the date "2020-09-28" in the upsampling process: I am guessing that the resulting datacube just doesn't contain an image for dates that lie before the first date of the original cube. Is that correct? Would it be appropriate to delete the entry for "2020-09-28" on the timeline for the "output" (but keeping it at the "resample" timeline to show the difference)?
Whether 2020-09-28 has data or not depends on the upsampling method you use. Would it make sense to just remove the empty timestamp, as indeed it would likely not be in the resulting data cube (or at least would be there with no-data)?
I think I'm fine with not giving more details on the resampling, but maybe it's easier to understand if you change the label "resample" to "resample to"?
All the images look the same, which may confuse some, but overall I like the image. 👍
it would likely not be in the resulting data cube (or at least would be there with no-data).
Yeah, this is the tricky part because I think if it's there with no data, then the current image is exactly right. But if this is dependent on the resampling function I will delete the point for the first date, it's more intuitive.
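The point that a timestamp like 2020-09-28 may or may not carry data, depending on the method, can be sketched in a few lines of Python. This is not the openEO resampling process: the function `resample_time`, the method names "nearest" and "previous", and all source dates and values are assumptions made up for illustration (only 2020-09-28 comes from the figure).

```python
from datetime import date


def resample_time(source, targets, method):
    """Map values from source timestamps onto target timestamps.

    "nearest" picks the closest source timestamp, "previous" picks the
    latest source timestamp that is not after the target (or no-data).
    """
    stamps = sorted(source)
    result = {}
    for target in targets:
        if method == "nearest":
            pick = min(stamps, key=lambda s: abs((s - target).days))
            result[target] = source[pick]
        elif method == "previous":
            earlier = [s for s in stamps if s <= target]
            result[target] = source[earlier[-1]] if earlier else None
        else:
            raise ValueError(f"unknown method: {method}")
    return result


# Hypothetical input cube with two time steps (one value each).
source = {date(2020, 10, 1): 10.0, date(2020, 10, 13): 20.0}
targets = [date(2020, 9, 28), date(2020, 10, 5), date(2020, 10, 12)]

nearest = resample_time(source, targets, "nearest")
previous = resample_time(source, targets, "previous")
```

With "nearest", 2020-09-28 borrows the value of the first source date; with "previous" it stays empty (`None`), which matches the idea of dropping that point from the output timeline.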
"resample" to "resample to"?
sure! good idea.
All the images look the same
Yes, I will change this. Downsampling method will then be "mean" if that's alright. EDIT: things won't look so different then I'm afraid. Ideas to change that?
2nd EDIT: input is actually already displaying different time steps. Is the difference too subtle at this scale?
like so
Yeah, I now see that there's a subtle difference, but you need to look very closely to figure it out. Not sure whether that is actually an issue though. I guess we can leave it as it is for now. Changes in time series are often pretty subtle...
Other than that, the image looks good to me, thanks! 👍
I have some questions about the spatial aggregation processes:
- Just out of curiosity, I don't really get what exactly aggregate_spatial_binary is doing. Instead of a list it only gets passed two values. Which two values and what's the advantage of that?
Regarding the previous discussion
Changes in time series are often pretty subtle...
Indeed. I think that in most graphics these very subtle changes are ok (as you say, we can always change that later on). They also result from the fact that breaks are set automatically for each raster. In the case where this is important (apply graphics, looking at single pixel values), I manually set breaks (so far only for third graphic).
edit @m-mohr
Is this expected to change at some point?
Not sure. I think not in the next 6 months at least.
I would favor leaving this restriction out of the graphic if that's alright.
Yes, I think that is fine for me.
- Just out of curiosity, I don't really get what exactly aggregate_spatial_binary is doing.
It is basically the same, just the way it reduces the values is different.
Instead of a list it only gets passed two values. Which two values and what's the advantage of that?
binary uses a reducer (see e.g. the JS reduce operation) which works on two values, which allows reducing of very large lists that would otherwise exceed the memory. The list variant (i.e. non-binary) works on a list directly. So it's mostly a thing to optimize the operation for very large data.
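The difference between the two variants can be sketched in Python, using `functools.reduce` as the counterpart of the JS reduce operation mentioned above. This is only an illustration of the idea, not the openEO implementation: `reduce_list`, `reduce_binary` and `pixel_stream` are hypothetical names, and the values are made up.

```python
from functools import reduce


def reduce_list(values, reducer):
    """List variant: the reducer receives all values at once."""
    return reducer(values)


def reduce_binary(values, binary_reducer):
    """Binary variant: the reducer only ever sees two values at a time,
    the running result and the next value."""
    return reduce(binary_reducer, values)


def pixel_stream():
    """Stand-in for a source too large to hold in memory as one list."""
    for value in [3.0, 7.5, 1.5, 4.0]:
        yield value


# The list variant needs the full list materialised first.
max_list = reduce_list(list(pixel_stream()), max)
sum_list = reduce_list(list(pixel_stream()), sum)

# The binary variant consumes the stream one value at a time.
max_binary = reduce_binary(pixel_stream(), lambda a, b: a if a >= b else b)
sum_binary = reduce_binary(pixel_stream(), lambda a, b: a + b)
```

Both variants give the same result here. The trade-off is that not every reducer has a direct two-value form: a median, for example, needs to see all values, so the binary variant is mainly useful for operations like sum, min or max.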
@jonathom In this thread https://github.com/Open-EO/openeo-processes/pull/215#discussion_r551218809 we discussed that we should add some guidance that data cube (child) processes should be careful with data type changes. For example, if a reducer gets an array of numbers, it should also return a number and not e.g. a string or array. Could you add that somewhere in the general data cube descriptions, please? cc for review: @soxofaan
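A small Python sketch of what "careful with data type changes" could mean for a reducer. The wrapper `checked_reducer` is hypothetical and only illustrates the guidance; nothing here implies that openEO back-ends enforce such a check.

```python
from numbers import Number


def checked_reducer(reducer):
    """Wrap a reducer so that it fails loudly when it does not return
    a single number for a list of numbers (illustrative check only)."""
    def wrapper(values):
        result = reducer(values)
        if not isinstance(result, Number):
            raise TypeError(
                f"reducer returned {type(result).__name__}, expected a number"
            )
        return result
    return wrapper


# Numbers in, number out: fine.
mean = checked_reducer(lambda values: sum(values) / len(values))

# Numbers in, string out: the kind of type change to avoid.
as_text = checked_reducer(lambda values: ", ".join(str(v) for v in values))
```

Calling `mean([1.0, 2.0, 3.0])` returns a number as expected, while `as_text([1.0, 2.0])` raises a `TypeError` because the result is a string.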
@m-mohr I'm not entirely sure if I understand what's going on, so let's discuss in the next meeting. First thought: Maybe this is something for the cookbook (#16), since it is much more "how to do" than "how does it work"? Also, the cookbook could then just have a whole first section dedicated to "how to work with datacubes" to be a further reference after the datacube guide (not only because of this, just generally).
Nice diagrams!
Some feedback/ideas:
these online docs look very pretty, nice improvement!
@jonathom We also forgot to remove the Data Cube description from the glossary: https://openeo.org/documentation/1.0/glossary.html
Another thing we should talk about in the "Dimensions" section is that the dimensions can have special characteristics, e.g. spatial and temporal are expected to have a natural order, temporal are by default Gregorian calendar, ...
We also forgot to remove the Data Cube description from the glossary
done, collecting these fixes in branch "dcguide". I added the old glossary datacube md in datacubes/.scripts for later reference.
additional note to myself: also forgot to talk about crs as dimension, as in old glossary
I added the old glossary datacube md in datacubes/.scripts for later reference.
I don't think this is required, we have version control for this. Let's discuss later
This is all done, right @jonathom ? Feel free to close then.
Thank you for the feedback @soxofaan! The datacube guide with much more graphics is already online and a version with some of your corrections (type, title change) can be seen here. I'd be happy if you want to have a look and leave some more feedback!
Regarding two points from above:
* A clouded area is a good idea, however I think a lot of operations that are explained here wouldn't be executed on non-ARD. Might be confusing then.
* Space between the layers: because the graphics are about different sampling processes, visibility of the single cubes isn't in focus in these graphics. However if you think other graphics in the datacube guide could use space / enlargement etc., let me know!
Really nice guide! I've just seen it and it will be super useful for many others.
It became obvious several times in openEO history that people are often not aware of how data cubes and their methods (reduce, apply, ...) work. So I was thinking that a guide on how to work with data cubes, step by step with examples, would help the understanding.
The discussion in https://github.com/Open-EO/openeo-processes/pull/215#discussion_r551218809 has shown that the document should say that it's usually not a good idea to change data types in apply/reduce/... and should probably also list other pitfalls and potential limitations.