Data Cube Documentation ✨

m-mohr commented 3 years ago

Rendered version for reviews: https://github.com/Open-EO/openeo.org/blob/datacube_doc/documentation/1.0/datacubes.md

ToDos:

[x] Finish documentation (@jonathom)
[x] Better formatting for figcaptions
[x] Display images better (larger) [split reducer?]
[x] Add to navigation

m-mohr commented 3 years ago

Make sure to pull my latest changes to not run into merge conflicts @jonathom

m-mohr commented 3 years ago

Note to myself: Squash and merge to make history of images lighter

jonathom commented 3 years ago

ready for review

christophfriedrich commented 3 years ago

Suggestions regarding the graphics:

apply_unary: In the text the function is called abs(), in the graphic absolute()
resample_space: With this perspective, the resampling is hardly visible. Is it possible to "turn" the rendering ("into the screen plane" if you get what I mean?) to get more of a top view? Or maybe even better: Don't use the timeseries representation, but the flat one?
resample_time: Why is there no data for 2020-09-28?
reduce: For the example in the middle, could you move the "time" dimension name closer to the rendering?

edzer commented 3 years ago

Still need to do my review, but at first glance this looks REALLY, REALLY good!! Worth looking (and commenting?): @mkadunc @jdries @aljacob @przell

jonathom commented 3 years ago

Thank you for your review @christophfriedrich! To answer your comment:

apply_unary: In the text the function is called abs(), in the graphic absolute()

Changed it to absolute().

resample_space: With this perspective, the resampling is hardly visible. Is it possible to "turn" the rendering ("into the screen plane" if you get what I mean?) to get more of a top view? Or maybe even better: Don't use the timeseries representation, but the flat one?

It would be possible to turn the cube, but would mess up the consistency throughout the plots. I'm not sure if flat plots would make a difference in available space here. @m-mohr also proposed to split the graphics for the two processes. I could also make them all just bigger (also messing with consistency), something in the likes of this (old version for reference): exp_resample_space_new

resample_time: Why is there no data for 2020-09-28?

Whether there is data for previous time steps is dependent on the used resampling function. We thought it's more intuitive to not have data there.

reduce: For the example in the middle, could you move the "time" dimension name closer to the rendering?

Yes I could, but I also considered consistency here, and also to show the original extent, to emphasize the reducing thats taking place.

Also, I liked the idea of adding process examples with links and went ahead and did it. Apart from that, I think another important point coming from your suggestions is whether or not dimension names should also be code-highlighted. I only highlighted dimension labels before, also to make a difference between concrete values and conventional dim names. More opinions on this are appreciated.:)

christophfriedrich commented 3 years ago

It would be possible to turn the cube, but would mess up the consistency throughout the plots.

Valid point, but IMO that wouldn't be overly bad, the gain is worth it.

I'm not sure if flat plots would make a difference in available space here.

It would certainly make a difference! Mainly because it would remove the perspective. A 400x400 px square shows more than a 100x300px skewed ~rectangle~ quadrangle...

@m-mohr also proposed to split the graphics for the two processes.

I don't get what exactly you mean. One graphic each for down- and upsampling?

I could also make them all just bigger (also messing with consistency)

I like your example, this definitely improves things, and I think the consistency issue here is even more negligible than above. Still, I guess I'd prefer the flat view. If you want to show something spatial in detail, the timeseries-oriented representation just isn't so useful.

Whether there is data for previous time steps is dependent on the used resampling function. We thought it's more intuitive to not have data there.

It wasn't for me :D 2020-09-28 is closer to the existing data point 2020-10-01 than 2020-10-06, so nearest neighbour should yield a result here too. If it was bilinear interpolation, there shouldn't be data for 2020-10-30 either :thinking: I'd prefer data for the first point too, or alternatively, add a note to the figcaption.

reduce: For the example in the middle, could you move the "time" dimension name closer to the rendering?

Yes I could, but I also considered consistency here, and also to show the original extent, to emphasize the reducing thats taking place.

Valid point. Maybe add a small grey border around each result then?

whether or not dimension names should also be code-highlighted. I only highlighted dimension labels before, also to make a difference between concrete values and conventional dim names. More opinions on this are appreciated.:)

I would highlight them in some way. Maybe keep code-highlighting for labels and put names in italics?

But then again, function names are code-highlighted too...

jonathom commented 3 years ago

I code-highlighted dimension names as well :)

As for the resample-spatial figure:

Regarding the flat view: To me, displaying the whole cube is important here to communicate that this process is done on the complete thing. That is what I mean when I say, the flat plot wouldn't improve space issue: In the end, the single grids would again be very small, when the complete flat grid is plotted for input and output.
"One graphic each for down- and upsampling?" -> yes.
I prefer making the grids as they are a bit bigger. Because the "resample to" grids are plotted well-visible, I think that would be enough to understand whats happening here. Maybe we can also wait for some community feedback :)

For the temporal resampling: Good points. I will add a timestep then.

And lastly for the reduce-plot: I moved the "time" a bit to the right, will push with the other figure updates.

jonathom commented 3 years ago

Once we have this discussion going (@christophfriedrich), what would you think of including the emoji explanations from the current glossary? Like so:

Screenshot from 2021-03-29 02-34-21

@m-mohr and I initially thought we would leave them out, as to not overload the page & only use one approach on communicating the processes. But the documentation could also benefit from either putting these right below the title or at the very end of every function paragraph. What do you think?

m-mohr commented 3 years ago

We could do it right below the title as an "introduction", something like:

tl;dr: filter([🌽, 🥔, 🐔], isVegetarian) => [🌽, 🥔] ;-)

How many issues are still open in this PR? I think we should merge it asap. :-)

edzer commented 3 years ago

I'll give it a full read; is https://github.com/Open-EO/openeo.org/blob/datacube_doc/documentation/1.0/datacubes.md still the most up-to-date rendered version?

m-mohr commented 3 years ago

Yes, it's always linking to the latest commit.

christophfriedrich commented 3 years ago

I code-highlighted dimension names as well :)

Me gusta :+1:

Regarding the flat view: To me, displaying the whole cube is important here

Well, the flat view displays the whole cube too?! I'm not sure whether we're talking about the same thing, so I made a sketch. I meant it like this:

(For the graphics in the bottom row, I indicated the changes in the top left square only)

IMO removing the perspective helps. But:

I prefer making the grids as they are a bit bigger.

Looking at the guide right now with the bigger grids I think the point comes across, so I'm not objecting to leaving it as it is :)

"One graphic each for down- and upsampling?" -> yes.

I think this can stay as it is too.

For the temporal resampling: Good points. I will add a timestep then.

:+1:

And lastly for the reduce-plot: I moved the "time" a bit to the right, will push with the other figure updates.

Looks good now :)

christophfriedrich commented 3 years ago

Regarding the emoji question: I like these explanations -- not only do they explain the concepts once more without the datacube complexity, but they also do so in quite an entertaining way.

I'm unsure about where to put them though. Having a "tl;dr" right below a title doesn't make sense. If we go for this option, calling it "In short" as in your screenshot would be better. But I'd prefer to explain the concept in (a few) words first before the user is confronted with this representation. Putting them at the end seems good because it gives you "a second chance to get it" if you didn't understand the previous stuff 100%. But sitting below the big figures, close to the next section's heading, kind of loses the connection.

Maybe they could be fused into the first few sentences of each section? E.g.:

When filtering data (e.g. filter_spatial, filter_temporal, filter_bands), only the data that satisfies a condition is returned. In a general example, from a list of ingredients, only some are vegetarian: filter([🌽, 🥔, 🐔], isVegetarian) => [🌽, 🥔]. In the EO datacube world, this condition could be a timestamp or interval, (a set of) coordinates, or specific bands. By applying filtering the datacube becomes smaller, according to the selected data.

But that makes a jump totally out of the scope and back in again, so I'm not a complete fan.

Maybe render them as pictures that float: right at the top of each section?

I don't have a favourite yet.

We could definitely merge first though and worry about this "nice to have addon" later, because in response to @m-mohr's question

How many issues are still open in this PR? I think we should merge it asap. :-)

I believe there's nothing in here anymore where I'd say "this blocks it for me".

m-mohr commented 3 years ago

Re emojis: Yeah, maybe after the first paragraph when the basics are laid out. I'd keep them in their boxes. I have no strong preference wrt the "heading" we use for it. "In short" might be the best option if put after the first paragraph. Don't render them as images, that is bad for accessibility.

christophfriedrich commented 3 years ago

Don't render them as images, that is bad for accessibility.

It would help in cross-platform understanding. At least with the font on my machine, I can barely see that this chicken is supposed to be a chicken. And does anyone really benefit from a screenreader reading "filter paranthese bracket ear of corn..."? Also, there's always the alt tag.

Edit: But yes, a "in short" box after the first paragraph sounds good :+1:

m-mohr commented 3 years ago

Oh, you meant the single emojis as images? I understood it as "make the whole box an image" (due to the float:right, I think). I'm still in favor to avoid images, we can also assign a "title" tag to the emojis so that one could hover for better understanding.

christophfriedrich commented 3 years ago

No, I meant rendering the whole expression filter([🌽, 🥔, 🐔], isVegetarian) => [🌽, 🥔] as an image. But title (on a <span>?) for the emojis sounds good too.

christophfriedrich commented 3 years ago

To elaborate more on

How many issues are still open in this PR?

From my comments, all but 3 have been marked as resolved. Of the remaining three,

one has a specific suggestion that could just be merged,
another related one can be fixed easily by just including the remaining infos in the table too (or just just click "resolved" without doing anything if you think it's not necessary),
and for the last one I'm not sure whether it's actually an issue and if yes how to resolve it, maybe @edzer can help, I'll tag him there too.

jonathom commented 3 years ago

Thank you for your suggestions, here is what I settled on: Screenshot from 2021-03-29 12-57-20 All of the code-highlighted example is given the title, because I can't insert html tags inside code-highlighted text, plus I would have wanted a complete textual description anyway. The boxes are inserted after the first paragraph each. Chicken is changed to pig.

Re: the resample figure, I know what you mean. But I'm also happy with it now.

Except for the "spectral resolution" issue and changes @edzer may have there is nothing else, and if nobody objects to the current proposed solution ("how many and what bands") I'm just gonna commit it.

edzer commented 3 years ago

Looks great! Besides my comment above on the spectral resolution, I have the following suggestions:

I think the explanation of apply* now only explains apply_kernel, and not apply_dimension; this can be mentioned, and an example of apply_dimension could be added. An example could be: apply_dimension could apply a function to the time dimension that returns the three strongest break points, along with their timing and significance. (or do we need sth simpler?)
to the sentence "The resulting grid still carries the same pixel information, but in higher spatial resolution.", add: "Other upsampling methods may yield smoother results, e.g. by using interpolation".
replace "the dimension along which aggregate is performed" with "the dimension along which the reducer function is applied"

jonathom commented 3 years ago

Thank you for the suggestions @edzer, I just committed points 2 and 3. One extra question: Is it band or bands dimension?

m-mohr commented 3 years ago

One extra question: Is it band or bands dimension?

bands

m-mohr commented 3 years ago

Thank you @jonathom for all the work you put into it, it is a really good introduction, I think. Thank you for all the reviews.

I'll go ahead and merge this now. We can fine-tune this later via separate PRs, e.g. adding an image for apply_dimension seems to be a good idea.

Open-EO / openeo.org

Data Cube Documentation ✨ #31