elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

Add a cheatsheet #723

Open: cigrainger opened this issue 9 months ago

cigrainger commented 9 months ago

I'm excited about cheatsheets, and something like this would beat "Ten minutes to Explorer", especially for those coming from dplyr or pandas who just need an easy reference.

- dplyr: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf
- dplyr (also): https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
- pandas: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

I started playing with this over the weekend. I think the diagrams are really powerful and got myself stuck trying to figure out how to replicate them in vega lite.

billylanchantin commented 9 months ago

I'd love to see cheatsheets too! Personally, I think it would've saved me some documentation hunting.

For example, early on I saw there was a DataFrame.filter/2, but I couldn't find a corresponding Series.filter/2. I eventually found Series.mask/2, but it wasn't obvious. A task-oriented cheatsheet would have made that much easier to find.

> I think the diagrams are really powerful and got myself stuck trying to figure out how to replicate them in vega lite.

I also feel like I've lost a good bit of time trying to get Vega-Lite to output specific visualizations. It's great for a lot of cases, but I've found it gets tricky once you start doing non-data-driven things like annotating your visualization.

Were you thinking of doing a .cheatmd? Or a .pdf?

cigrainger commented 9 months ago

Glad to hear it would be helpful! I was thinking of doing a .cheatmd so we can easily put it into the docs. I feel like it should be possible to get close with heatmaps based on specific categorical values. I also agree that it's confusing that we have Series.mask/2 instead of Series.filter/2.
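As a rough illustration of the heatmap idea, here is a sketch using the Elixir `vega_lite` package (an assumption; the thread only says "vega lite"). Each cell of a small, hypothetical 3x2 table becomes a `rect` mark coloured by a category. The data, the `"kept"`/`"dropped"` labels and the white stroke (which draws a border between cells, not a true margin) are all made up for illustration:

```elixir
Mix.install([{:vega_lite, "~> 0.1"}])

alias VegaLite, as: Vl

# Hypothetical diagram data: one record per cell of a 3-row, 2-column table.
# "group" only decides the colour, e.g. rows kept vs dropped by a filter.
cells =
  for row <- 1..3, col <- ["a", "b"] do
    %{"row" => row, "col" => col, "group" => if(row == 2, do: "dropped", else: "kept")}
  end

diagram =
  Vl.new(width: 60, height: 90)
  |> Vl.data_from_values(cells)
  # White stroke gives a visual gap between cells without offset arithmetic
  |> Vl.mark(:rect, stroke: "white", stroke_width: 2)
  |> Vl.encode_field(:x, "col", type: :nominal)
  |> Vl.encode_field(:y, "row", type: :ordinal)
  |> Vl.encode_field(:color, "group", type: :nominal)

# Plain map form of the Vega-Lite spec
spec = Vl.to_spec(diagram)
```

`Vl.to_spec/1` returns the spec as a plain map, which can be encoded to JSON and pasted into the Vega editor for tweaking.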

@philss or @josevalim, I'm sure there was a conversation about this, but I can't remember the reasoning anymore: https://github.com/elixir-explorer/explorer/pull/326#issue-1352162455

josevalim commented 9 months ago

We changed the implementation and renamed at the same time but I am fine with reverting the name back to filter. :) It should be a quick change and we can add:

    # Keep the old name as a deprecated alias that delegates to filter/2
    @deprecated "Use Explorer.Series.filter/2 instead"
    def mask(s1, s2), do: filter(s1, s2)

It will certainly be much easier to find.

billylanchantin commented 9 months ago

Oh, I didn't mean to bring up a side issue! I was just using it as an example.

> We changed the implementation and renamed at the same time but I am fine with reverting the name back to filter.

It may be worth having both, since mask/2 and filter/2 accept different inputs: mask/2 takes a boolean series, while filter/2 and filter_with/2 take a query and a function, respectively. If you already have the boolean series on hand, you'd want mask/2. But if you find yourself building the boolean series (e.g. with transform/2) only to pass it straight into mask/2, filter/2 would be more convenient.
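A minimal sketch of the two styles, assuming Explorer ~> 0.7 (roughly the version at the time of this discussion) and only the existing `Series.mask/2`, `DataFrame.filter/2` and `DataFrame.filter_with/2`. It deliberately avoids a `Series.filter/2`, since that name is what is being debated here:

```elixir
Mix.install([{:explorer, "~> 0.7"}])

alias Explorer.Series
require Explorer.DataFrame, as: DF

s = Series.from_list([1, 5, 2, 8])

# mask/2: you already have a boolean series of the same length
keep = Series.greater(s, 2)
masked = Series.mask(s, keep)

df = DF.new(x: [1, 5, 2, 8])

# DataFrame.filter/2 is a macro: the condition is written as a query
by_query = DF.filter(df, x > 2)

# DataFrame.filter_with/2 takes a function that receives the (lazy) frame
by_fun = DF.filter_with(df, fn ldf -> Series.greater(ldf["x"], 2) end)
```

All three keep the values 5 and 8; the difference is only whether you hand over a ready-made boolean series or describe the condition.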

billylanchantin commented 9 months ago

> I feel like it should be possible to get close with heatmaps based on specific categorical values.

I agree! I was more worried about the arrows:

[Screenshot (2023-10-23): the arrows in the cheatsheet diagrams]

Though if you could embed the diagrams in a table, you could probably achieve a similar effect without the need for the arrow annotations.

josevalim commented 9 months ago

> It may be worth having both since mask/2 and filter/2 accept different datatypes: mask takes a boolean series while filter/2/filter_with/2 take a query/function.

The issue is that doing it with a function is horribly expensive and should be generally avoided.
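One way to see the cost difference, under the same assumption (Explorer ~> 0.7): building the mask with `Series.transform/2` runs an Elixir function on every element, while `Series.greater/2` is a vectorized comparison handled by the backend. Both masks select the same rows; only how they are computed differs. This is a sketch of the trade-off, not a benchmark:

```elixir
Mix.install([{:explorer, "~> 0.7"}])

alias Explorer.Series

s = Series.from_list([1, 5, 2, 8])

# Per-element Elixir function: typically means converting the series to an
# Elixir list, mapping over it, and converting back, which is slow at scale.
slow_mask = Series.transform(s, fn x -> x > 2 end)

# Vectorized comparison: computed by the backend without leaving native code.
fast_mask = Series.greater(s, 2)

slow = Series.mask(s, slow_mask)
fast = Series.mask(s, fast_mask)
```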

billylanchantin commented 9 months ago

Hey I wrote this a little after the earlier discussion. If it's not helpful just ignore me :)

https://vega.github.io/editor/#/gist/e0675e1408ba1944deb1a747f03a060d/spec.json

| dplyr | Vega-Lite |
| --- | --- |
| [Screenshot, 2023-10-26 3:48:06 PM] | [Screenshot, 2023-10-26 3:53:01 PM] |

Note that it does appear to be possible to add margins to the rectangles:

https://vega.github.io/vega-lite/examples/rect_mosaic_labelled_with_offset.html

But from my cursory reading, that example seems a bit complex:

    {
      "calculate": "datum.y + (datum.rank_Cylinders - 1) * datum.distinct_Cylinders * 0.01 / 3",
      "as": "ny"
    }

cigrainger commented 9 months ago

Super helpful! Thank you @billylanchantin! I also don't think the margins are too important :).