easystats / datawizard

Magic potions to clean and transform your data 🧙
https://easystats.github.io/datawizard/

Creating a visual schematic diagram for data wrangling workflow in `{datawizard}` #87

Open IndrajeetPatil opened 2 years ago

IndrajeetPatil commented 2 years ago

IMHO, the current README is quite dull and long-winded, and doesn't provide much insight into how this package can be useful for the users.

What we need is for it to feature a visual schematic like the following ones from our other popular high-level packages:

image

image

Of course, paging our in-house visualization wizard @DominiqueMakowski! 🪄

Needless to say, this is low-priority, and if you think it necessary, we can definitely wait for the package to mature further.

DominiqueMakowski commented 2 years ago

What would it contain? I can start a draft in PowerPoint.

IndrajeetPatil commented 2 years ago

@DominiqueMakowski How about something like this? (cc @bwiernik, @strengejacke, @mattansb, @etiennebacher)

image

Of course, there is a lot of room for improvement here. Specifically,

DominiqueMakowski commented 2 years ago

I can give it a go next week (do ping me then if you remember :)

> The list of functions I've included in the two columns is incomplete.

It's okay not to be comprehensive; otherwise it will be obsolete as soon as we add a new function. Better, perhaps, to create something like a wordcloud.

IndrajeetPatil commented 2 years ago

Yeah, I agree. That's why I put the `...` in those columns. I don't think we need to be comprehensive, but we should definitely include the most important ones (filter, select, join, etc.).

bwiernik commented 2 years ago

I think a separate viz of data cleaning versus data summary functions would be good

IndrajeetPatil commented 2 years ago

@DominiqueMakowski It will be nice to have something like this in the JOSS paper.

DominiqueMakowski commented 2 years ago

Will do within the next couple of days

DominiqueMakowski commented 2 years ago

Would be nice to generate a wordcloud of the functions tho

DominiqueMakowski commented 2 years ago

Wordlist for wordclouds (https://www.wordclouds.com/):

data_filter() data_select() data_to_long() data_to_wide() data_rotate() data_rename() data_relocate() data_join()

standardize() normalize() center() degroup() winsorize() data_cut() data_recode() data_shift()

IndrajeetPatil commented 2 years ago

I want to wait for #57 and #197 to be resolved before we include the following functions in the wordcloud:

data_cut() data_recode() data_shift()

We should avoid including in a publication any function names that we are not sure will survive for long.

DominiqueMakowski commented 2 years ago

You're right. I'll come up with a diagram prototype nonetheless, and then we can fine-tune the wordcloud.

DominiqueMakowski commented 2 years ago

We can focus on the dirty-clothes metaphor, but maybe it lacks some text at the bottom? (Feel free to edit the PowerPoint directly on the diagram branch!)

image

IndrajeetPatil commented 2 years ago

Thanks, Dom! This looks like a great start.

I think one way this can be improved is by making it visually less busy and more minimal. Additionally, we need to mention only a few (key and most useful) functions and just have ... (which will cover all the other existing or future functions).

I don't like the star shape in the "Transformations" section.

Maybe this can be an ironing table with a shirt on it? As in, imperfections in prepared data are ironed out using statistical transformations before the data is ready to be fed into a statistical model.

Instead of "No dependencies", I'd write "Lightweight", since we do import `{insight}`.

IndrajeetPatil commented 2 years ago

I also want to hear what @etiennebacher, @strengejacke, @bwiernik, @mattansb think about the current status of the illustration and how it can be further improved.

bwiernik commented 2 years ago

I agree with Indra's comments and don't have much more to add there. I like the ironing metaphor (maybe the function names in a cloud of steam?). I also agree that making the function names less busy and more prominent would be good.

mattansb commented 2 years ago

Looks good. I would maybe change the background color of the washing machine to a lighter blue? And for the transformations, use the non-`data_*` function names.

etiennebacher commented 2 years ago

Looks good to me too, but it's a bit hard to read most function names in steps 2 and 3. Maybe you can remove the very small ones to increase the size of the others?

IndrajeetPatil commented 2 years ago

Thank you all for the great suggestions!

WDYT, @DominiqueMakowski? Will this be possible? Don't know how complicated it will be to design.

strengejacke commented 2 years ago

Before we finalize this, we should definitely decide on the new function names. Mostly, I'm not quite satisfied with change_code(). "Recode" is a verb, and everyone would expect such a function to recode old values into new ones. In change_code(), the verb is "change", and what can we expect if we change a code? What we actually change when recoding are values (or factor levels, but "values" is maybe more generic). What do you think about change_values()? Or maybe recode_values(), or recode_variables()?

bwiernik commented 2 years ago

The "code" here is the mapping of quantities to values/labels, so the function changes the coding scheme used.

Maybe change_coding()? change_values() would be my second choice

IndrajeetPatil commented 2 years ago

@DominiqueMakowski Let us know if these suggestions make sense.

IndrajeetPatil commented 2 years ago

bump

strengejacke commented 2 years ago

hello-mcfly

IndrajeetPatil commented 2 years ago

bump

DominiqueMakowski commented 2 years ago

Is that the correct list?

Preparation: data_filter() data_select() data_to_long() data_to_wide() data_rotate() data_rename() data_relocate() data_join()

Transformation: standardize() normalize() center() degroup() winsorize() categorize() change_code() slide()

DominiqueMakowski commented 2 years ago

thanks for the bumps 🙊

IndrajeetPatil commented 2 years ago

These need to change to their new names:

Btw, feel free to not include all of them. Whatever looks better with the chosen graphic design.

bwiernik commented 2 years ago

recode_values() not change_code()

IndrajeetPatil commented 1 year ago

bump