Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

Room for new data.table cheat sheet on the homepage? #3374

Open epetrovski opened 5 years ago

epetrovski commented 5 years ago

Some time ago, I published a cheat sheet for data.table and I've been expanding on it quite a bit since then: https://github.com/rstudio/cheatsheets/blob/master/datatable.pdf

It's currently hosted on RStudio's homepage (and GitHub) but I was wondering if there's room for it on the data.table homepage as well?

Just to be clear, I don't think it should replace the existing cheat sheet on the data.table homepage. I'm trying to brand this as a "visual cheat sheet" mostly aimed at people who are new or casual data.table users.

MichaelChirico commented 5 years ago

I like this. How easy is it to edit this format? Or would we assign you to add new topics?

epetrovski commented 5 years ago

There's a PowerPoint file. Editing is doable but definitely a hassle, since it involves a lot of copying and pasting.

I would be happy to update the cheat sheet by assignment if you want new topics covered and I'll accept pull requests as well.

Currently there's no room for anything new, though. I'm thinking of a separate fread+fst cheat sheet (fast data import), which would free up space for more material.

jangorecki commented 5 years ago

Your work on the cheatsheet is highly appreciated. I also think that many users have already had the opportunity to learn from the cheatsheet you made. So thank you for that!

I would avoid using a cheatsheet made in PowerPoint because it will be difficult to maintain. I also think we should aim to provide a single data.table cheatsheet. Ideally, we would merge the content from https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf and https://github.com/rstudio/cheatsheets/blob/master/datatable.pdf into a single cheatsheet, using some open source editing tool so that it is easier to maintain.

epetrovski commented 5 years ago

Thank you @jangorecki !

I agree that a PowerPoint-based cheat sheet is too difficult to maintain. My choice of software was guided solely by the fact that I had to conform to the RStudio visual guidelines for cheat sheets in order to get it on their website. They had a PowerPoint template ready to go...

But something made in Rmarkdown or the like would be much better. I'll see if there's anything I can do about it, but others are more than welcome to give it a shot as well :)

jangorecki commented 5 years ago

Thanks for the info on that. I filed: https://github.com/rstudio/cheatsheets/issues/97

KyleHaynes commented 4 years ago

For each section, is there scope to hyperlink to relevant data.table vignettes?

tdhock commented 10 months ago

hi @epetrovski I was wondering if you could please update the cheat sheet section "RESHAPE TO LONG FORMAT" to use the new features in data.table 1.15.0 released today?

> melt(data.table(id=c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4), measure.vars=measure(value.name, y, sep="_"))
       id      y     a     b
   <char> <char> <num> <num>
1:      A      x     1     3
2:      B      x     1     3
3:      A      z     2     4
4:      B      z     2     4

epetrovski commented 10 months ago

Sorry, but it's been ages since I've used data.table or R for that matter. Others should please feel free to update the cheat sheet powerpoint and remove my contact info.

tdhock commented 9 months ago

thanks for the info @epetrovski. Would anybody else like to volunteer to update the cheat sheet? @Anirban166 @maradestefanis

MaraDestefanis commented 9 months ago

Hi @tdhock, I would like to update the cheat sheet section "RESHAPE TO LONG FORMAT".

MaraDestefanis commented 9 months ago

@tdhock The update is done; please check if it is OK. I made two changes (yours, and removing the contact info). datatable_updated.pptx datatable_updated.pdf

ben-schwen commented 9 months ago

Could you also update the data.table version and date in the footnote?

tdhock commented 9 months ago

thanks @MaraDestefanis, that is a great improvement. I think it would be good to still write something like "Created by Erik Petrovski and Mara Destefanis mara@email.com"; is that OK with you? Also, for the melt code, I think it would be easier to understand (and more consistent with the other examples) if you changed the argument from data.table(id=c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4) to dt, so: melt(dt, measure.vars=measure(value.name, y, sep="_")). What do you think?
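
A minimal sketch of the suggested dt form, assuming dt is bound to the same small table used in the 1.15.0 console example earlier in this thread. measure() needs data.table 1.15.0 or later, and the name long is just an illustrative variable, not cheat sheet wording.

```r
library(data.table)  # measure() needs data.table >= 1.15.0

# Same wide table as in the console example, bound to the name dt
dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)

# Split each measured column name at "_": the first part ("a"/"b") becomes
# an output value column (value.name), the second part ("x"/"z") goes into y
long <- melt(dt, measure.vars = measure(value.name, y, sep = "_"))
print(long)
```

The printed result should match the console output shown earlier; only the way the input table is passed in changes.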

tdhock commented 9 months ago

Also for the argument docs how about:

Reshape a data.table from wide to long format.
dt: a data.table.
measure.vars: Columns containing values to fill into cells, often using measure() or patterns().
id.vars: character vector of ID column names. (optional)
variable.name, value.name: names for output columns (optional)
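
To illustrate the proposed argument docs, here is a sketch of a plain melt() call with every documented argument spelled out. It reuses the same toy dt; the output names "column" and "value" are arbitrary choices for illustration, not proposed cheat sheet wording.

```r
library(data.table)

dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)

# id.vars are kept as-is, measure.vars are stacked into one value column,
# and variable.name / value.name name the two new columns
# ("column" and "value" are illustrative names only)
long <- melt(
  dt,
  id.vars = "id",
  measure.vars = c("a_x", "a_z"),
  variable.name = "column",
  value.name = "value"
)
print(long)
```

Because id.vars and measure.vars are both given, the b_* columns are left out of the result, and by default the new variable column is a factor.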

tdhock commented 9 months ago

Also, if you think it is appropriate and there is enough room, could you please add some documentation for measure()?
measure(out_name1, out_name2, sep="_", pattern="([ab])_(.*)")
sep (separator) or pattern (regular expression): used to specify columns to melt, and parse input column names.
out_name1, out_name2: names for output columns (creates a single value column), or value.name (creates a value column for each unique value of the corresponding part of the melted column name).
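
A rough sketch of the two measure() modes described above, assuming the same toy dt. Using pattern= together with value.name gives one value column per prefix (a and b), while two ordinary output names (letter and y here, both hypothetical) give a single value column.

```r
library(data.table)  # measure() needs data.table >= 1.15.0

dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)

# pattern= with value.name: each regex capture group maps to one output name;
# value.name creates a value column per prefix ("a", "b")
long_re <- melt(dt, measure.vars = measure(value.name, y, pattern = "([ab])_(.*)"))
print(long_re)

# Two ordinary output names ("letter" is a hypothetical name): both name parts
# become columns, and all values go into a single "value" column
long_one <- melt(dt, measure.vars = measure(letter, y, sep = "_"))
print(long_one)
```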

MaraDestefanis commented 9 months ago

@ben-schwen done. @tdhock done, please check.
There is not enough room, but I added as much as I could. Notes: please check the code (I'm not sure it's okay).
In the document, we need to make sure the font size stays the same everywhere, and remember that spacing matters too. That's where we have to draw the line. datatable_updated(1).pptx datatable_updated(1).pdf

tdhock commented 9 months ago

datatable_cheat_sheet_TDH_14_Feb_2024.pdf datatable_cheat_sheet_TDH_14_Feb_2024.pptx

Hi Mara, Thanks for the quick revisions! I changed a couple of things, what do you think?

MaraDestefanis commented 9 months ago

@tdhock I am delighted to collaborate, thanks to you. I'll squeeze in some time this weekend to make those changes and get back to you. (For now I'll trust your judgment until I have more expertise.) Let's go forward.

MaraDestefanis commented 9 months ago

Hi @tdhock, I've added the files; could you check them? This is how I did it. Tell me if you want me to change anything. data.table_update(2).pdf

datatable_updated(2).pptx

tdhock commented 9 months ago

hi @MaraDestefanis thanks for sharing. Can you please tell me what are the differences/improvements in your version, with respect to my revisions from https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 ?

MaraDestefanis commented 9 months ago

hi @tdhock ,

Please point out the details that I may have misunderstood, and I will gladly apply them.

tdhock commented 9 months ago

Capitalize X for a and use lowercase z for b: "X x Z z" -> I think it would be better to stay consistent with the reshape to wide/dcast example (in my version b_x is the same in both, whereas in your version the dcast example has b_x and the melt example has b_X).

Use lowercase "sep" in the text 'Reshape a data table from...' -> my version already had lowercase sep.

Previously, in the code, it was "dt". I changed it to "data.table". -> for consistency with the other examples, in which the first argument is usually dt, I think it would be better to keep it dt instead of data.table(id=c("A","B"),a_X=1,a_z=2,b_X=3,b_z=4).

So overall, I think it would be better to keep the version with the changes I proposed in https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244. If you agree, then there are no new changes to apply. Or am I missing something?
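
To make the consistency point concrete, here is a sketch assuming the same toy dt. With the default sep, dcast() names the outputs of several value.var columns as <value.var>_<y>, so reshaping the melted table back to wide recovers a_x, a_z, b_x and b_z. Column-name parsing is case-sensitive, so b_x in one example and b_X in the other would not describe the same data. (long and wide are illustrative names.)

```r
library(data.table)  # measure() needs data.table >= 1.15.0

dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)
long <- melt(dt, measure.vars = measure(value.name, y, sep = "_"))

# Back to wide: with several value.var columns and the default sep = "_",
# dcast names the outputs <value.var>_<y value>, i.e. a_x, a_z, b_x, b_z
wide <- dcast(long, id ~ y, value.var = c("a", "b"))
print(wide)
```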

MaraDestefanis commented 9 months ago

@tdhock, back at it again. Would you mind reviewing this? If everything checks out, rename the file. If there are any issues, please point them out and I'll make the adjustments.

datatable_updated(3).pptx data.table_update(3).pdf

tdhock commented 9 months ago

Your new version still has some of the same issues I mentioned, plus a white line over the authors at the bottom. Did you see the revised files I uploaded? They are in https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 and I believe that version fixes the issues I mentioned. Can you please look at that version and tell me if you approve?

MaraDestefanis commented 9 months ago

Hi @tdhock, thanks for taking the time to review that. I applied what you suggested in your comment on #3374. I hope I didn't make any mistakes; if I did, tell me again until it's perfect.

data.table_update(4).pdf

tdhock commented 9 months ago

Mara, your version still does not address the issue I mentioned above in this comment: https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 Please do not edit or make a new version, which I believe is wasting our time due to some miscommunication. Instead, please read the files I linked in that comment, which I link again here for clarity: datatable_cheat_sheet_TDH_14_Feb_2024.pdf datatable_cheat_sheet_TDH_14_Feb_2024.pptx and tell me whether you think they are OK (I believe they are).

MaraDestefanis commented 9 months ago

Hi @tdhock, sorry for the delay. Yes, this version is OK. I'll add it again, with just a little adjustment only in the text.
(At some point in the reviews I got lost with the comments about capitalization, "X x Z z" and "Sep sep"; sorry for taking your time.)

data_table_cheat_sheet.pptx data_table_cheat_sheet.pdf

Feel free to let me know if there's anything else you'd like to change. Note: in the coming days I will try to make a Quarto version. If we need a translation, I'm available for that too.

tdhock commented 9 months ago

Hi Mara thanks for sharing. Can you please clarify what exactly you changed in your new version? "with just a little adjustment only in the text" https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1981800429

Your new version still does not fix the issue I mentioned previously "make long tables identical (previous version showed the same data but rows in a different order)" which is fixed in my previous version https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 -- can you please use that version if you want to make future modifications?

In particular, the issue can be seen below: [screenshot: tables-not-same]

It is fixed in my version, as can be seen below: [screenshot: tables-same]
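
A small sketch of why the row order matters, assuming the same toy dt: melt() returns its rows in a fixed order (one block per y value), so a long table drawn with the same rows in another order shows the same data but is not the same table. identical() notices the difference, while data.table's fsetequal() compares rows as a set. (long and reordered are illustrative names.)

```r
library(data.table)  # measure() needs data.table >= 1.15.0

dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)
long <- melt(dt, measure.vars = measure(value.name, y, sep = "_"))

# melt() emits one block per y value: both ids for y = "x", then both for y = "z"
print(long)

# The same rows in a different order (hypothetical mismatched illustration)
reordered <- long[c(1L, 3L, 2L, 4L)]

same_table <- identical(long, reordered)  # FALSE: row order differs
same_rows  <- fsetequal(long, reordered)  # TRUE: same set of rows
```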

tdhock commented 9 months ago

Your new version also still has another issue I mentioned in https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1964987216, "a white line over the authors at the bottom"; see below: [screenshot: measure-revisions]

MaraDestefanis commented 8 months ago

Thanks, @tdhock for the detailed clarification, I really needed that. Could you please double-check if it's good now? I'm totally cool with reviewing it as many times as we need to get it right.

data_table_cheat_sheet.pdf data_table_cheat_sheet.pptx

tdhock commented 8 months ago

Hi Mara that is a lot better thanks. A couple of minor suggestions: please change "and parse input column names." to "and to parse input column names."

please remove space after open parenthesis: change "value.name ( creates" to "value.name (creates"

MaraDestefanis commented 8 months ago

Hi @tdhock, I'm sending the text again with the revisions you mentioned. Take a look and let me know what's next. Thanks for being patient and explaining things.

data_table_cheat_sheet.pptx data_table_cheat_sheet.pdf

tdhock commented 8 months ago

looks good, what do other people think?

please make a minor correction:

melt(dt,
measure.vars= measure (
value.name, y,sep="_"))

put space before equals sign (measure.vars =) and space after comma (y, sep)

MaraDestefanis commented 8 months ago

Hey @tdhock, great to hear from you. Can you check this out? data_table_cheat_sheet.pdf data_table_cheat_sheet.pptx

tdhock commented 7 months ago

Hi Mara that looks great, can you please submit a PR to https://github.com/rstudio/cheatsheets that updates the cheat sheet?

tdhock commented 5 months ago

great, the rstudio repo has accepted our updated cheatsheet so now we just need to update the link on the readme

tdhock commented 5 months ago

or maybe delete the old one? (probably better to avoid confusion)

MaraDestefanis commented 5 months ago

Hi @tdhock Toby,

I'm aware of the news, that's great! I'm not sure if I need to do anything now, but I'm keeping an eye on it, and on our feedback.