epetrovski opened 5 years ago
I like this. How easy is it to edit this format? Or would we assign you to add new topics?
There's a powerpoint. Editing is doable but definitely a hassle since it involves a lot of copy/pasting.
I would be happy to update the cheat sheet by assignment if you want new topics covered and I'll accept pull requests as well.
Currently there's no room left for new material, though I'm thinking of a separate fread+fst cheat sheet (fast data import), which would free up space for more.
Your work on the cheat sheet is highly appreciated. I think many users have already had the opportunity to learn from the cheat sheet you made. So thank you for that!
I would avoid using a cheat sheet made in PowerPoint because it will be difficult to maintain. I also think we should aim to provide a single data.table cheat sheet. Ideally, we would merge the content from https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf and https://github.com/rstudio/cheatsheets/blob/master/datatable.pdf into a single cheat sheet, using some open source editing tool so it will be easier to maintain.
Thank you @jangorecki !
I agree that a powerpoint based cheat sheet is too difficult to maintain. My choice of software was solely guided by the fact that I had to conform to the RStudio visual guidelines for cheat sheets in order to get it on their website. They had a powerpoint template ready to go...
But something made in R Markdown or the like would be much better. I'll see if there's anything I can do about it, but others are more than welcome to give it a shot as well :)
Thanks for the info on that. I filed: https://github.com/rstudio/cheatsheets/issues/97
For each section, is there scope to hyperlink to relevant data.table vignettes?
hi @epetrovski I was wondering if you could please update the cheat sheet section "RESHAPE TO LONG FORMAT" to use the new features in data.table 1.15.0 released today?
> melt(data.table(id=c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4), measure.vars=measure(value.name, y, sep="_"))
id y a b
<char> <char> <num> <num>
1: A x 1 3
2: B x 1 3
3: A z 2 4
4: B z 2 4
Sorry, but it's been ages since I've used data.table, or R for that matter. Others should please feel free to update the cheat sheet PowerPoint and remove my contact info.
thanks for the info @epetrovski Would anybody else like to volunteer to update the cheat sheet? @Anirban166 @maradestefanis
Hi @tdhock, I would like to update the cheat sheet section "RESHAPE TO LONG FORMAT".
@tdhock The update is done; please check whether it is OK. I made two changes (yours, and removed the contact info). datatable_updated.pptx datatable_updated.pdf
Could you also update the data.table version and date in the footnote?
thanks @MaraDestefanis that is a great improvement.
I think it would be good to still write something like "Created by Erik Petrovski and Mara Destefanis mara@email.com". Is that OK with you?
Also, for the melt code, I think it would be easier to understand (and more consistent with the other examples) if you changed the first argument from data.table(id= c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4) to dt, so:
melt(dt, measure.vars=measure(value.name, y, sep="_"))
what do you think?
Also for the argument docs how about:
Reshape a data.table from wide to long format.
dt: a data.table.
measure.vars: Columns containing values to fill into cells, often using measure() or patterns().
id.vars: character vector of ID column names. (optional)
variable.name, value.name: names for output columns (optional)
Also if you think it is appropriate, and if there is enough room, could you please add some documentation for measure()?
measure(out_name1, out_name2, sep="_", pattern="([ab])_(.*)")
sep (separator) or pattern (regular expression) are used to specify columns to melt, and parse input column names.
out_name1, out_name2: names for output columns (creates single value column), or value.name (creates value column for each unique value of the corresponding part of the melted column name).
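A minimal sketch of the sep and pattern forms described above, reusing the toy table from the 1.15.0 example earlier in this thread as dt. It assumes a data.table release that provides measure(); the names long_sep and long_pat are only for illustration. Both calls should give the same long table, and id stays an ID column because it matches neither the separator nor the regular expression.

```r
library(data.table)

# Toy wide table from the 1.15.0 melt example in this thread
dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)

# sep: split each measure column name on "_" into two parts;
# the first part becomes a value column (value.name), the second goes into y
long_sep <- melt(dt, measure.vars = measure(value.name, y, sep = "_"))

# pattern: each regex capture group supplies one part of the column name
long_pat <- melt(dt, measure.vars = measure(value.name, y, pattern = "([ab])_(.*)"))

print(long_sep)
```

The sep form is shorter when names split cleanly on one character; pattern is the fallback when the parts need a regular expression to separate them.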
@ben-schwen done. @tdhock done, please check.
There is not enough room but I added the best I could.
Note: please check the code; I'm not sure it's correct.
In the document, we need to make sure the font size stays the same throughout, and remember that spacing matters too. That's where we draw the line.
datatable_updated(1).pptx
datatable_updated(1).pdf
datatable_cheat_sheet_TDH_14_Feb_2024.pdf datatable_cheat_sheet_TDH_14_Feb_2024.pptx
Hi Mara, Thanks for the quick revisions! I changed a couple of things, what do you think?
data.table(id= c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4) to dt
@tdhock I am delighted to collaborate, thanks to you. I'll squeeze in some time this weekend to make those changes and get back to you. (For now I'll trust your judgment until I have more expertise.) Let's move forward.
Hi @tdhock, I've added the files; could you check them? This is how I did it. Tell me if you want me to change anything. data.table_update(2).pdf
hi @MaraDestefanis thanks for sharing. Can you please tell me what are the differences/improvements in your version, with respect to my revisions from https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 ?
hi @tdhock ,
Please point out the details that I may have misunderstood, and I will gladly apply them
Capitalize X for a and use lowercase z for b: "X x Z z" -> For consistency with the reshape to wide/dcast example, I think it would be better to keep the same case in both (in my version b_x is the same in the two examples, whereas in your version the dcast example has b_x and the melt example has b_X).
Use lowercase "sep" in the text "Reshape a data table from..." -> my version already had lowercase sep.
Previously, in the code, it was "dt". I changed it to "data.table". -> for consistency with the other examples, in which the first argument is usually dt
I think it would be better to keep it as dt instead of data.table(id= c("A","B"),a_X=1,a_z=2,b_X=3,b_z=4)
So overall I think it would be better to keep the version with the changes I proposed in https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 If you agree, then there are no new changes to apply. Or am I missing something?
@tdhock, back at it again. Would you mind reviewing this? If everything checks out, rename the file. If there are any issues, please highlight them for clarification, and I'll make the adjustments.
Your new version still has some of the same issues I mentioned, and a white line over the authors at the bottom. Did you see the revised files I uploaded in https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 ? I believe that version fixes the issues I mentioned. Can you please look at that version and tell me if you approve?
Hi @tdhock, thanks for taking the time to review that. I applied what you suggested in comment #3374. I hope I addressed all the issues; if I missed any, tell me again and I'll keep at it until it's perfect.
Mara, your version still does not address the issue I mentioned above in this comment: https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 Please do not edit or make a new version, which I believe is wasting our time due to some miscommunication. Instead, please read these files, which I linked above in that comment and link again below for clarity: datatable_cheat_sheet_TDH_14_Feb_2024.pdf datatable_cheat_sheet_TDH_14_Feb_2024.pptx and tell me if you think they are OK (I believe they are).
Hi @tdhock, sorry for the delay. Yes, this version is OK. I'll upload it again, with just a small adjustment to the text.
(At some point in the reviews I got lost on the comment about capitalization, "X x Z z" and "Sep sep"; sorry for taking up your time.)
data_table_cheat_sheet.pptx data_table_cheat_sheet.pdf
Feel free to let me know if there's anything else you'd like to change. Note: in the coming days I'll try to make a Quarto version. If we need a translation, I'm available for that too.
Hi Mara thanks for sharing. Can you please clarify what exactly you changed in your new version? "with just a little adjustment only in the text" https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1981800429
Your new version still does not fix the issue I mentioned previously "make long tables identical (previous version showed the same data but rows in a different order)" which is fixed in my previous version https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1944868244 -- can you please use that version if you want to make future modifications?
In particular the issue can be seen below
It is fixed in my version as can be seen below
Your new version also still has another issue I mentioned, https://github.com/Rdatatable/data.table/issues/3374#issuecomment-1964987216 "a white line over the authors at the bottom" see below.
Thanks, @tdhock for the detailed clarification, I really needed that. Could you please double-check if it's good now? I'm totally cool with reviewing it as many times as we need to get it right.
Hi Mara that is a lot better thanks. A couple of minor suggestions: please change "and parse input column names." to "and to parse input column names."
please remove space after open parenthesis: change "value.name ( creates" to "value.name (creates"
Hi @tdhock, I'm sending the text again with the revisions you mentioned. Take a look and let me know what's next. Thanks for being patient and explaining things.
looks good, what do other people think?
please make a minor correction:
melt(dt,
measure.vars= measure (
value.name, y,sep="_"))
put space before equals sign (measure.vars =) and space after comma (y, sep)
Hey @tdhock, great to hear from you. Can you check this out? data_table_cheat_sheet.pdf data_table_cheat_sheet.pptx
Hi Mara that looks great, can you please submit a PR to https://github.com/rstudio/cheatsheets that updates the cheat sheet?
great, the rstudio repo has accepted our updated cheatsheet so now we just need to update the link on the readme
or maybe delete the old one? (probably better to avoid confusion)
Hi @tdhock Toby,
I'm aware of the news, that's great! I'm not sure whether I need to do anything now, but I'm keeping an eye on it, and on our feedback.
Some time ago, I published a cheat sheet for data.table and I've been expanding on it quite a bit since then: https://github.com/rstudio/cheatsheets/blob/master/datatable.pdf
It's currently hosted on RStudio's homepage (and GitHub) but I was wondering if there's room for it on the data.table homepage as well?
Just to be clear, I don't think it should replace the existing cheat sheet on the data.table homepage. I'm trying to brand this as a "visual cheat sheet" mostly aimed at people who are new or casual data.table users.