Closed TysonStanley closed 8 hours ago
I believe data.table was made to play nicely with any package, by following many conventions from base R. Creating a "seal of approval" may give the impression that some packages work better with data.table, while others don't work well or don't work at all with data.table...
Rather than having a community of packages, I would prefer to have all packages be in the community.
Thanks for your feedback on it. I agree that data.table is nicely designed to work well with all sorts of packages (and in ways that are not always obvious!). I don't think our intention would be to say that certain packages work best with data.table and others don't. The goal would be to help build the community around data.table. This was just one idea of how we could engage more R users and get them into the data.table repository more. We would also hope that it would spawn more ideas of how to use data.table with other packages, across more situations. The documentation (and other resources) on data.table are vast, but I think there are still a lot of users who don't find them (and learn how to use data.table) early enough.
Do you have other suggestions on how to make entry into data.table use and development easier?
I have been using data.table as a user, but I know that many parts have been written in C. I have no clue where I could start to learn C to make meaningful contributions.
I would like to have some guidelines for novice users who want to contribute to data.table.
Is there anything besides an approval process {data.table} maintainers would be committing to as part of this?
Would the approval be granted in perpetuity / renewed regularly / granted with possible revocation under "certain circumstances" (which)?
I was in the same boat. Reading the code of PRs was quite useful, but the game changer was to start coding; I started with rolling mean. Then I received great feedback on my PRs, mostly from Matt, so it was easy to pick up good practices. Often I draft a naive version in R that reflects how I will code it in C; I might even skip using functions like sum and code it as a for loop.
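As a rough illustration of that loop-first habit, here is a minimal sketch of a naive rolling mean written with explicit loops only. It is not data.table's actual rolling-mean code: the function name naive_roll_mean, its signature, and the choice to fill incomplete windows with NAN are all assumptions made for this example.

```c
#include <stddef.h>
#include <math.h>

/* Hypothetical helper for illustration only (not data.table's implementation).
 * For each position i, average the k values ending at i.
 * Positions that do not yet have k values are set to NAN. */
void naive_roll_mean(const double *x, size_t n, size_t k, double *out)
{
    for (size_t i = 0; i < n; i++) {
        if (k == 0 || i + 1 < k) {
            out[i] = NAN;               /* incomplete (or empty) window */
            continue;
        }
        double total = 0.0;             /* explicit loop instead of a sum() helper */
        for (size_t j = i + 1 - k; j <= i; j++) {
            total += x[j];
        }
        out[i] = total / (double)k;
    }
}
```

This version recomputes every window from scratch, which costs O(n*k); it is only meant to show how a loop-based draft maps almost line for line onto C.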
What resource do you recommend to learn C and which function of data.table would be a great point to start?
Source of the project you are going to contribute to. And which function... The one that doesn't exist yet :)
@AngelFelizR The r-contributors slack (r-contributors.slack.com) hosted a book club on learning C for R users last year:
https://github.com/r-devel/c-book-club/
I believe there are videos still available; try asking in the #book-club-modern-c channel there. Otherwise expressing new interest is a way to get the book club running a second time (others have already inquired).
As for data.table's own C code, I think the most straightforward stuff would be:
I quite like recent improvements to GitHub's in-browser code-reading experience BTW, you can click through on function calls to find their definition / where symbols are defined / hover-over for their types.
Lastly, keep in mind that there's a ton of R code in data.table to improve as well! Over 8,000 lines already.
@MichaelChirico, thank you for your advice. I hope to contribute C code in the long term to continue progressing this amazing project.
I want to be prepared by the time we can take on the task of making data.table work with data on disk.
I am here because the data.table survey asked if I wanted to contribute, so I started reading the issues.
another way to contribute, even without knowing much about how data.table works (in C or otherwise), is to look at the open issues, and try to see if you can reproduce a bug report, then add a comment on the issue that explains what you did and whether or not the issue is reproducible. (and if you can make a simpler example than what is reported, that is even better)
To make this a concrete proposal:
For example, I have been developing https://cran.r-project.org/package=nc which provides named capture regex functionality, using and outputting data.tables, and I would like that package to be considered for inclusion under the Seal of Approval.
Another example would be the mlr3 packages which are built using data.table.
I see the Seal of Approval as a way of building community, by increasing awareness about how widely-used data.table is among other R packages.
I think this will ultimately be a pretty low lift while allowing more public connections to the community.
other packages to consider: https://cran.r-project.org/package=maditr https://cran.r-project.org/package=getDTeval
glad to see some positive feedback to my proposal. also would be cool to have some logo with a sea lion giving a thumbs up, does anybody have graphics/art skills? @Maradestefanis ? My vision is that the README.md should have a one-line mention of the package, with a link to a blog post on https://rdatatable-community.github.io/The-Raft/ which gives further details. So that would entail a little extra work for the package author: writing that blog post. (But no extra work for data.table devs, who just review the PR with a change to README.md)
Hey @tdhock I'm stepping in and I can give the logo a shot. It would be great to have it in high definition, if possible. Can you provide that? Also, is there anything else you need from me for the blog?
Hi Mara the existing logo graphics files are in https://github.com/Rdatatable/data.table/tree/master/.graphics, is that high enough definition?
Yes, awesome! I've been trying it out these days
Toby, I am working on the graphic of a sea lion giving a thumbs up. For now the result is mediocre and not really nice. I will try again and send you the result in a few days.
I'll keep pushing forward.
Hi all,
I'm reaching out with a couple ideas/options for the Seal of Approval process, to see if we can find one that everyone agrees on.
On this repository:
@jangorecki expressed some concern with a list on this repository's ReadMe, because it implies some kind of closed community, instead of data.table being accessible to anyone.
I propose that we simply add a Seal-of-Approval.md file in this repo that contains a simple list of packages that have gotten approval. Then, we can link to this md at the bottom of the ReadMe, in the Community section, and reserve all additional details for blog posts on The Raft instead of having them clog up this repo.
Approval process:
At first, @TysonStanley had suggested that approval be initiated with a PR to this repo, and @MichaelChirico was wondering about the expectations from maintainers for reviewing.
I want to suggest a reverse-order:
This would be kind of a mini "journal-style" process that would maybe take some of the burden off the maintainers.
Longevity:
Michael also asked about whether this approval is granted in perpetuity or not. I think, just workload-wise, we wouldn't commit to periodic re-reviews. However, if someone were to alert us to an issue with a package - say, it's no longer actively maintained - we'd take it off the list at the maintainers' discretion.
Type of SoA Packages:
I've come up with four types of packages that might merit approval; in principle, a submitter would have to justify the package falling in one or more of these categories. I'd love feedback if anything seems amiss:
[ ] An extension package: Adds to the internal functionality of data.table.
[ ] An application package: Uses data.table to accomplish a particular task or analysis.
[ ] A bridge package: Translates data.table syntax to different syntax or provides helper functions for transitioning between data.table and another object type.
[ ] A partner package: Not necessarily directly connected to data.table, but deliberately follows the core philosophies of data.table.
So, tl;dr, in this proposal:
Let me know if this sounds workable to you, or if you have other suggestions! :)
Since there are no major blocking concerns with Kelly's most recent proposal, I would suggest that we go ahead with that.
Since the seal of approval is moving forward with Kelly's suggestion, should we close this now?
With the goal of building a community of packages that have similar philosophies and syntax that are separate from data.table (and outside of data.table scope #5722), we would like to set up a “Seal of Approval” (play on the mascots of data.table) process. The process for a package receiving the Seal of Approval could be:
Approval will include being listed as a Seal of Approval package on the data.table repository and an SVG of the “seal” that they can include on their own repository/package logo. The initial idea would be packages that do at least one of the following:
Possible examples of this could include packages that have few dependencies (e.g. tinytest), extend functionality (e.g. dtplyr, tidytable, tidyfast), and packages that use data.table on the backend (e.g. modelsummary). This process would hopefully help other developers feel more connected to data.table and be more likely to want to support it. Things for us to decide on are: