Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

%fin% operator wrapping %in% and %chin% #5232

Open ben-schwen opened 2 years ago

ben-schwen commented 2 years ago

I would like to have a convenient operator for this behavior

"%fin%" = function(x, table) if (is.character(x) && is.character(table)) x %chin% table else x %in% table

That is, it chooses %chin% when both arguments are character and %in% otherwise. The currently open PR for %notin% already has this behavior for the negation.

MichaelChirico commented 2 years ago

Agreed, it's somewhat toilsome to keep track of which one to use depending on the column type.

Maybe the first step is to design a functional version, as we have for the %like%-alike operators (like()), and have that dispatch to the common cases (character/non-character, negation, etc.).
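A minimal sketch of what such a functional version could look like. The name fin() and its negate argument are hypothetical, chosen only for illustration; nothing here is part of data.table's API, and the dispatch rule is simply the one from the original proposal.

```r
library(data.table)  # provides %chin%

# Hypothetical helper (not a data.table function): use %chin% when both
# inputs are character, fall back to base %in% otherwise.
fin <- function(x, table, negate = FALSE) {
  res <- if (is.character(x) && is.character(table)) x %chin% table else x %in% table
  if (negate) !res else res
}

# The operator form then becomes a thin wrapper around the function.
`%fin%` <- function(x, table) fin(x, table)

fin(c("a", "z"), c("a", "b"))        # character inputs, handled by %chin%
fin(c(1L, 5L), 1:3, negate = TRUE)   # integer inputs, handled by %in%, negated
c("a", "z") %fin% c("a", "b")
```

Keeping the logic in one function means a negated operator would only need to pass negate = TRUE rather than duplicate the type check.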

HughParsonage commented 2 years ago

%fin% is currently exported by package fastmatch (and might in fact already satisfy the proposed use-case)

Nj221102 commented 3 months ago

Do we still want this operator? If yes, I would like to work on it. If not, how about closing this issue? WDYT @MichaelChirico @ben-schwen @HughParsonage

MichaelChirico commented 3 months ago

{heims} also exports %fin% (but doesn't have any revdeps):

https://github.com/cran/heims/blob/a00900357fc98b416368dfbc11f172dc62f61c3f/NAMESPACE#L3

> Current open PR for %notin% has this behavior for the negation.

Indeed, it's weird to have the inconsistency we have now...

> %fin% is currently exported by package fastmatch (and might in fact already satisfy the proposed use-case)

But OTOH it's not great to cause a conflict: {fastmatch} has a fair number of revdeps, including 11 that overlap with {data.table}:

revdeps = \(pkg) tools::dependsOnPkgs(pkg, c("Depends", "Imports"), recursive=FALSE, installed=available.packages())

writeLines(toString(intersect(revdeps("fastmatch"), revdeps("data.table"))))
# DysPIA, fy, grattan, healthyAddress, heims, hutils, LilRhino, networkR, Signac, TeXCheckR, webtrackR

@ben-schwen what are your current thoughts here?

ben-schwen commented 3 months ago

fy, grattan, healthyAddress, heims, hutils and TeXCheckR are all from @HughParsonage, which would leave us with DysPIA, LilRhino, networkR, Signac and webtrackR.

I also just checked the overall usage of %chin% on CRAN to get an overview of the potential user base: https://github.com/search?q=org%3Acran%20%25chin%25&type=code

Currently, I lean slightly towards not implementing it, since user interest does not seem high enough. But for an informed decision we would also need benchmarks against fastmatch's %fin% on character vectors.
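A rough sketch of how such a benchmark could be set up. The vector sizes, the repetition count and the use of system.time() are arbitrary choices for illustration, and the fastmatch part only runs if that package is installed. No timings are claimed here; they would need to be measured on real workloads.

```r
library(data.table)  # provides %chin%

set.seed(1)
pool <- sprintf("id%06d", 1:100000)          # 1e5 distinct strings
tbl  <- sample(pool, 50000)                  # lookup table
x    <- sample(pool, 100000, replace = TRUE) # values to look up

# All variants should agree before their timings are compared.
res_in   <- x %in% tbl
res_chin <- x %chin% tbl
stopifnot(identical(res_in, res_chin))

# Elapsed time for n_rep repeated evaluations of a lookup function.
n_rep <- 20L
time_it <- function(f) system.time(for (i in seq_len(n_rep)) f(x, tbl))[["elapsed"]]

timings <- c(
  base_in = time_it(function(x, tbl) x %in% tbl),
  dt_chin = time_it(function(x, tbl) x %chin% tbl)
)

if (requireNamespace("fastmatch", quietly = TRUE)) {
  fm_fin <- get("%fin%", envir = asNamespace("fastmatch"))
  stopifnot(identical(res_in, fm_fin(x, tbl)))
  timings[["fastmatch_fin"]] <- time_it(fm_fin)
}

print(timings)
```

One thing to keep in mind when reading the numbers: fastmatch caches a hash of the lookup table on first use, so repeated lookups against the same table may favor it more than a single one-off lookup would.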