RConsortium / wishlist

A wishlist of ideas from the ISC and community

Package tagging/categorization #14

Open leeper opened 7 years ago

leeper commented 7 years ago

The Task View system offers one fully manual way of organizing and recommending packages. But it can become unwieldy, particularly in rapidly changing subject areas and/or in those where there are a considerable number of packages (e.g., web services). It would be great to have packages "tagged" in the DESCRIPTION file into a standardized set of reference categories.

Apparently this is already allowed by the Classification field, although I don't think it's widely used. This proposal would basically involve three parts:

  1. Generate a set of allowed tags and submit documentation patches to R-core
  2. Develop a package that could filter packages based upon tags and automatically construct task view-like documents from them
  3. Encourage users of popular packages to use the tags (basically through a bunch of PRs to those packages)
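
Part 2 could be sketched roughly as follows. This is only an illustration in Python, not an existing tool: the `Tags` field name, the tag names, and the two DESCRIPTION snippets are made up, and a real implementation would presumably be an R package that reads whichever field (Classification or otherwise) ends up being agreed on.

```python
from collections import defaultdict


def parse_description(text):
    """Parse DCF-style 'Field: value' text into a dict.

    Lines that start with whitespace continue the previous field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def package_tags(fields, tag_field="Tags"):
    """Return the set of tags in a comma-separated field.

    'Tags' is a hypothetical field name used only for this sketch.
    """
    raw = fields.get(tag_field, "")
    return {t.strip() for t in raw.split(",") if t.strip()}


def build_views(descriptions, tag_field="Tags"):
    """Group package names by tag, giving a task-view-like index."""
    views = defaultdict(list)
    for text in descriptions:
        fields = parse_description(text)
        for tag in package_tags(fields, tag_field):
            views[tag].append(fields.get("Package", "<unknown>"))
    return {tag: sorted(pkgs) for tag, pkgs in views.items()}


# Made-up DESCRIPTION snippets, for illustration only.
EXAMPLES = [
    "Package: pkgA\nVersion: 0.1.0\nTags: data-management, web-services\n",
    "Package: pkgB\nVersion: 1.2.0\nTags: data-management,\n    visualization\n",
]

views = build_views(EXAMPLES)
```

Each key of `views` is a tag and each value is the sorted list of packages carrying it, which is roughly the raw material a generated task view-like document would need.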

seasmith commented 7 years ago

We need at least one tag (or new view) along the lines of data-management. dplyr, the most directly downloaded CRAN package, is only mentioned in Natural Language Processing and Official Statistics, and is not listed as an official ctv package or core package.

jimhester commented 7 years ago

For a practical implementation of this idea in R, see Bioconductor's BiocViews, which are annotated in the DESCRIPTION file as biocViews.

gaborcsardi commented 7 years ago

How about just allowing free-form keywords in DESCRIPTION? Then we would index them, put them in a database and serve them in the "Great R API".

leeper commented 7 years ago

I think a controlled vocabulary is better because it prevents typographical errors, pluralization, spelling differences, etc. from affecting how packages end up being organized.
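
To make that concrete, here is a minimal sketch of the kind of check a controlled vocabulary allows. It assumes a hypothetical allowed-tag set (nothing in this thread defines one) and uses Python's standard `difflib` only to suggest the nearest allowed tag for a rejected one.

```python
import difflib

# Hypothetical controlled vocabulary; no such list has been agreed on.
ALLOWED_TAGS = ["data-management", "visualization", "web-services"]


def check_tags(tags, allowed=ALLOWED_TAGS):
    """Split tags into accepted ones and rejected ones with a suggestion.

    Returns (accepted, rejected), where rejected maps each unknown tag
    to the closest allowed tag, or None if nothing is close enough.
    """
    accepted = []
    rejected = {}
    for tag in tags:
        if tag in allowed:
            accepted.append(tag)
        else:
            close = difflib.get_close_matches(tag, allowed, n=1)
            rejected[tag] = close[0] if close else None
    return accepted, rejected


accepted, rejected = check_tags(["visualization", "visualisation", "web-service"])
```

Here the misspelled and singular variants are rejected at check time, with the allowed spelling offered as a suggestion, rather than silently ending up as separate categories.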

The biocViews approach looks good!

gaborcsardi commented 7 years ago

Typos will be fixed; pluralization and spelling differences are ignored by any decent search engine.
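
As a very rough illustration of that kind of normalization, here is a sketch of how an index over free-form keywords might fold variants together. The variant map, the naive plural rule, and the package names are all assumptions made up for the example; a real search engine would use a proper stemmer and synonym lists.

```python
from collections import defaultdict

# Hypothetical spelling-variant map, for illustration only.
VARIANTS = {"visualisation": "visualization", "modelling": "modeling"}


def normalize_keyword(keyword):
    """Lowercase, collapse whitespace, map known variants, strip a plural 's'."""
    key = " ".join(keyword.lower().split())
    key = VARIANTS.get(key, key)
    # Naive singularization: drop a trailing "s" unless the word ends in "ss".
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def build_index(package_keywords):
    """Map each normalized keyword to the sorted packages that use it."""
    index = defaultdict(set)
    for package, keywords in package_keywords.items():
        for keyword in keywords:
            index[normalize_keyword(keyword)].add(package)
    return {key: sorted(pkgs) for key, pkgs in index.items()}


# Made-up packages and keywords.
index = build_index({
    "pkgA": ["Web Services", "Visualisation"],
    "pkgB": ["web service", "visualization"],
})
```

Both packages end up under the same two normalized keys even though their authors spelled the keywords differently.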

What should people do if their package does not fit into the controlled vocabulary? Also, somebody needs to create and maintain the vocabulary.