bluefoxr / COINr

COINr
https://bluefoxr.github.io/COINr/

paper.md explain why the "coin" and "purse" objects are necessary #14

Closed paulrougieux closed 1 year ago

paulrougieux commented 1 year ago

Related to the JOSS review https://github.com/openjournals/joss-reviews/issues/4567

The line number refers to the version of paper.md on July 20 2022 at commit 1d9dce183.

Line 54

"COINr wraps all composite indicator data, analysis and methodological choices into a single S3 class object called a “coin”. This enables a neat and structured environment, simplifies the syntax of functions ..." and "COINr also supports time-indexed data, represented by the “purse” class (a time-indexed collection of coins)."

Can you provide more details on the structure of those objects? What are the advantages of the coin and purse objects over standard data frames, potentially with list columns (à la dplyr, tidyr, purrr), as described by Hadley Wickham in the "Many models" chapter of R for Data Science?

Is this new coin structure required because of the nature of the data?

bluefoxr commented 1 year ago

@paulrougieux the coin class is defined as a full representation of a composite indicator; see here: https://bluefoxr.github.io/COINr/articles/coins.html#what-is-a-coin

It includes a lot more than just the data: the index structure (a hierarchical structure), sets of weights, methodological specifications (which allow coins to be regenerated), metadata for both units and indicators, and some frequently used parameters which save processing time in functions. This is far more than can be wrapped in a data frame, and it is not rectangular data.
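As a rough illustration of why this does not fit one rectangular table, here is a minimal base R sketch. The class name `toy_coin` and all of its element names are made up for this example; they are assumptions and not COINr's actual internal layout.

```r
# Hypothetical, simplified stand-in for a coin (NOT COINr's real structure)
toy_coin <- structure(
  list(
    # indicator data: one row per unit, one column per indicator
    data = data.frame(
      unit = c("A", "B", "C"),
      ind1 = c(1.2, 3.4, 2.2),
      ind2 = c(10, 30, 20)
    ),
    # hierarchical index structure: each element and its parent aggregate
    structure = data.frame(
      code   = c("ind1", "ind2", "index"),
      parent = c("index", "index", NA),
      level  = c(1, 1, 2)
    ),
    # several alternative weight sets, stored by name
    weights = list(
      equal  = c(ind1 = 0.5, ind2 = 0.5),
      expert = c(ind1 = 0.7, ind2 = 0.3)
    ),
    # record of methodological choices
    log = list(normalise = list(method = "minmax"))
  ),
  class = "toy_coin"
)

# the components differ in type and size, so they cannot share one set of rows
component_class <- vapply(unclass(toy_coin), function(x) class(x)[1], character(1))
component_size  <- vapply(unclass(toy_coin), NROW, integer(1))
```

Here `component_class` mixes data frames and plain lists, and `component_size` differs between components, which is the sense in which the object is not rectangular.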

To see the contents of a coin you can call its print() method:

library(COINr)
# build example coin
ASEM <- build_example_coin()
ASEM

For a more detailed exploration, try e.g.:

str(ASEM, max.level = 2)

The "purse" class is indeed a data frame with a list column in which each entry is a coin. So it is a way of indexing many coins by time.
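The general idea of a data frame with a list column can be sketched in base R as follows. This is only an illustration: `make_toy_coin()`, the classes `toy_coin` and `toy_purse`, and the column names `Time` and `coin` are assumptions for this example rather than a description of COINr's internals.

```r
# Hypothetical stand-in for a coin: a plain list tagged with a made-up class
make_toy_coin <- function(year) {
  structure(
    list(
      year = year,
      data = data.frame(unit = c("A", "B"), score = c(year - 2000, 1))
    ),
    class = "toy_coin"
  )
}

years <- 2018:2020

# one row per time point, with the coins stored in a list column
toy_purse <- data.frame(Time = years)
toy_purse$coin <- lapply(years, make_toy_coin)
class(toy_purse) <- c("toy_purse", "data.frame")

# pull out the coin for a single time point
coin_2019 <- toy_purse$coin[[which(toy_purse$Time == 2019)]]
```

Because the object still inherits from `data.frame`, ordinary row indexing by time keeps working, while the extra class leaves room for purse-specific methods.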

There are several reasons for defining these classes. First, the building functions (e.g. Treat() and others) have methods for numeric vectors and data frames as well as coins and purses, so method dispatch seems a logical way to keep these functions flexible for users.
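A minimal sketch of this kind of S3 dispatch, in base R, might look like the following. The generic `cap_high()` and the `toy_coin` class are hypothetical and exist only for this example; this is not how Treat() is implemented, and the "treatment" here is just capping values at a quantile.

```r
# Hypothetical generic: one verb, several methods chosen by the class of x
cap_high <- function(x, ...) UseMethod("cap_high")

# numeric vectors: cap values at the given quantile
cap_high.numeric <- function(x, prob = 0.9, ...) {
  pmin(x, stats::quantile(x, prob, na.rm = TRUE, names = FALSE))
}

# data frames: apply the numeric method to every numeric column
cap_high.data.frame <- function(x, ...) {
  is_num <- vapply(x, is.numeric, logical(1))
  x[is_num] <- lapply(x[is_num], cap_high, ...)
  x
}

# toy coins: apply the data frame method to the stored data, return the coin
cap_high.toy_coin <- function(x, ...) {
  x$data <- cap_high(x$data, ...)
  x
}

# the same call works on all three kinds of input
v   <- c(1, 2, 3, 100)
df  <- data.frame(id = c("a", "b", "c", "d"), v = v)
tc  <- structure(list(data = df), class = "toy_coin")

v_out  <- cap_high(v, prob = 0.5)
df_out <- cap_high(df, prob = 0.5)
tc_out <- cap_high(tc, prob = 0.5)
```

Each method delegates to the one below it, so the core logic lives in a single place and the user calls the same function whatever the input type.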

Second, as mentioned, they keep all the data and metadata in a neat order, and allow copies and regeneration, which would otherwise not be possible (or at least far messier): https://bluefoxr.github.io/COINr/articles/adjustments.html
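The idea behind regeneration can be sketched in base R: if an object stores a log of the functions and arguments used to build it, the results can be rebuilt by replaying that log. Everything below (`normalise_minmax()`, `scale_by()`, `regen()` and the `raw`, `log` and `result` elements) is hypothetical and only meant to show the principle, not COINr's own mechanism.

```r
# Hypothetical processing steps
normalise_minmax <- function(x) (x - min(x)) / (max(x) - min(x))
scale_by <- function(x, factor) x * factor

# toy object holding raw data plus a log of function names and arguments
toy <- list(
  raw = c(2, 4, 6),
  log = list(
    list(f = "normalise_minmax", args = list()),
    list(f = "scale_by", args = list(factor = 100))
  )
)

# replay every logged step, in order, starting from the raw data
regen <- function(coin) {
  out <- coin$raw
  for (step in coin$log) {
    out <- do.call(step$f, c(list(out), step$args))
  }
  coin$result <- out
  coin
}

toy <- regen(toy)

# copy the object, change one methodological choice, and regenerate
toy2 <- toy
toy2$log[[2]]$args$factor <- 10
toy2 <- regen(toy2)
```

The copy `toy2` differs from `toy` only in one logged argument, so the two results can be compared directly without rerunning the pipeline by hand.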

bluefoxr commented 1 year ago

@paulrougieux do you think some further explanation is necessary in the paper? I was trying to keep it brief as per the guidelines.

paulrougieux commented 1 year ago

Since coins are the central objects of the package, it is important to briefly describe how they are structured. Based on your message:

"It includes [...] the index structure (a hierarchical structure), sets of weights, methodological specifications (which allow coins to be regenerated), metadata for both units and indicators [...]."

The paper could explain the nature of coins in a succinct way and refer to the following page for the full description: https://bluefoxr.github.io/COINr/articles/coins.html#what-is-a-coin

bluefoxr commented 1 year ago

Yes ok then I will add this to the paper. Thanks

bluefoxr commented 1 year ago

@paulrougieux see commit https://github.com/bluefoxr/COINr/commit/ded8997f83b55628906dfdcd85771d9ebad9552c

Please let me know if the edits sufficiently address the issue. Thanks.

paulrougieux commented 1 year ago

The update

"A coin is a structured list including:

  • Indicator data sets for each processing step (e.g. imputation, normalisation, etc.)
  • Metadata pertaining to indicators and units (e.g. names and weights, but also the hierarchical structure of the index)
  • A record of the COINr functions applied in constructing the coin"

provides good details on how the coin objects are structured. OK from my side.

bluefoxr commented 1 year ago

Ok thanks