animint / animint2

Animated interactive grammar of graphics
https://animint.github.io/animint2/

Where do animint2's datasets come from, and where are their codebooks? #100

Open ampurr opened 1 year ago

ampurr commented 1 year ago

This is vaguely related to issue #97. I'm trying to generate a very simple example for the basic usage section and decided to use a default dataset. I noticed that animint2 contains a lot of datasets—33 by my count. Some of them come from ggplot2. But I'm not sure where the rest are from.

Where are they from? (For example, where is the WorldBank dataset from?) And where can I find their corresponding codebooks?

As always, no rush in responding. Thanks in advance. 🐈

tdhock commented 1 year ago

Where do the data sets come from? It should be documented on the man page, under "Source"; otherwise I don't know. Codebooks? I don't know what you mean, but maybe I could help create one if you clarify?

ampurr commented 1 year ago

Got it. I've spotted the "Source" subsection in the manual—thanks! :>
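For anyone who wants to pull the "Source" section out of a man page programmatically rather than reading it by eye, here is a rough sketch using `tools::parse_Rd()`. The Rd file and the `extract_rd_source()` helper are made up for illustration; for an installed package, `tools::Rd_db()` should give you the parsed Rd objects to loop over instead.

```r
# Write a tiny, made-up Rd file so the example runs without animint2 installed.
rd_path <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{toy}",
  "\\docType{data}",
  "\\title{Toy dataset}",
  "\\description{A made-up dataset for illustration.}",
  "\\source{Made-up source for illustration.}"
), rd_path)

# Hypothetical helper: return the text of the \source section, or NA if absent.
extract_rd_source <- function(path) {
  rd <- unclass(tools::parse_Rd(path))
  # Each top-level element carries its section name in the "Rd_tag" attribute.
  tag_of <- function(el) {
    tag <- attr(el, "Rd_tag")
    if (is.null(tag)) NA_character_ else tag
  }
  tags <- vapply(rd, tag_of, character(1))
  source_parts <- rd[which(tags == "\\source")]
  if (length(source_parts) == 0) {
    return(NA_character_)
  }
  trimws(paste(unlist(source_parts), collapse = ""))
}

toy_source <- extract_rd_source(rd_path)
print(toy_source)
```

A dataset whose man page has no Source section comes back as `NA`, which makes the undocumented ones easy to list.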

You've probably written codebooks before. That word's just jargon for metadata about the datasets. Codebooks usually describe the dataset's variables and how the data were collected. They're great for reproducibility, since variable names themselves are usually insufficient for describing the data.
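To make that concrete, here is a minimal sketch of a codebook as a plain data frame: one row per variable, with its class and a human-written description. Everything in it is an assumption for illustration: `make_codebook()` is a hypothetical helper, and the toy data and descriptions are invented rather than taken from any animint2 dataset.

```r
# Hypothetical helper: build a one-row-per-variable codebook for a data frame.
make_codebook <- function(data, descriptions) {
  stopifnot(
    is.data.frame(data),
    setequal(names(data), names(descriptions))
  )
  data.frame(
    variable = names(data),
    class = unname(vapply(data, function(col) class(col)[1], character(1))),
    description = unname(descriptions[names(data)]),
    stringsAsFactors = FALSE
  )
}

# Made-up data; not a real dataset.
toy <- data.frame(
  station = c("A", "B"),
  count = c(12L, 30L),
  stringsAsFactors = FALSE
)

codebook <- make_codebook(toy, c(
  station = "Made-up counter location label",
  count = "Made-up number of bikes counted per day"
))
print(codebook)
```

In a package, the same information would normally live in the dataset's man page (for example in a `\format` section) rather than in a separate object.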

The diamonds dataset has one. WorldBank and montreal.bikes don't. After some of the other website stuff is set up, I'd be down for writing codebooks together. It'd have to be together for at least some of the datasets, since I don't know the data for e.g. montreal.bikes and you do.

Or you could just write them yourself. Up to you, obviously. I'm not your boss. 🐈🐈🐈

EDIT: Corrected lots of typos.

ampurr commented 1 year ago

I looked it up. "Codebook" is social sciences jargon. Sorry about that! I didn't realize the term wasn't universal in science.

tdhock commented 1 year ago

sure, please open a PR with some edits to the man pages, and put TODO wherever you think I should add some info.

ampurr commented 1 year ago

Sure thing. :>

ampurr commented 1 year ago

Status update: At least one of the datasets has its source in the comments, which hopefully means that's the case for all of them. The dataset is animint2/data-raw/economics.R, and the source can be found here.

Note to self: Datasets can be found in animint2/data-raw.

tdhock commented 1 year ago

hi again, if this is still an issue, can you please link a PR with the TODOs? Otherwise, can you please close?

ampurr commented 1 year ago

No problem. I've been preoccupied with the reference website, hence the delay. Unless you want me to prioritize this, I'll do it after I throw the website online. To-do for me:

ampurr commented 1 year ago

Okay, website has been thrown online. Do this now, @ampurr. 🐈

ampurr commented 1 year ago

Everything checked has a source attached (and therefore I won't need to attach a TODO to it):

ampurr commented 1 year ago

Everything checked has a codebook attached (and therefore I won't need to attach a codebook TODO to it):

ampurr commented 1 year ago

Note to self: Not all .Rd files are generated by roxygen2. Some files were manually created.
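One way to tell the two kinds apart: roxygen2 writes a "% Generated by roxygen2: do not edit by hand" comment as the first line of every .Rd file it produces. A small sketch that checks for that header follows. It runs against a temporary demo directory with made-up files, and `is_roxygen_generated()` is a hypothetical helper; pointing it at a real `man/` directory is left as an assumption.

```r
# Hypothetical helper: TRUE if the Rd file starts with the roxygen2 header.
is_roxygen_generated <- function(rd_path) {
  first_line <- readLines(rd_path, n = 1, warn = FALSE)
  length(first_line) == 1 && grepl("^% Generated by roxygen2", first_line)
}

# Demo directory with one generated-style file and one hand-written file.
demo_dir <- tempfile("man-demo-")
dir.create(demo_dir)
writeLines(
  c("% Generated by roxygen2: do not edit by hand", "\\name{generated}"),
  file.path(demo_dir, "generated.Rd")
)
writeLines("\\name{manual}", file.path(demo_dir, "manual.Rd"))

rd_files <- list.files(demo_dir, pattern = "\\.Rd$", full.names = TRUE)
generated <- vapply(rd_files, is_roxygen_generated, logical(1))
names(generated) <- basename(rd_files)
print(generated)
```

Files flagged `FALSE` are the ones to edit by hand; edits to the others belong in the roxygen comments of the R source.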

ampurr commented 1 year ago

Adding TODOs—a progress report:

tdhock commented 1 year ago

thanks this is useful, I will look at that PR and edit when I get a chance.

ampurr commented 1 year ago

Thank you! No rush. :>

tdhock commented 10 months ago

hi @ampurr, for another project I have SAS codebooks defined as below:

if K2Q01_D in (1,2) then TeethCond_21 = 1;
if K2Q01_D = 3 then TeethCond_21 = 2;
if K2Q01_D in (4,5) then TeethCond_21 = 3;
if K2Q01_D = .M then TeethCond_21 = .M;
if K2Q01_D = 6 then TeethCond_21 = .L;
if SC_AGE_YEARS

do you know if there is any existing package to parse such SAS codebook data into R? I did a web search but did not find anything obvious.

ampurr commented 10 months ago

Hey, @tdhock. :>

Unfortunately, my department never used SAS, so I don't have any special insight into your problem. Looking it up...

If you just need to parse the output of a SAS program into something R can read, the haven package has a read_sas() function.
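A minimal sketch of that call, with a hypothetical file path. Note that `read_sas()` reads SAS data files (such as .sas7bdat), so it would not parse the if/then recode statements themselves.

```r
# Hypothetical path; replace with a real SAS data file (.sas7bdat).
sas_path <- "path/to/data.sas7bdat"

survey <- NULL
if (requireNamespace("haven", quietly = TRUE) && file.exists(sas_path)) {
  # read_sas() returns a tibble, which is also a data frame.
  survey <- haven::read_sas(sas_path)
  str(survey)
} else {
  message("haven is not installed or the file does not exist; nothing read.")
}
```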

The SASmarkdown package will let you use SAS code with R Markdown.

A possible wacky chain solution:

  1. The SASPy Python package says that it lets you "exchange values between python variables and SAS macro variables," which seems promising.
  2. The reticulate R package lets you translate between R and Python objects.
  3. You might be able to use these two packages in conjunction.
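If the rules all follow the simple `if X = value then Y = value;` or `if X in (a,b) then Y = value;` pattern from your snippet, a lower-tech option might be to parse them directly with a regular expression. Here is a rough base-R sketch under that assumption. `parse_sas_recode()` is a hypothetical helper, not a function from any package, and it silently skips anything it does not recognize, such as a truncated line. Special missing values like `.M` and `.L` are kept as character strings.

```r
# Hypothetical helper: turn simple SAS recode statements into a lookup table
# with one row per (source value -> target value) pair.
parse_sas_recode <- function(lines) {
  pattern <- paste0(
    "^\\s*if\\s+(\\w+)\\s*(in|=)\\s*(.+?)\\s+then\\s+",
    "(\\w+)\\s*=\\s*([^;\\s]+)\\s*;\\s*$"
  )
  matches <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  # A full match yields 6 strings: the whole match plus 5 capture groups.
  matched <- Filter(function(m) length(m) == 6, matches)
  rows <- lapply(matched, function(m) {
    # Strip parentheses and whitespace, then split "1,2" into c("1", "2").
    cleaned <- gsub("[()\\s]", "", m[4], perl = TRUE)
    from_values <- strsplit(cleaned, ",", fixed = TRUE)[[1]]
    data.frame(
      source_var = m[2],
      source_value = from_values,
      target_var = m[5],
      target_value = m[6],
      stringsAsFactors = FALSE
    )
  })
  result <- do.call(rbind, rows)
  rownames(result) <- NULL
  result
}

sas_lines <- c(
  "if K2Q01_D in (1,2) then TeethCond_21 = 1;",
  "if K2Q01_D = 3 then TeethCond_21 = 2;",
  "if K2Q01_D in (4,5) then TeethCond_21 = 3;",
  "if K2Q01_D = .M then TeethCond_21 = .M;",
  "if K2Q01_D = 6 then TeethCond_21 = .L;",
  "if SC_AGE_YEARS"  # truncated in the thread; does not match, so it is skipped
)

recode_table <- parse_sas_recode(sas_lines)
print(recode_table)
```

The resulting table could then be joined against the raw column to apply the recode in R, though anything beyond this single-condition pattern (nested conditions, `else` branches, multiple variables) would need a real parser.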

Hope this helps. Good luck with your project. 🐈