floswald / CASDr

R tools for work on CASD
https://floswald.github.io/CASDr/

project scope - shouldn't we have mainly better documentation? #1

Open gustavekenedi opened 3 years ago

gustavekenedi commented 3 years ago

Hi Florian,

This seems like a very good idea, but I wonder whether the tasks across research projects are too varied for this to be really useful. My personal experience is that it would be much more useful to have a place with helpful guidance on each dataset (something you'd actually already tried to set up); some of the documentation can be quite bad. Code-wise, what I've found most useful is purely R-related and not so much specific to CASD or the particular data (e.g. loading datasets using fread rather than read.csv, learning how to use get(), parallelisation, etc.).
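To make the three R habits mentioned above concrete, here is a minimal sketch. The toy data, the object name `wave_1984` and the core count are assumptions made up for illustration; nothing here is specific to CASD.

```r
# Write a small CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, income = c(10, 20, 30)), tmp, row.names = FALSE)

# Base R reader: fine for small files, typically slow on large ones.
df_base <- read.csv(tmp)

# data.table::fread() is usually much faster on large files.
# Guarded so the sketch still runs if data.table is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  df_fast <- data.table::fread(tmp)
}

# get() looks up an object by its name given as a string, which helps when
# the name is built programmatically (hypothetical object name here).
wave_1984 <- df_base
obj_name <- paste0("wave_", 1984)
mean_income <- mean(get(obj_name)$income)

# parallel::mclapply() forks on Unix-alikes; on Windows mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
squares <- parallel::mclapply(1:4, function(i) i^2, mc.cores = n_cores)
```

How much `fread` and parallelisation actually help depends on file size and on the resources available in the secure environment, so treat this as a starting point rather than a recipe.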

Just my 2 cents :)

floswald commented 3 years ago

:wave: thanks for the input!

couple of thoughts in reply:

Example?

  1. I just worked with the ENL. For the 1970 wave there is a PDF that says the documentation is included in FIGARO. What. The. Heck. Is. FIGARO.
  2. First wave with included docs is 1984. Ok, fine. Then I have 1988, 1992, 1992, 2002, 2006 and 2013.
  3. Some variable names persist over waves, others disappear, others are added.
  4. Oh, waves 1992, 1992, 2002 do not have residence at commune level. Whoops.
  5. Almost every wave has a different SAS file layout. Sometimes you get one variable out of men84.sas7bdat, sometimes out of memcomp.sas7bdat.
  6. The resulting function I wrote to read those things from SAS is 100 lines of code.
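To give a rough idea of what such a reader can look like, here is a short sketch of one possible approach: a lookup table that records, per wave, which file holds a variable and what it is called there. This is not the actual 100-line function. The variable names `REV` and `REVTOT`, the file `menage88.sas7bdat`, the harmonised name `income` and the helper `read_wave_var` are all hypothetical; only `men84` is a file name mentioned in this thread. The default reader assumes the `haven` package is installed.

```r
# Hypothetical per-wave layout: which file holds the variable and what it is
# called there. File and column names are made up for illustration.
layout <- data.frame(
  wave     = c(1984, 1988),
  file     = c("men84.sas7bdat", "menage88.sas7bdat"),
  raw_name = c("REV", "REVTOT"),
  stringsAsFactors = FALSE
)

# Read one harmonised variable for one wave.
# `reader` defaults to haven::read_sas() but can be swapped out for testing.
read_wave_var <- function(wave, layout, dir = ".",
                          reader = function(path) haven::read_sas(path)) {
  entry <- layout[layout$wave == wave, ]
  if (nrow(entry) != 1L) stop("no unique layout entry for wave ", wave)
  raw <- reader(file.path(dir, entry$file))
  if (!entry$raw_name %in% names(raw)) {
    stop("variable ", entry$raw_name, " not found in ", entry$file)
  }
  data.frame(wave = wave, income = raw[[entry$raw_name]])
}

# Usage with a fake reader, so the sketch runs without any SAS files.
fake_reader <- function(path) {
  if (basename(path) == "men84.sas7bdat") {
    data.frame(REV = c(1, 2, 3))
  } else {
    data.frame(REVTOT = c(4, 5))
  }
}
out <- do.call(rbind, lapply(layout$wave, read_wave_var,
                             layout = layout, reader = fake_reader))
```

Keeping the wave-specific quirks in a table rather than in branching code means a newly discovered quirk becomes one more row, and others can review the table without reading the reader logic.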

I could go on for hours. Everybody who wants to compute even just an average from this data has to go through this list. I want to share my 100 lines of code with everyone who wants them, so that they don't need to go through this. Also, they can all check whether I made a mistake and improve the code. I know that I would have cried tears of joy had I had access to the list of 6 points above when I started out with this. I would have run down the hallway screaming had I gotten the 100 lines of code. But maybe that's just me :smile:

of course this will be useful for people to different extents, as is the case with any open source project.

gustavekenedi commented 3 years ago

Haha, what a dreadful example! I agree that there would definitely be benefits to sharing info about datasets and the relevant code, both learned through hours of trying to understand the data (and then actually implementing the code). There's been talk of setting up some sort of forum (e.g. Slack) among Échantillon Démographique Permanent (EDP) users, but I haven't seen it yet. I'll think about what I can potentially contribute here over the summer :)