antaldaniel closed this issue 2 years ago.
Hello Daniel,
Thanks for the question. To be honest, this is the first time I have heard about your package. I took a quick look: you seem to have invested a lot of effort, and you are solving real problems.
I did notice, however, that you rely heavily on the dplyr and haven packages (with the very useful haven_labelled_spss class), and I think it is important to point out a significant shift I have recently made, in a direction that is not opposite but parallel to those packages.
Despite their extremely high quality, dplyr and haven are both tightly integrated into the tidyverse, which means a huge dependency chain that I would like to steer away from. I admire each individual package, but in my own packages I prefer very few such dependencies, and ideally none.
Another reason I took a different path was the design philosophy of the haven and labelled packages regarding the treatment of missing values. This alone drove me to build a dedicated package called declared; to understand the differences, I invite you to read its vignette.
Yes, our issues are similar, but the approaches seem (for the moment) rather incompatible. DDIwR now uses the declared class as its main conversion object. It is, however, easy to make the switch, should you wish to (and I would offer my help with that transition).
Otherwise, the main goal of DDIwR is to implement the DDI (Data Documentation Initiative) metadata standard in R; creating study codebooks is a secondary objective. Automated codebook creation is still on my to-do list; I am just trying to find the right time and shape for it.
Best, Adrian
Just an FYI: we've created a package, recodeflow, that shares some functionality with retroharmonize, but recodeflow focuses on recoding and harmonizing many variables by defining the transformation rules in a CSV file.
We like and use DDIwR, and we've been meaning to incorporate DDI via DDIwR. recodeflow currently exports to PMML.
Nice package, Doug, and yes there are indeed many possible collaborations between such R packages.
@DougManuel, I think that recodeflow, if it matures, could be a very good alternative to our labelled_spss_survey class, which is inherited from labelled and haven.
retroharmonize is heading in the same direction as DDIwR: we also produce standard codebooks and metadata for immediate publication on the European open science repository Zenodo or on the U.S. Dataverse. The new, not yet released version of retroharmonize also allows the transformation rules to be set in a CSV file: [Working with a Crosswalk Table](https://retroharmonize.dataobservatory.eu/articles/crosswalk.html)
Our aim is to create final research products that are described with standard statistical metadata and that can immediately (within an R workflow) receive a DOI, appropriate descriptive metadata, etc.
As for the tidyverse inheritance, I don't think it is an important issue for us. Survey harmonization is a niche area, and a very laborious task anyway. While I agree that packages should be as lightweight as possible, in this case I can hardly imagine users not loading the tidyverse anyway. And of course, under the hood the more important dependencies are rlang and vctrs, which I think are becoming standard, modern R.
I am not sure whether it applies in this case (I did not study these packages in depth), but this seems somewhat related to SDTL (Structured Data Transformation Language) by the DDI Alliance. Perhaps it helps, or gives additional ideas: https://ddialliance.org/products/sdtl/1.0
@dusadrian I think I had a less ambitious goal before you released your package (mainly focusing on R/SPSS interoperability). If DDIwR matures and gets more documentation, I think it could be a good basis for our package, as a dependency.
Our package creates harmonized outputs from surveys, and we must produce harmonized datasets at scale for various research consortia. SDTL is a very interesting proposition, but at this point it is extremely general and does not yet seem to be supported in R. I think it is overkill in most cases; for our purposes, a simple crosswalk schema in a tabular format is enough.
Our package has two S3 classes: labelled_spss_survey, which is derived from labelled and haven, and survey, which is a tibble with extensive metadata for later harmonization. Your declared class, if it were well documented, could potentially replace our labelled_spss_survey class, because it is more general and neatly coded.
We are now going to write several publications about and with retroharmonize, but as DDIwR is not yet fully documented or released, I do not yet see how we could build on it.
Hi @antaldaniel, yes, I have mainly focused on developing the code and am now catching up with the documentation. What would you like to see in the documentation that is not clear at the moment?
I am interested in the idea of the declared class: how it works, what methods it has (or will have), and what helper functions and constructors it provides. Something like the Introduction to labelled vignette, or this labelled_spss_survey documentation. Reading the source files alone leaves a lot to the imagination.
Oh, I understand now.
The point is, the declared class is independent of DDIwR; it is described in the separate package declared, which does have a dedicated explanatory vignette:
https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf
Is this what you are looking for?
Hi, I think we are working on very similar issues and solving similar problems. Do you think we could somehow harmonize the retroharmonize package and DDIwR?