bnowok / synthpop

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

Standardise package documentation #31

Open florianm opened 11 months ago

florianm commented 11 months ago

I'm really enjoying synthpop and would love to contribute to lowering the entry barrier for users as well as contributors. There is some really great tooling available to make the development and contribution workflow reproducible and gently enforce some good practices. This would make contributing easier and significantly lower the effort of the package maintainers to get contributions to an acceptable quality.

@bnowok @cjvanlissa @gillian-raab Would you accept a pull request addressing the following suggestions:

Related ideas, but out of scope (unless preferred):

Suggestions which can be actioned through GitHub repo settings by owners (not me):

Out of scope:

LotteVanUtrecht commented 11 months ago

Not one of the developers, but would support a pull request to make contributions easier.

We have found the website in general, and the publications at https://www.synthpop.org.uk/resources.html specifically, to be very useful. I'm not too familiar with pkgdown sites personally; how would this function in relation to the website?

Best, Lotte

florianm commented 11 months ago

Hi @LotteVanUtrecht! Great to hear from you.

The synthpop website is indeed a fantastic resource. Not every R package has such a well curated website and such a plethora of research and publications behind it.

pkgdown makes it very easy to generate a standardised package website and host it for free on GitHub Pages. Many R packages have taken this far easier pathway, and many R users will expect such a resource to exist.

My value proposition is less in replicating a website (which synthpop already has) and more in the affiliated tooling and infrastructure uplift. Adding tests, roxygen, GH actions, Codespaces, and the like. E.g., rOpenSci publishes many packages related to scientific programming which all have the same infrastructure standards.

To study the feasibility of this suggestion, I have created a quick proof of concept which generates a rudimentary pkgdown website. Getting the LaTeX toolchain installed correctly in the Codespaces devcontainer was the trickiest bit. Caveat: in that experimental branch I have also added linters and auto-formatters, so that PR might look frighteningly intrusive.

I would imagine a PR with the minimal scope as per above first could be a good first step, should the package authors indicate their endorsement.

Edit: While we're awaiting Gillian's early 2024 contributions, here's a separate experimental PR with just the non-invasive changes. This PR does not touch R code, manual, or vignettes and therefore should not conflict with Gillian's new contributions. This PR is the most realistic preview I can offer to determine the impact of the possible changes.

cjvanlissa commented 11 months ago

I also think this has a lot of added value; a GitHub Pages site is more FAIR, and tests, continuous integration, and proper documentation are just essential. Support!

gillian-raab commented 11 months ago

Dear Florian and Caspar and Lotte, Many thanks for your kind words and suggestions. I have had a chat with Beata about this today.

We are in the middle of sorting out what the future of the synthpop package will be in terms of maintenance, looking after the website etc. Also, I am in the middle of mounting a new version of the package on GitHub that will include functions to measure disclosure risk (Lotte, you are on the list to be among the first people to try this out). In the longer term we are hoping to negotiate with another group to take over the maintenance of synthpop. What you suggest about documentation sounds very useful and I hope we can take advantage of your expertise with this, but we want to really understand what this involves. We will be back to you soon in the New Year.

Thanks very much and have a happy festive season.

Gilli

Gillian M Raab Research Fellow (part-time) Scottish Centre for Administrative Data Research My core working days are Tuesdays and Thursdays Though I sometimes swap them for other days 07748 678 551



florianm commented 11 months ago

Hello @gillian-raab, Thanks for your kind reply. Exciting to hear that synthpop is under active development and maintenance!

Considering that you're working on the package right now, it would definitely make most sense to merge your work first before addressing any of the suggestions above.

As a next step, full test coverage could be a valuable contribution in itself. Having tests in place makes it easier for both new additions and maintenance changes (with no intended change in functionality) to prove they didn't break anything by accident. To give you an example, here's a test covering codebook.syn and here's the result of the GitHub action automatically running R CMD check.
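
The linked test is not reproduced in this thread. As a rough sketch only, a testthat test for codebook.syn might look something like the following. The file name is hypothetical, and the checks assume that codebook.syn returns a list whose `tab` component is a data frame with one row per variable of the input; those assumptions should be verified against the package documentation before relying on them.

```r
# Sketch of tests/testthat/test-codebook.R (hypothetical file name).
library(testthat)
library(synthpop)

test_that("codebook.syn returns one summary row per variable", {
  # SD2011 is the example survey dataset shipped with synthpop.
  cb <- codebook.syn(SD2011)

  # Assumption: the result is a list with a data frame component `tab`.
  expect_type(cb, "list")
  expect_true("tab" %in% names(cb))
  expect_s3_class(cb$tab, "data.frame")

  # Assumption: `tab` holds one row for each column of the input data.
  expect_equal(nrow(cb$tab), ncol(SD2011))
})
```

With a file like this in place, R CMD check runs it automatically, so a change that alters the shape of the codebook output would fail the check rather than slip through unnoticed.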

It's also exciting to hear about disclosure risk, something my colleagues (https://wa.gov.au/peoplewa/) are actively working on too. They currently use OptimShare and piflib but would love to have this functionality available in synthpop.

Looking forward to hearing from you in the new year, and happy festive days too!

Edit: a preview of non-conflicting changes

florianm commented 6 months ago

@gillian-raab @bnowok apologies for the pushy "at" mention - just checking in how the package is progressing and whether you're interested in continuing with this conversation?