RConsortium / r-repositories-wg

RC Working Group on Repositories

Multiple ideas from the first meeting #3

Open llrs opened 3 years ago

llrs commented 3 years ago

Many thanks for yesterday's meeting. As always with these working groups, it was very helpful to attend and I learned a lot.

What I understood less is what the FDA and pharma companies would need from a repository, or how it would work. My understanding is that the medical agencies need pharma companies to provide proof that the process is verified and meets their requirements. Although I work on the academic biomedical side, I'm not sure what those requirements are. I've seen that there is another working group related to pharma and the FDA; maybe there are some slides or other content from which I could learn more?

Some general reflections

Dreaming big, having in mind the next 20 years: As a user I want repositories to provide/signal (high) quality and to be easy to install from (ideally with install.packages).

As a package developer and maintainer I want a clear, fair review, the faster the better, and to know the maintenance expectations (and for it to be easy to use packages from them). That means no unexpected, unannounced archival, no rushes to fix things that were not announced, and no applying policy to packages already on the repository without announcement. I would also appreciate policy being laid out with clear motives and explanations. I understand that the repository maintainers might need to take drastic decisions (when to have downtime for maintenance, where to draw the line between caution and openness), but some exchange and a heads-up on policy changes would be very much appreciated. This is probably more about the culture of the community, and about building it with everyone who uses the system in mind (not only the end user).

As a reviewer I would also like to do just the bare minimum, but to know what the expectations of the review are. I don't like it when I spend 4-5 hours on a package and later the authors decide to stop the submission process. So I understand that the CRAN team, having seen that and having spent thousands more hours than that, is tired of it. I think that stopping reviewing for some time and coming back later (and not just for the 2-3 weeks a year) would make this less tiring.

CRAN communication

At the meeting, it was mentioned that the CRAN team wanted a direct approach instead of being talked about as if they weren't there ("talk to CRAN instead of about CRAN"). I think this could be clarified: how could any developer do it? Currently there are three email addresses related to CRAN (besides the individual addresses of the CRAN team members):

Some examples of my efforts trying to communicate with the CRAN team

Code of conduct

About the code of conduct (CoC), I think that having the same one for submitters and reviewers would be great. I didn't understand why it was suggested that package developers and the CRAN team would need different CoCs. If they don't share the same code of conduct, I would expect more friction to appear. I think the useR! CoC for conferences (https://www.r-project.org/coc.html) could be modified and adapted to CRAN communications and exchanges. Who would be the conduct response team? The CRAN advocate and who else?

If having a different code of conduct for the CRAN team and package developers was meant to set rules about the commitments each side is willing to make (package developer: "I will keep the package available for at least 5 years"; CRAN members: "I will review packages in less than 2 weeks"), I think this would need to be a separate document from the code of conduct, and consensus on it might be harder to reach.

About package submission rate

There has been some discussion recently of a fall in package submissions, or in how long packages stay on CRAN. First it would be worth exploring this; I think there is enough information outside CRAN to make some estimates and find out whether it is really true.

But if some other efforts from the R Consortium and the R Foundation work, we can assume more packages will be submitted, probably at a higher rate (better tools to make packages, better training, more outreach, more users...).
If more packages are submitted, then more reviewers are needed. I don't see any way around that. Even if many more packages were great at their first submission, the sheer number of submissions would still mean that a high number of packages would need many hours of reviewers' time.
If the reviewers were paid, the increase in package submission rates could make that unsustainable, and several issues would still remain: homogeneity of the review, time spent on the review, training new reviewers...

If more reviewers (paid or not) are to be brought to CRAN (and other repositories), an approach similar to Bioconductor's would be a nice way to scale things (this is also similar to what rOpenSci does): a new reviewer is paired with an experienced one for their first review; then, if everything goes well, they can do the next review on their own.

I would even make that a bit stricter and supervise the new reviewer for 2 packages. I'm not sure how the current paid reviewers are trained or ramped up; perhaps that could be used or built upon. However, CRAN review feedback is usually a lot sparser than Bioconductor or rOpenSci reviews, so CRAN reviewers probably do not need more training than those reviewers get. (This would also be an opportunity: a pool of people possibly interested in learning how to review, and in doing it for CRAN.)

About having other repositories

I think the success of a repository such as CRAN or Bioconductor is linked to the quality signal it provides, whether about technical quality or usefulness.

New repositories with a different quality level would need to offer something else to make up for that. If fewer checks are provided (e.g. no need to pass old-release checks), they should provide some other benefit, such as a shorter time from submission to acceptance or more submissions allowed per month.

The suggested repository for the FDA (and presumably other medical agencies: EMA, PMDA and others) would be clearly useful for companies, but there might be a cultural clash (e.g. "If companies want to set up something, they already can! Why do they need something from us?").

Some downsides of having multiple repositories that I have thought of so far: how would quality be signaled across different repositories? How would package authors move their package between repositories? Currently, I think, one submits the new version to Bioconductor and, once it is accepted, asks CRAN to archive the package because it has moved to Bioconductor. A similar process is needed for moving from Bioconductor to CRAN. This is due to the requirement that package names be unique across CRAN and Bioconductor. If more repositories are set up, will this requirement be extended so that names are unique across CRAN, Bioconductor and the new repositories?

Currently the CRAN-Bioconductor interface is sensible, but with more repositories it could end up a mess for users. New repositories could have a different release cycle, or be rolling like CRAN. Syncing and checking them presents some challenges for reproducibility (but I haven't thought much about that).

How do new repositories become official or gain quality/prestige? How would reverse dependencies work? At the moment I think that CRAN installs Bioconductor packages differently than Bioconductor does (cf. https://stat.ethz.ch/pipermail/r-package-devel/2021q2/006977.html), so testing packages with Bioconductor dependencies is harder. How would that work with CRAN, Bioconductor and other (new) repositories? Could other packages on CRAN depend on these repositories (via Additional_repositories)? It would help to have at least a roadmap of when new repositories would be accepted by CRAN (e.g. when 100 packages on CRAN point to them?).

Some ideas: Set up a new repository with zero manual oversight: at most R-release and R-devel checks (but no reverse-dependency checks). This would be more in line with how other languages handle their extensions, and perhaps with the initial expectations of authors. It would be computationally less expensive, but might be enough to fulfill the desire of some package developers to have something that could hypothetically work with install.packages("abc", repos = "experimental"). From this repository, packages could be "upgraded" to CRAN.
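To make the hypothetical call above a bit more concrete, here is a minimal sketch of how a user might point install.packages at such a repository. The URL is a placeholder I made up (no "experimental" repository exists), and "abc" is only the example package name used above; the `repos` argument and `options(repos = ...)` are the standard R mechanisms.

```r
# ASSUMPTION: placeholder URL, no such repository exists.
experimental_repo <- "https://experimental.example.org"

# install.packages() accepts a (named) character vector of repository URLs.
# Listing CRAN as well lets dependencies that are only on CRAN be resolved.
repos <- c(
  experimental = experimental_repo,
  CRAN = "https://cloud.r-project.org"
)

# Not run, because the experimental URL above is made up.
# Note that the package name must be a quoted string:
# install.packages("abc", repos = repos)

# Alternatively, make the repositories the default for the whole session:
# options(repos = repos)
```

With a setup like this, whether a package comes from the experimental repository or from CRAN would be visible only in the `repos` vector, which is part of why signaling quality across repositories matters.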

Clarify expectations for maintainers in the long term (not just the checks), and say explicitly: "Hey, do you want to maintain this package for 5 years?" People only know CRAN as a way to distribute packages, which puts a lot of pressure on being there. Perhaps more talks about what it means to be in a repository and to be a package maintainer? People might not be aware of the commitment and time required to fix bugs, follow R changes, keep up with upstream changes...

Some other ideas

Transparent review? I expect a cultural clash, but maybe it could be made visible when a new message is sent to the reviewers or when the authors upload a new version (enforcing the rule that the package must increase its version number during the review), so that the time spent waiting for CRAN and the time the reviewer spends waiting are clearer from the outside? Then one could check whether this needs to be addressed. I doubt it: I expect that what usually takes more time is fixing and addressing the points that CRAN's reviews raise.

Having a dedicated space for announcing when the CRAN team takes holidays or needs to do some task. A fixed site would make it easier to announce this to the broader community, and to find it later to understand its effect on the repository (accumulation of packages? how much time CRAN needs to fix things...).

A long shot: volunteering efforts for repository maintainers. Set up a way to learn the ropes, for CRAN and other repositories. How to step in? How to step out? What does it take to be in a team maintaining a repository? I think providing a path for this role would help in the long run. That said, I also think that maintaining machines should not be done by a repository maintainer (or the CRAN team); their volunteer time is more valuable spent reviewing, discussing and making decisions based on their vast experience managing submissions.

Currently there is also some interest in other communities in joining up, sharing experiences and learning from each other about how they manage packages. There will be a conference about package managers; it would be great to have someone there: https://packaging-con.org/ (I have other commitments those days and won't be able to attend).


Maybe having different issues for different ideas would be better, but anyone should feel free to comment. I might have misunderstood something or assumed something that's not true.

llrs commented 3 years ago

If these repositories are meant to be used in production, it is also important to think about their security. There have been reports of malicious dependencies being injected on PyPI and npm (and probably others). This will be especially relevant for the FDA and other regulatory agencies.