Open bgruening opened 5 years ago
I guess the lowest hanging fruit is to support the continuation of the bot work that @epruesse has been spearheading. We'd need to outline some tangible goals for that, of course. In hindsight, maybe we should have submitted a GSoC proposal and had some masters students working on this already.
The main issue we've had is wanting access to something new in the bleeding-edge Bioconductor, but Bioconda lagging behind for very good reasons related to labour-intensive update processes. I feel like there could be some pitch around additional support for inclusion of cutting-edge software, reducing the lag between e.g. R/Bioconductor releases and Bioconda updates etc. As @dpryan79 suggested this might just involve work on tooling and automation.
I also feel like there could be some tools for recipe creation to encourage submissions.
I'm not sure how labor intensive Bioconductor releases are; it's mostly just hitting the "rerun" button every 5 hours on CircleCI :) There does need to be some additional retooling (also on the Bioconductor side) to better automate system requirements. I'm trying to keep track of that for the most recent release, which is as done as it can be until R 3.6.1 is released and conda-forge migrates. Then again, given how annoying it is to get R to build properly in conda-forge, maybe we could pay someone for their time :)
Supporting a custom `beta` label that then doesn't have a `main` label would be interesting. Then people could build things like monocle3 and the beta versions wouldn't pollute `main`. Presumably no Docker containers would get built in that case (or maybe the Nextflow folks would want them?).
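One way such a label could be consumed, as a hedged sketch: anaconda.org channels can be addressed per label as `<channel>/label/<name>`, so opting in might look like the hypothetical `.condarc` below. The `beta` label name is an assumption here; Bioconda publishes no such label today.

```yaml
# Hypothetical .condarc opting into a Bioconda "beta" label (assumed name).
# anaconda.org channels are addressed per label as <channel>/label/<name>.
channels:
  - bioconda/label/beta   # pre-release builds, e.g. monocle3 betas
  - conda-forge
  - bioconda
  - defaults
```

Users who list only `bioconda` resolve against the default `main` label and would never see the beta-only builds.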
Sorry @dpryan79 , showing my ignorance of how things work, obviously I've been more a user than a developer here ;-).
Integration with workflow tools already works pretty well, I use the Nextflow integration all the time (though not currently with containers), but maybe that's another angle to be worked if people can think of improvements.
Paolo was here last week and I asked him about the possibility of Conda -> container resolution, so that someone could e.g. specify a conda package and just a small additional thing to get the associated biocontainer, rather than having to manually specify that container. Is that something others would find useful?
I think that conda -> container resolution within applications is something particular to each workflow environment; I'm not sure that is something Bioconda-funded hours should be fuelling or diverted to. This already happens in other workflow environments, such as Galaxy, so I think it is more a matter for the Nextflow community to sort out.
Even though the work might just be pressing buttons every few hours, the fact that people need to dedicate time to maintaining Bioconda at the different levels is still something that should be put in a grant, as maintenance of resources is more and more recognised as important by funders (albeit slowly). Also, I'm sure there are occasions when things go south, and the amount of time that @dpryan79 and others spend is probably not negligible. Let's consider as well that CZI, for instance, is not a typical funder (in the sense that they won't only be pursuing scientific novelty); they are more engineering-involved and might even be willing to put some of their engineers' time into helping maintain Bioconda. For HCA they put in both funding and engineering time from their staff.
What about also applying for small or special grants from the Software Carpentries, or asking large institutes that might recognise the value of Bioconda to chip in with labour time, as large organisations and companies do for large open source projects of their interest, where they commit an FTE or a fraction of one for a year? What about ELIXIR? This seems very much like relevant research infrastructure to me. Or pre-competitive setups in industry like the Pistoia Alliance (not sure if this is still alive, but I'm sure there must be equivalents).
What about arguing that Bioconda needs container storage in distributed geographical locations, especially close to areas of the world with slower connectivity, and to protect Bioconda from a potential shutdown or denial of service at quay.io? Quay.io is currently the only storage of containers that we have, right?
Quay.io is currently the only storage of containers that we have, right?
No, the Docker containers have a backup at EBI and the Singularity containers have backups in 5+ places. But in general, the more backups the better :)
Paolo was here last week and I asked him about the possibility of Conda -> container resolution, so that someone could e.g. specify a conda package and just a small additional thing to get the associated biocontainer, rather than having to manually specify that container. Is that something others would find useful?
This works already. Biocontainers are designed to have this match. Galaxy does support it and CWL does support it. I talked to @pditommaso recently and its just a matter of implementing it in Nextflow.
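As a concrete sketch of why this match is mechanical: BioContainers images are published under a fixed naming convention derived from the conda package metadata, so a resolver only has to concatenate strings. The samtools version and build string below are illustrative values, not a recommendation:

```shell
# BioContainers naming convention: the image URI is derived directly
# from the conda package name, version, and build string.
pkg="samtools"
ver="1.9"
build="h8571acd_11"   # illustrative build string

img="quay.io/biocontainers/${pkg}:${ver}--${build}"
echo "$img"
# docker pull "$img"        # or: singularity pull docker://"$img"
```

A workflow engine that already knows the conda requirement string therefore has everything it needs to pick the matching container without the user specifying it by hand.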
What about Elixir? this seems pretty much like relevant research infrastructure to me.
Please note that ELIXIR is already funding Biocontainers and has funded people to work on Bioconda recipes in the past.
The reason we cannot support bleeding-edge Bioconductor in this release cycle is that it depends on a new R version, which is too cutting-edge for the conda community at the moment, so we wait for the first point release. This is debatable, but the decision was taken because the .0 release often has some major bugs. The real solution to this problem is to get the point releases of R to be ABI compatible, so we do not need to rebuild all packages against every point release, like we do for Python: we build against Python 3.6 but not against 3.6.1 and so on.
In my opinion we should not push for faster, bigger, higher releases; we should concentrate on quality. Software needs time, and if people need bleeding-edge technologies or beta software they could use a separate channel, or we can spin something up like bioconda-test. In my view, conda and Bioconda are for production systems: easily deployable, transferable, reproducible. People can always install from upstream/master if they need bleeding-edge or beta packages.
Here is my small todo list, maybe it's useful:
Just a quick idea: as the call includes outreach and community engagement work, what about developing something like a Software Carpentry lesson that teaches (bio)conda basics. So teach how to set up and use bioconda, but then also how to contribute.
Even though the Contribution Guide is already good, I think contributing is often still a very daunting step for many. If there were a half-day or full-day lesson for onboarding people to that process, taking/assisting that step could be scaled more easily. Also, as a nice side effect, such lesson development would probably stress-test the Contribution Guide for weaknesses...
One thing I could suggest for Bioconda is improving reproducibility (even further).
It is not rare that creating an environment from a definition file (either from `conda list --explicit` or `conda env export`) fails, as does reverting a transaction. This is somehow linked to packages that are removed from the repo or labelled as broken or legacy (or hot-fixed? not sure there).
This defeats the whole purpose of creating an environment definition file as a backup, or of transferring an environment for reproducibility.
I know the docs mention ways to get missing packages by looking at the legacy label or conda-forge, but the concern here is that most users will never do that (or even know about it).
Bioconda is already doing a lot (together with conda itself and conda-forge) through the new compiler migration and so on, but could there be a way to improve on this further? A way to guarantee that using a definition file recreates the exact same old environment, stable over time, would be really great.
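For context, the fragile artefact in question is a pinned export like the sketch below (the package names, versions, and build strings are illustrative, not real pins): recreating it with `conda env create -f environment.yml` fails as soon as any pinned build has been removed or relabelled upstream.

```yaml
# Illustrative output of `conda env export` (versions/builds are made up).
# If any pinned build disappears from the channel or loses its "main"
# label, `conda env create -f environment.yml` can no longer solve.
name: myanalysis
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.6.7=h0371630_0
  - samtools=1.9=h8571acd_11
```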
I mean addressing the reproducibility crisis in Science could be a serious point to get this Chan-Zuckerberg grant, don't you think?
Another thing, as pointed out by @dlaehnemann, would be to extend communication with a MOOC or something similar, or even to support the development of local user communities. This goes together with @bgruening's codefest.
Also, I like the suggestion of @dpryan79 regarding a `beta` channel, or @bgruening's `bioconda-test`, although I have a preference for something like `lab`, clearly oriented toward maturation of recipes or experimental packages.
Not to mention the rest of your list, @bgruening; all points are relevant.
This is not necessarily funding for Bioconda, but a fellowship for someone affiliated with a U.S.-based institution that is interested in promoting scientific software, https://bssw.io/pages/bssw-fellowship-program. The description says that "Each 2020 BSSw Fellow will receive up to $25,000 for an activity that promotes better scientific software. Activities can include organizing a workshop, preparing a tutorial, or creating content to engage the scientific software community."
Looking at the previous fellows, one can organize workshops to teach researchers how to create conda recipes, and how to package and distribute software with conda. The deadline for applying is October 15, 2019.
Hi,
@pinin4fjords pointed us to: https://chanzuckerberg.com/rfa/essential-open-source-software-for-science
Sounds like the Bioconda community could apply and get some funding for interesting projects.
Please use this issue to sketch out some ideas. How can we make Bioconda better? :)