Closed: johanneskoester closed this issue 5 years ago.
Johannes, This is a great idea! I have often felt the need for an OS-independent package catalogue for computational biology that works within user space. After a few minutes of trying Conda, I agree that it's a wonderful platform for such a system. I would like to contribute the packages I use, as a member of the team. I have experience as a package maintainer (a so-called ports committer) for the FreeBSD operating system, for ~5 years in the early 2000s.
Great! I have sent you an invitation.
Hi Johannes, I'd like to help with this, too. I like the idea of having one obvious place to find and contribute bio-related conda packages. I already have a handful of conda packages that could be useful here (https://conda.binstar.org/daler).
Hi Ryan! Great to hear that! I have sent you an invitation. Feel free to add your packages!
Btw Hyeshik, please also create an anaconda.org account so I can add you to the bioconda team over there, as well.
FYI, https://github.com/chapmanb/bcbio-conda has a lot of conda packages already. I'm not sure how much of that is tied specifically to versions needed in the bcbio-* packages though.
Good to know! Maybe we can invite Brad to migrate their stuff once we have reached a critical mass with Bioconda.
@johanneskoester Thank you! I just created an account on anaconda.org.
Great. Added you!
@johanneskoester It seems that I don't have permission to write to the recipes GitHub repo. Can I get it? Thank you!
Of course. Sorry about that; does it work now?
It works. Thank you!
@johanneskoester, I am enthusiastic about this idea as well, especially since I've been using wonky Makefiles to deal with software environments for my company =) I'd love to contribute in any way possible, although I don't have much experience with conda apart from being an end-user. My anaconda.org account is dkoppstein.
Hi, glad to hear that you want to join us, thanks! Conda packaging is really easy. Basically, a package is just some metadata plus a shell script with the commands you would use to install the tool manually.
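To make that concrete, here is a minimal sketch of what such a recipe might contain, generated from Python for illustration. The tool name `mytool`, its URL, the checksum and the `make install PREFIX=...` step are hypothetical placeholders; a real recipe uses whatever the tool's own install instructions say. `$PREFIX` is the environment variable conda-build sets to the directory the package must be installed into.

```python
from pathlib import Path
from textwrap import dedent


def render_recipe(name: str, version: str, url: str, sha256: str) -> dict[str, str]:
    """Return the two files of a minimal conda recipe: metadata plus a build script."""
    meta_yaml = dedent(f"""\
        package:
          name: {name}
          version: "{version}"

        source:
          url: {url}
          sha256: {sha256}

        build:
          number: 0

        requirements:
          build:
            - make

        test:
          commands:
            - {name} --version
        """)
    # The build script holds the same commands you would type for a manual install;
    # conda-build exports $PREFIX, the directory the package must be installed into.
    build_sh = dedent("""\
        #!/bin/bash
        make
        make install PREFIX="$PREFIX"
        """)
    return {"meta.yaml": meta_yaml, "build.sh": build_sh}


def write_recipe(recipe_dir: Path, files: dict[str, str]) -> None:
    """Write the recipe files into recipe_dir (e.g. recipes/<name>/)."""
    recipe_dir.mkdir(parents=True, exist_ok=True)
    for filename, content in files.items():
        (recipe_dir / filename).write_text(content)


if __name__ == "__main__":
    files = render_recipe("mytool", "1.0", "https://example.org/mytool-1.0.tar.gz", "0" * 64)
    print(files["meta.yaml"])
    print(files["build.sh"])
```

In practice you would write these two files by hand; the point is only how little a recipe needs.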
Johannes; This is a great initiative, thank you for putting it together. I'd love to contribute as well. We have a lot of packages prepared for bcbio dependencies and could happily move over to more community-driven packaging:
It may also be worth getting in touch with the CGAT folks, who have a wide variety of conda tools as well:
What Linux platform do you target builds for? In bcbio we need to support people running on older platforms, so we build everything in a CentOS5 Docker container. We have a ready-to-run script that does this, only rebuilding new recipes that are missing from anaconda.org:
https://github.com/chapmanb/bcbio-conda#readme
Thanks again for organizing this.
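The "only rebuild what is missing" idea can be sketched as a set difference between the local recipes and what is already on the channel. This is an illustration, not the actual bcbio-conda script: the helper name is hypothetical, and how the uploaded (name, version) pairs are fetched from anaconda.org is deliberately left out.

```python
def recipes_to_build(local: dict[str, str], uploaded: set[tuple[str, str]]) -> list[str]:
    """Return the names of local recipes whose (name, version) is not uploaded yet.

    local maps recipe name -> version; uploaded holds (name, version) pairs
    that are already available on the channel.
    """
    return sorted(name for name, version in local.items() if (name, version) not in uploaded)


if __name__ == "__main__":
    local = {"bedtools": "2.23.0", "samtools": "1.2"}
    uploaded = {("bedtools", "2.23.0")}
    print(recipes_to_build(local, uploaded))  # ['samtools']
```

Keying on the version as well as the name means a recipe whose version was bumped gets rebuilt even though an older build of the same package is already uploaded.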
Hi Brad, I'm very glad you want to join. We had already thought about contacting you. Great work on bcbio. So, feel free to move stuff over! Regarding the builds, that is a very good question. So far, my impression was that if we target e.g. linux-64 and include all build and run dependencies as conda packages, the resulting builds should be independent of the underlying Linux platform. E.g. conda ships its own libstdc++ etc., right? Am I missing something here?
Best, Johannes
Johannes; Thanks so much for including me. Regarding builds, unfortunately it is not that isolated. You'll compile against system packages which will cause failures on different systems. glibc is the most common cause of these. For instance, the current bedtools build fails to run on CentOS6:
~/test/anaconda/bin/bedtools
/cm/shared/apps/bcbio/20141204-devel/data/anaconda/bin/bedtools: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by /cm/shared/apps/bcbio/20141204-devel/data/anaconda/bin/bedtools)
/cm/shared/apps/bcbio/20141204-devel/data/anaconda/bin/bedtools: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /cm/shared/apps/bcbio/20141204-devel/data/anaconda/bin/bedtools)
This is the same issue you're seeing in #2.
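The failure pattern above is mechanical enough to check with a short script. The sketch below (hypothetical helper names) pulls the GLIBC symbol versions out of loader messages like the ones above and compares the newest one against a given glibc version; `platform.libc_ver()` is the standard-library call that typically reports the glibc the running Python uses, returning empty strings when it cannot tell.

```python
from __future__ import annotations

import platform
import re

# Matches the loader message "version `GLIBC_2.15' not found" and captures "2.15".
MISSING_GLIBC = re.compile(r"GLIBC_(\d+(?:\.\d+)+)' not found")


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "2.15" into (2, 15) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def missing_glibc_versions(loader_errors: str) -> list[str]:
    """Collect the distinct GLIBC symbol versions the dynamic loader reported missing."""
    return sorted(set(MISSING_GLIBC.findall(loader_errors)), key=parse_version)


def runs_on(system_glibc: str, required: list[str]) -> bool:
    """A binary runs only if the system glibc is at least the newest version it needs."""
    if not required:
        return True
    newest = max(required, key=parse_version)
    return parse_version(system_glibc) >= parse_version(newest)


if __name__ == "__main__":
    log = (
        "bedtools: /lib64/libc.so.6: version `GLIBC_2.15' not found\n"
        "bedtools: /lib64/libc.so.6: version `GLIBC_2.14' not found\n"
    )
    needed = missing_glibc_versions(log)
    print(needed)  # ['2.14', '2.15']
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"system glibc {version}: runs = {runs_on(version, needed)}")
```

This is also why building on an old distribution helps: a binary can only reference the glibc symbol versions that existed on the build machine.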
The anaconda folks build their packages on CentOS5, which avoids most of these issues (although they still run into system-specific problems: the curl package doesn't work on non-RedHat systems due to certificate differences). With a CentOS5-based automated build you can avoid the most common compatibility issues, and I haven't had any problems with package portability so far. I could replicate the setup we have in bcbio-conda here if that would help.
Hi Brad, thanks for the insight! But what about packages like glibc? If we manage to build against a conda-packaged glibc, wouldn't bedtools work on all platforms, regardless of the system glibc?
Oh, I have just seen that this package seems to simply copy the system libs. Strange... OK, in that case it would be great if you could replicate your CentOS setup for us, thank you!
Alternatively, we could try to get the binstar builds up and running. I experimented a bit with them, but never managed to trigger a build: I can submit builds, but they never seem to be assigned to a build queue.
@johanneskoester, do you envision this being Linux-64 only, or also Mac/Windows? If the latter, perhaps it makes sense to try to raise funds for organization status so we can use those types of build nodes (assuming we can get binstar to work).
FWIW I'm totally fine with it being Linux-64 only.
I think Linux is most important. I have uploaded some MacOS X builds because I need them for a Snakemake tutorial I am giving. But in general, I would say we should start with Linux and see where the journey takes us.
I would propose the following plan:
I don't think I will go for Windows, because many tools don't support it at all. I also don't want to encourage people to use Windows for bioinformatics ;-).
Johannes and David; Thanks for the thoughts on the build ideas. I pushed scripts we can use for building and uploading packages on Linux and MacOSX. For Linux, they use a CentOS 5 Docker container, which should hopefully provide widely compatible binaries. For MacOSX, we should just build directly on a Mac machine, but this will require marking tools that don't build on Mac, or whose recipes use Linux binaries.
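One way to mark such tools is conda-build's selector syntax, e.g. a `skip: True  # [osx]` line in the `build:` section of `meta.yaml`. The sketch below shows how a build script might honor that marker when deciding what to build on each platform. It is an assumption-laden illustration, not how conda-build itself evaluates selectors: it scans lines with a regular expression instead of parsing YAML, handles only a single plain selector such as `osx` or `linux`, and the recipe names are hypothetical.

```python
import re

# Matches lines such as "  skip: True  # [osx]" and captures the selector text.
SKIP_LINE = re.compile(r"^\s*skip:\s*[Tt]rue\s*#\s*\[(?P<selector>[^\]]+)\]")


def skipped_on(meta_yaml: str, selector: str) -> bool:
    """True if the recipe carries a skip marker for exactly this selector."""
    for line in meta_yaml.splitlines():
        match = SKIP_LINE.match(line)
        if match and match.group("selector").strip() == selector:
            return True
    return False


def buildable(recipes: dict[str, str], selector: str) -> list[str]:
    """Names of recipes (name -> meta.yaml text) that should be built for selector."""
    return sorted(name for name, meta in recipes.items() if not skipped_on(meta, selector))


if __name__ == "__main__":
    recipes = {
        "linux-binary-tool": "build:\n  number: 0\n  skip: True  # [osx]\n",
        "portable-tool": "build:\n  number: 0\n",
    }
    print(buildable(recipes, "osx"))    # ['portable-tool']
    print(buildable(recipes, "linux"))  # ['linux-binary-tool', 'portable-tool']
```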
I would love to eventually use the anaconda.org builds, but I explored this a year ago and didn't have any luck getting it set up, even when offering to pay for build boxes. I got the feeling it's still under development on their side, but the situation could have changed in the interim.
Let me know if you have any problems running the scripts, I'm happy to improve the docs or scripts as needed. Thanks again.
I would like to join!
Hey,
I think reproducibility is a hot topic now. We are also working on a similar project: https://github.com/BioDocker/biodocker
Apparently there's a lot of duplicated work here (and we are not the only ones).
It would be interesting to partner up, and we could certainly install your conda packages inside our Docker images instead of downloading from source, reducing the download size.
Also, could you please tell me what the advantages of conda over Docker are? I've never used it personally, but the multiplatform nature and no "mount/port forward/etc" crap seem very nice.
regards
The biggest disadvantage for me personally is that Docker is not allowed on our HPC cluster! In contrast, conda installs everything in my home directory without the need for any elevated privileges.
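As a sketch of what "no elevated privileges" means in practice: conda can create an environment under any directory the user can write to. The snippet only builds the command line (`conda create --prefix ... --channel bioconda --yes ...`, all standard conda options); the `~/envs/<name>` location and the helper name are assumptions for illustration.

```python
import shlex
from pathlib import Path


def user_env_command(env_name: str, packages: list[str], channel: str = "bioconda") -> list[str]:
    """Build a `conda create` command that installs into the user's home directory."""
    prefix = Path.home() / "envs" / env_name  # hypothetical location; any writable path works
    return ["conda", "create", "--prefix", str(prefix), "--channel", channel, "--yes", *packages]


if __name__ == "__main__":
    cmd = user_env_command("sequencing", ["bedtools", "samtools"])
    # Print the command; pass it to subprocess.run(cmd, check=True) to actually run it.
    print(shlex.join(cmd))
```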
Conda is mostly for installing executables and libraries. You can't do things like run an isolated mysql server using conda. But that's exactly the sort of thing docker is good at.
While conceptually there's a lot of overlap between the projects, it looks like there's not much domain overlap yet: biodocker has lots of proteomic packages while bioconda has lots of sequencing packages. One way to minimize duplicated work while taking advantage of both projects would be to 1) port existing dockerfiles into conda packages under bioconda and then 2) pull conda packages into docker containers built under biodocker.
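Step 2 of that plan could look roughly like this: a Dockerfile whose only job is to install bioconda packages on top of a conda base image instead of compiling from source. The helper below just renders that text; the base image `continuumio/miniconda3` and the helper name are assumptions for illustration, not something either project has settled on. `conda clean --all --yes` drops conda's package caches to keep the image small.

```python
def dockerfile_for(packages: list[str], base_image: str = "continuumio/miniconda3",
                   channel: str = "bioconda") -> str:
    """Render a Dockerfile that installs conda packages instead of building from source."""
    pkgs = " ".join(sorted(packages))
    return (
        f"FROM {base_image}\n"
        f"RUN conda install --yes --channel {channel} {pkgs} && \\\n"
        "    conda clean --all --yes\n"
    )


if __name__ == "__main__":
    print(dockerfile_for(["samtools", "bedtools"]))
```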
@daler I see. Thank you very much. I was also struggling with how to run the programs on our HPC. This is great.
BTW, indeed the initial creators are from the proteomics field, but I'm from genomics, and I've started porting several programs: https://github.com/BioDocker/sandbox
I really like the idea of creating packages here and installing them inside Docker. It makes the Docker images lighter, and the same package can be used inside Docker, outside Docker and on HPC. Three in one.
I like this idea very much. We'll keep talking.
Hi Johannes,
great initiative, I'd be pleased to chip in. I recently started using conda and I haven't gotten round to learning how to build packages, but looking at your recipes should help.
Cheers,
Per
Hi @percyfal, @tomkinsc and @sauloal, Glad to hear that you are interested in the project! I have invited you to the github team. I can also provide access to the corresponding anaconda team if you give me your anaconda.org usernames. The latter is not urgent, we should soon have automatic builds so that direct interaction with anaconda might be rarely needed.
@sauloal: indeed, BioDocker sounds like a perfect complement for bioconda! I would be extremely happy to cooperate and advertise the two together!
I have been doing this for my group's stuff for a while! Glad to see an organized effort somewhere!
I would love to contribute the packages I have built that are not already represented! Please add me to the team!
Gus
Great, thanks for adding me @johanneskoester. My username on anaconda is percyfal.
Welcome Gus! Do you have the same username on anaconda.org? Then I can add you to the team there as well.
@johanneskoester: no, I am gusdunn on anaconda.org. Thanks again!
Gus
ok, added!
Thanks for setting this up everyone, can I help out? I'm roryk on anaconda.org and here.
Welcome! I have added you to the teams.
The docs seem to suggest that people need to join the team to contribute, but I wonder if it makes sense to encourage people to file pull requests as well? AFAIK that seems like the easiest path to making a contribution.
Good idea, I have added a sentence about that (in case somebody does not want to be a permanent team member).
Dear @johanneskoester ,
Sorry for my late response. I would be very happy to contribute to bioconda on behalf of CGAT.
Here is our anaconda channel: https://anaconda.org/cgat
Many thanks and congratulations for this great initiative!
Best regards, Sebastian.
Great, I will add you to the team! Welcome!
This looks like a great idea and should avoid a lot of duplicated effort. Can I join in as well? My GitHub and anaconda usernames are both ostrokach.
BTW, I think something similar was discussed on the conda email list a while back. It didn't get too far, but there are a few teams there that might be interested in helping out (e.g. ioos, omnia, tacaswell).
Hi, could you add me to the team? I'm brentp on anaconda.org as well.
Thanks for creating this project.
Welcome!
Hi,
Could you please add me to the team? I'd like to contribute to the community by uploading bcbio-monitor to begin with.
Thanks!
Sure, glad to have a new contributor!
Thank you @johanneskoester !
Hello, I would also like to join the team!
Thank you
Hi, I was recommended by @guillermo-carrasco to add chanjo and some future exciting tools to the recipes
Can I join the team? :ocean:
Welcome, you two! I have sent an invitation!
:+1:
Everybody is welcome to contribute a package. Simply reply to this issue and you will be added to the bioconda team.
Edit: After you post here, we will email you an invitation through github to join the bioconda team. Click the link in that invitation in order to be added.
Edit: To have the Bioconda logo display in your profile, navigate to https://github.com/orgs/bioconda/people and find yourself. Click on 'Private' and select 'Public'.