datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1
169 stars 84 forks

Provide `conda` packaging #293

Closed: eywalker closed this issue 5 years ago

eywalker commented 7 years ago

As you all may know, Anaconda is a very popular Python packaging environment that aims to simplify the process of getting started with Python for scientific computing. Anaconda users use the specialized package management tool conda to discover and install new scientific Python packages.

Unfortunately, publishing a package on PyPI, which is where pip pulls packages from, does not make it available to conda. It would make sense for us to provide DataJoint packaging for Anaconda.

dimitri-yatsenko commented 7 years ago

It seems like conda should enable pip. What do other libraries (e.g. tensorflow) do for conda?

eywalker commented 7 years ago

They just have completely different package management from pip, just like how yum differs from apt. Most other major packages provide a conda version as well.

dimitri-yatsenko commented 7 years ago

Yes, I have just read up on conda. I had only used it as part of Anaconda. Let's proceed then.

eywalker commented 7 years ago

With #333, we are going to have an explicit dependency on graphviz. While we can talk about graceful degradation of features to support a "lightweight" installation of DataJoint (e.g. in a CLI-only environment), it would make a lot of sense to move forward with conda packaging and make that the recommended installation strategy for users, since we can then add graphviz as an explicit dependency.

dimitri-yatsenko commented 7 years ago

Even if we recommend conda, we should provide conda-less instructions for installation. I have instructions for Ubuntu but not for MacOS or Windows. How should we go about that?

dimitri-yatsenko commented 7 years ago

I like the installation instructions for jupyter http://jupyter.readthedocs.io/en/latest/install.html outlining both the conda-based and the pip-based installation.

eywalker commented 7 years ago

Indeed, that's pretty nice. We should definitely add this to the docs and tutorial.

dimitri-yatsenko commented 7 years ago

We need to work with Anaconda to include DataJoint in their stack.

FlorianFranzen commented 7 years ago

I read into it a little bit and it seems to be straightforward to package DataJoint for conda.

The first thing we will need to do is create the appropriate recipe file, as documented in the official conda docs.

Then we will have to submit it to conda-forge. They have a staging repository on GitHub and accept pull requests. More details can be found in their documentation.

Lastly, and this is optional, we can try to move the recipe to Anaconda's default channel. A mirror is maintained on GitHub and they accept pull requests.

This will, however, only save users from having to add the conda-forge channel to their installation (with `conda config --add channels conda-forge`) before being able to install DataJoint.

I will start some work on this in the next few days.
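For readers unfamiliar with recipe files, here is a rough sketch of what the first step could produce. It is not the recipe that was eventually submitted. The section names (`package`, `source`, `build`, `requirements`, `about`) follow conda-build's `meta.yaml` layout, while the helper `render_recipe`, the version, the source URL, the hash, and the dependency list are all placeholders chosen for illustration.

```python
def render_recipe(name, version, sdist_url, sha256, run_deps):
    """Return the text of a minimal meta.yaml for a pure-Python package.

    Hypothetical helper: every argument is supplied by the caller, nothing
    here is looked up from PyPI or from the DataJoint repository.
    """
    lines = [
        '{% set name = "' + name + '" %}',
        '{% set version = "' + version + '" %}',
        "",
        "package:",
        "  name: {{ name }}",
        "  version: {{ version }}",
        "",
        "source:",
        f"  url: {sdist_url}",
        f"  sha256: {sha256}",
        "",
        "build:",
        "  noarch: python",  # assumption: no compiled code in the package
        "  number: 0",
        '  script: "{{ PYTHON }} -m pip install . --no-deps -vv"',
        "",
        "requirements:",
        "  host:",
        "    - python",
        "    - pip",
        "  run:",
        "    - python",
    ]
    # One "run" entry per runtime dependency.
    lines += [f"    - {dep}" for dep in run_deps]
    lines += [
        "",
        "about:",
        "  home: https://datajoint.com/docs",
        "  license: LGPL-2.1",
        "",
    ]
    return "\n".join(lines)


# Placeholder values only; a real recipe needs the actual sdist URL and hash.
recipe_text = render_recipe(
    name="datajoint",
    version="0.0.0",
    sdist_url="https://example.invalid/datajoint-0.0.0.tar.gz",
    sha256="0" * 64,
    run_deps=["numpy", "pymysql", "networkx"],  # illustrative subset
)
print(recipe_text)
```

Writing the returned text to `meta.yaml` in a recipe directory gives a starting point to review and edit by hand.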

eywalker commented 7 years ago

Hey @FlorianFranzen, I'm already >70% done with packaging DJ for conda, so thanks, but don't worry about getting into the packaging. The actual process ended up requiring additional work because some of DataJoint's dependencies are not straightforwardly available in conda (yet). I don't think it's necessary that we push it to conda-forge, as we already have the channel `vathes` on Anaconda Cloud. I guess this depends on how common it is for people to have the conda-forge channel included in their conda environment.

I do agree that we should try to get DataJoint into Anaconda's default channel once packaging is complete.

eywalker commented 7 years ago

OK, I have now packaged DataJoint (and its necessary dependency pygraphviz) for conda in the channel `vathes`. One can install datajoint via `conda install -c vathes datajoint` on Python 3.5 or 3.6 with Linux/MacOSX. Unfortunately, I haven't had a chance to compile pygraphviz under Windows, hence it is not available there.

I now think that @FlorianFranzen had a very good point about conda-forge, as they can perform the compilation on multiple platforms automatically. Given that DataJoint itself does not have any C code, I would have preferred not to add another factor, but if we could actually add a pygraphviz recipe to conda-forge then I think it's completely worth placing datajoint under conda-forge as well. It's just that I'm quite unsure whether we can provide a recipe for another OSS project that we do not maintain. I think the exact same statement applies to the official Anaconda channel: we will need to somehow provide a pygraphviz conda package.

FlorianFranzen commented 7 years ago

I think it should be enough to inform the pygraphviz maintainers about your plans, so they can add themselves to the maintainer list in the recipe if they want to cooperate in maintaining it.

There is already an issue asking for an addition to conda-forge, so just announcing it there should be enough.

dimitri-yatsenko commented 7 years ago

Please review #353

dimitri-yatsenko commented 6 years ago

Where do we stand with conda packaging?

ixcat commented 6 years ago

> Where do we stand with conda packaging?

Haven't gotten to it. Will take a crack at it this week.

dimitri-yatsenko commented 5 years ago

updates?

tjd2002 commented 5 years ago

I'd like this!

I have confirmed that I can pull in all of DataJoint's requirements from conda, except for the minio Python SDK. I will open an issue at minio/minio-py on this.

$ pip show datajoint
Name: datajoint
Version: 0.11.2
[...]
Requires: numpy, pyparsing, pydot, networkx, pymysql, pandas, tqdm, minio, ipython

# succeeds:
$ conda install -c defaults -c conda-forge numpy pyparsing pydot networkx 'pymysql>=0.7.2' pandas tqdm ipython # minio not available  

FWIW: pygraphviz is now available from several channels, so in the event that you want to pull that dependency in by default, you can now do it (actually it is already pulled in with networkx by default).

I have quite a bit of experience with conda packaging, so happy to answer questions or do a code review on any recipe (meta.yaml) files.
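The availability check above can be repeated for other releases with a small script. This is only a sketch: `parse_requires` and `conda_install_command` are hypothetical helpers, the sample text is copied from the `pip show` output in this comment, and the command is only built and printed, not executed. `--dry-run` is added on the assumption that you want conda to resolve the packages without installing anything.

```python
import shlex


def parse_requires(pip_show_output):
    """Return the package names on the 'Requires:' line of `pip show` output."""
    for line in pip_show_output.splitlines():
        if line.startswith("Requires:"):
            value = line[len("Requires:"):]
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


def conda_install_command(packages, channels=("defaults", "conda-forge"), dry_run=True):
    """Build (but do not run) a `conda install` argument list."""
    cmd = ["conda", "install"]
    if dry_run:
        cmd.append("--dry-run")  # resolve only, install nothing
    for channel in channels:
        cmd += ["-c", channel]
    cmd += list(packages)
    return cmd


# Sample input taken from the `pip show datajoint` output quoted above.
sample = """Name: datajoint
Version: 0.11.2
Requires: numpy, pyparsing, pydot, networkx, pymysql, pandas, tqdm, minio, ipython
"""

deps = parse_requires(sample)
print(shlex.join(conda_install_command(deps)))
```

If conda cannot find one of the names in the given channels, the dry run should fail, which is how the missing minio package would surface.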

tjd2002 commented 5 years ago

I have started the process of adding minio to conda-forge (https://github.com/conda-forge/staged-recipes/pull/8517). Once it's accepted I/we can do the same for DataJoint with minimal effort.

tjd2002 commented 5 years ago

OK, that was fast! The minio feedstock was approved and 'minio' is now available in conda-forge.

I have proposed a conda-forge package for DataJoint, and included @dimitri-yatsenko as a 'maintainer'. This means that Dimitri will be able to modify the recipe (including adding other maintainers, bumping the version, updating requirements, etc.).

Assuming this works, the big remaining item will be to integrate conda packaging into datajoint's release process. In general, it looks like this:

1) Publish a new version of your pip package on PyPI
2) Manually update the conda recipe (meta.yaml file) in a PR at https://github.com/conda-forge/datajoint-feedstock (<-doesn't exist until the package is accepted)
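As a rough illustration of the manual recipe update, the sketch below rewrites the version and source hash in a recipe's text. It assumes the recipe uses the common conda-forge layout with a `{% set version = "..." %}` line and a single `sha256:` field. `bump_recipe` is a hypothetical helper, and the recipe text and archive bytes are made up for the example.

```python
import hashlib
import re


def bump_recipe(recipe_text, new_version, sdist_bytes):
    """Return recipe_text with the version and sha256 replaced.

    sdist_bytes is the content of the newly published source archive;
    its SHA-256 digest becomes the new `sha256:` value.
    """
    digest = hashlib.sha256(sdist_bytes).hexdigest()
    text, n_version = re.subn(
        r'(\{%\s*set\s+version\s*=\s*")[^"]*(")',
        lambda m: m.group(1) + new_version + m.group(2),
        recipe_text,
        count=1,
    )
    text, n_sha = re.subn(
        r"(sha256:\s*)[0-9a-fA-F]+",
        lambda m: m.group(1) + digest,
        text,
        count=1,
    )
    if n_version != 1 or n_sha != 1:
        # Refuse to guess if the recipe does not match the assumed layout.
        raise ValueError("recipe does not match the expected layout")
    return text


# Made-up recipe fragment and archive content, for illustration only.
old_recipe = (
    '{% set version = "0.11.2" %}\n'
    "source:\n"
    "  sha256: " + "a" * 64 + "\n"
)
new_recipe = bump_recipe(old_recipe, "0.12.0", b"contents of the new sdist")
print(new_recipe)
```

In practice the bytes would come from the sdist uploaded to PyPI in step 1, and the resulting file would be committed in the feedstock PR.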

dimitri-yatsenko commented 5 years ago

Would you please make @guzman-raphael the maintainer instead?

tjd2002 commented 5 years ago

OK, I think the wheels are all in motion over at conda-forge. ~If anyone wants to try this out before conda-forge completes their review and CI, you can try it out right now using a version of the DataJoint package on my personal channel (https://anaconda.org/tjd2002/datajoint)~ [Removed]

tjd2002 commented 5 years ago

OK, datajoint is now available on conda-forge 🎉 , so after updating docs, I think this can be closed.

conda create --name dj --channel conda-forge datajoint
conda activate dj
python -c 'import datajoint'

tjd2002 commented 5 years ago

If you decide to stick with shipping conda packages via conda-forge, I would suggest deleting the old packages from https://anaconda.org/vathes (to avoid confusion).

tjd2002 commented 5 years ago

@guzman-raphael, can I have your permission to add you as a maintainer on the minio feedstock as well? I will try to keep it updated, but this way you can also push releases there if datajoint needs them.

guzman-raphael commented 5 years ago

@tjd2002, sorry, just seeing this now. Yes, you can add me as a maintainer on the minio feedstock too. Thanks again! We are in the process of making a new release soon and will definitely try to include this in our process.

guzman-raphael commented 5 years ago

@tjd2002 one more thing. I am new to conda-forge, but it looks as if there is a bot that automatically creates a PR whenever our PyPI package is updated. Do you know if we can include --pre releases in this process as well, or do we need another feedstock repo? I also noticed that the auto-generated PR was merged on our behalf by a mariusvniekerk. Do you know who this is? If we are to use this for a production process, we need to be able to restrict how these would be released. Currently, I do not seem to have the appropriate privileges to see who can merge PRs.

dimitri-yatsenko commented 5 years ago

Solved in https://github.com/conda-forge/staged-recipes/pull/8524