clarete / curdling

Concurrent package manager for Python
http://clarete.li/curdling
GNU General Public License v3.0

Store cache on AWS S3? #79

Open adamfeuer opened 10 years ago

adamfeuer commented 10 years ago

Would you be interested in a pull request for some code that lets you put the curdling cache in an AWS S3 bucket instead of using the Flask server?

We're using the Atlassian Elastic Bamboo Continuous Integration system, and want to store our binary distributions made by curdling somewhere all our Elastic Bamboo build workers can get them. We frequently go through periods where we have no Elastic Bamboo instances running, and would rather not run a separate instance to run the server.

josegonzalez commented 9 years ago

@adamfeuer did you ever end up implementing this feature?

adamfeuer commented 9 years ago

Right now I'm using pip_accel and have implemented the S3 cache for that software. It's not a very big change, but it requires a few large libraries like boto.

If there's interest, I could port my change to curdling. What do you think?

josegonzalez commented 9 years ago

I'd certainly use it (same with pip-accel, either of which I want to get working with travis/our build process).

clarete commented 9 years ago

Hi @adamfeuer Sorry for taking so long to chime in. A patch with this feature would be pretty awesome actually. I'd suggest taking a look at the Uploader service. Also, I think the dependency on boto (or botocore, if you wanna try something new) should be optional. It should be simple to support something like easy_install curdling[s3] or pip install curdling[s3] by adding an extra to the setup.py file.

I know the uploader service is not tested, but I'd love to catch up on coverage with unit and/or functional tests. To run the whole test suite, just create a virtual environment and run make. A shady Makefile will take care of setting up dependencies, preparing the environment & running all the tests!

Please let me know if there's anything else I can help with! Thanks a lot!

adamfeuer commented 9 years ago

@clarete I'm not understanding what you mean about the setup.py file.

For the boto dependency, I could do something like this:

try:
    import boto
    HAS_BOTO = True   # S3 cache feature available
except ImportError:
    HAS_BOTO = False  # S3 cache feature disabled

And just update the docs to say if you want to use the feature, you need to install boto yourself. Would that work for you?

Regarding Uploader - are you suggesting I make an S3Cache service like the Uploader service?
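If so, here's a rough sketch of what I have in mind (all names are placeholders, and I'm assuming boto's S3 API for the upload itself):

```python
import os

try:
    import boto  # optional dependency; the feature is disabled if it's missing
    HAS_BOTO = True
except ImportError:
    HAS_BOTO = False


class S3Cache(object):
    """Hypothetical cache backend that stores built packages in S3."""

    def __init__(self, bucket_name, prefix='curdling'):
        self.bucket_name = bucket_name
        self.prefix = prefix

    def key_for(self, package_path):
        # Map a local package file to its key inside the bucket
        return '{0}/{1}'.format(self.prefix, os.path.basename(package_path))

    def upload(self, package_path):
        if not HAS_BOTO:
            raise RuntimeError('S3 support requires boto: pip install boto')
        conn = boto.connect_s3()  # credentials come from the environment
        bucket = conn.get_bucket(self.bucket_name)
        key = bucket.new_key(self.key_for(package_path))
        key.set_contents_from_filename(package_path)
```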

josegonzalez commented 9 years ago

@adamfeuer if you did that, make sure you have a minimum version on boto - the api changes between patches sometimes :(

clarete commented 9 years ago

Hi @adamfeuer,

Your idea is definitely the way to go in the code, I like disabling the feature when the lib is not installed. The trick I mentioned for the setup.py file would just make it easier for curdling users to install the package with S3 support with one command! That would be quite simple:

extras_require = {
    'server': parse_requirements('requirements-server.txt')[0],
    'S3': ['boto>=x.y.z'],  # This is the new line we need! :)
}

So the user can install curdling with S3 support by doing the following:

$ easy_install curdling[S3]

or

$ pip install curdling[S3]

About the Uploader, I guess the perfect scenario would be allowing the user to switch between different uploaders. To keep the core simple, I'd add new features to the Uploader service namespace. IOW: I'd just add a user-configurable switch to the Uploader class to get the feature working, then refactor if/when needed to get more/better features. What are your thoughts?
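Roughly what I'm imagining for the switch (just a sketch, every name here is hypothetical, and the upload bodies are stubbed out):

```python
class Uploader(object):
    """The existing HTTP uploader (stubbed for illustration)."""
    def upload(self, package_path):
        return 'http:' + package_path  # placeholder for the real upload


class S3Uploader(object):
    """An S3-backed uploader, only usable when boto is installed."""
    def upload(self, package_path):
        return 's3:' + package_path  # placeholder for the real upload


# User-configurable switch: pick the backend by name from the config
UPLOADER_BACKENDS = {
    'http': Uploader,
    's3': S3Uploader,
}


def get_uploader(config):
    backend = config.get('upload-backend', 'http')  # default to today's behavior
    try:
        return UPLOADER_BACKENDS[backend]()
    except KeyError:
        raise ValueError('Unknown upload backend: {0}'.format(backend))
```

That way the core only ever talks to one uploader object, and adding a new backend is just another entry in the dict.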