Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Merge all packages in a single pypi package (for real) #815

Closed Fale closed 6 years ago

Fale commented 7 years ago

Having dozens of PyPI packages that are loosely related but not really unified makes it very difficult to package the SDK in distros. Please fix it (also because, given how Python imports work, there is no real advantage in shipping a huge number of small packages).

lmazuel commented 7 years ago

Hi @Fale

Thanks for the feedback, but we do not plan to merge all packages into one. Each endpoint is managed by a different service team, and the teams do not share the same deadlines. For instance, a breaking change made by the Batch team does not justify updating a big package that also includes Compute and Resources. In practice, most of our users use 2 or 3 packages (like Compute+Network) and don't want to install every service on their machine (size on disk + time to install). In addition, this is a consistent experience across languages. We are already working with Debian to provide one "azure" package with a frozen state of several packages, and we're happy so far with that solution. We will update the "azure" meta-package in the near future, so you'll be able to install one package with one command if you want, while still giving other people the opportunity to choose precisely what they want.

Fale commented 7 years ago

@lmazuel thanks for the answer. Debian is not the only distro out there. I'm trying to understand whether the Debian approach is feasible for us as well, but from what I see, they have packaged only a minimal part of the Azure Python code. Also, I noticed that openSUSE has the same problem (BUG #525)

Also, the argument that this approach is consistent with the other languages is pointless, since it is not consistent with Python (and guess what, the users of the python-sdk do not care about the other SDKs, while they do care about consistency with the rest of Python)

CalvinHartwell commented 7 years ago

+1 for Fale's issue

The code base is more like an SDK of individual SDK(s) rather than a single SDK with many classes? There is a lot of boilerplate code caused by the mass of pypi modules. For example, each pypi module has a setup.py (similar to project files in Visual Studio).

lmazuel commented 7 years ago

Thanks @CalvinHartwell for your comment :)

@Fale I agree that the cross language point is not interesting, let's forget I said that :)

I cited Debian as an example because it's recent work for us, but I know there are more distros: RHEL, CentOS, Suse, ArchLinux, etc. (Mandriva when I was young...). Please be sure I keep an open mind here and I'm very interested in the conversation (I really am).

What I don't get is why you think a meta-package "azure" that installs some other packages is a problem. Meta-packages are common in the Linux world; they're the whole point of the dependency system. In this situation, you have (for instance, with random version numbers) a package "azure" 2.4.2 that will install "azure-mgmt-resource" 0.45.0 and "azure-mgmt-compute" 0.65.0 at the same time. So, you apt-get/yum install azure and get several Python packages at once. What's the issue?

Currently, the "azure" meta-package is not accurate because some core libraries are still in preview. But the final plan is to make sure that "azure" will install every stable package at the same time, with fixed versions. "Preview" services will be available for testing as separate packages if you want.

Please note that we also have people that are happy to install exactly what they want (only the Azure services they need).

Thoughts?

Fale commented 7 years ago

The current implementation has 3 problems from my point of view:

  1. You put a huge number of packages in the same repo. I get that this approach has some advantages, but it is exactly the opposite of git and Python best practices and common practices
  2. Every single package has its own release cycle (and therefore version number), so the azure one is not a real meta-package but an installer.
  3. If this is an SDK, it should have a single release cycle and release number; otherwise it is a collection of libraries (which is not the same thing as an SDK)

Also, the "many libraries" approach is a problem during the packaging effort, because for many distros (including Fedora and *EL, for which I'm looking at this SDK) every single PyPI package has to be managed in a separate (rpm) package. This means that to maintain the Azure SDK and Azure CLI I will have to maintain ~50 packages. By contrast, I'm the maintainer of the AWS Python SDK and AWS CLI tool, and those are only 3 packages (2 for the SDK, 1 for the CLI).

lmazuel commented 7 years ago

@Fale To discuss, let's assume I merge everything into one package. Botocore uses meta-descriptions of the REST API "on the fly", whereas we use our meta-descriptions to generate Python code, which takes more space on disk. Our biggest package (Web) is currently 1.8 MB. On average, packages are 500 KB. With 40 Azure services, we can estimate that the azure package would reach 20 MB. We plan to support several API versions for compatibility in the near future, which can easily lead to a Python package of 200 MB. Of course it's an estimate, but it's a likely scenario. What do you think?

Edit: Change numbers to more accurate ones
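
For illustration, the estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the figure of ten API versions is an assumption, chosen because it is what turns 20 MB into the quoted 200 MB; the comment itself does not say how many versions would be shipped.

```python
# Back-of-the-envelope size estimate for a single merged package.
AVERAGE_PACKAGE_KB = 500     # average package size quoted in the comment
SERVICE_COUNT = 40           # number of Azure services quoted in the comment
ASSUMED_API_VERSIONS = 10    # assumption: not stated anywhere in the thread

single_version_mb = AVERAGE_PACKAGE_KB * SERVICE_COUNT / 1000
multi_version_mb = single_version_mb * ASSUMED_API_VERSIONS

print(f"one API version: {single_version_mb:.0f} MB")
print(f"{ASSUMED_API_VERSIONS} API versions: {multi_version_mb:.0f} MB")
```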

lmazuel commented 7 years ago

@Fale Also, why can't you put several Python packages inside one rpm file at the same time? Is there some technical limitation somewhere, or is it just unconventional?

Fale commented 7 years ago

Not to be read in a bad way, but your point is that your code is bloated and for this reason it is better to split it? This argument is not very strong, I think...

Putting several Python packages in the same RPM is against policy in many cases. In this specific case it is also not technically possible, because an RPM package has a single version while all the azure packages have different versions, and using a wrong version is definitely against the policies

lmazuel commented 7 years ago

If you have a strong equivalence azure 2.1.2 == azure-mgmt-resource 0.40.0 + azure-mgmt-compute 0.35.0 and so on, why don't you create a python3-azure package 2.1.2? There is no ambiguity, no cheating, and you have a single version to use. It's the approach used by Debian, and I don't see where it's against any policy? Really, I'd like to understand your point, but I'm not seeing the technical problem yet :(

lmazuel commented 7 years ago

@rjschwei @bear454 @schaefi, would you like to share your point of view for Suse? @irl, would you like to share your point of view for Debian?

Fale commented 7 years ago

Fedora guidelines force us to package things as much as possible "as the upstream" does.

So, currently I should do the following packages:

Can I create the package python?-azure 2.1.2 that ships all files of the other modules? No. Why?

  1. It violates the "stick to what upstream does" policy
  2. It will become a mess to manage, because other packages that depend on specific libs of the SDK (probably the majority) will declare dependencies on that specific module and on a set of versions (or a specific version), which will not be 2.1.2 but could be, for instance, [>=0.30.0; < 0.40.0] for azure-mgmt-compute.

I can do what you describe only if the release cycle and the version number will be the same for all modules.
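
The version mismatch described in point 2 can be sketched in a few lines. This is a minimal illustration, not how rpm or pip resolve dependencies: `parse_version` and `in_range` are hypothetical helpers that only handle plain dotted numbers (no suffixes such as `rc6`), and the component version 0.35.0 is borrowed from the example earlier in the thread.

```python
def parse_version(text):
    """Turn a dotted version such as '0.35.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, minimum, maximum):
    """Return True if minimum <= version < maximum."""
    return parse_version(minimum) <= parse_version(version) < parse_version(maximum)


# Range a downstream package might require for azure-mgmt-compute.
required_min, required_max = "0.30.0", "0.40.0"

# The component's own version satisfies the range...
print(in_range("0.35.0", required_min, required_max))
# ...but the version number of the bundle that ships it does not.
print(in_range("2.1.2", required_min, required_max))
```

A bundled package versioned 2.1.2 therefore cannot satisfy such a requirement by its own version number alone; the component version has to be exposed some other way.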

rjschwei commented 7 years ago

So originally we also struggled with the issue of many Python packages vs. one rpm package which is why we opened the other issue and it took us a while to wrap our head around how to approach this. We finally decided to basically follow the Python package strategy.

So what we have today are 2 packages https://build.opensuse.org/project/show/Cloud:Tools?search=azure, python-azure-sdk and python-azure-sdk-storage. As the whole thing gets broken into smaller pieces with different upstream teams managing different code streams I see the argument about the coordinated release problem.

Also note that we decided to pull our sources from GitHub rather than pypi. We did struggle with the way things are pushed to pypi and found pulling from GitHub to be a better approach for us for package creation.

To a certain degree we/I followed a similar approach with the ec2utils we provide in our Enceladus project, https://github.com/SUSE/Enceladus/tree/master/ec2utils, meaning different release cycles for each utility.

Having different Python packages, which for us will eventually translate into different rpms implies that client code, for us azurectl, https://github.com/SUSE/azurectl, can be more precise about dependencies, which is an advantage. We have not yet packaged the new az tools, thus I cannot speak to the effect on that packaging effort and dependency management from that point of view, but I am certain we'll solve that in a reasonable way.

While I share the concern of @Fale regarding package proliferation as well as the multi Python package approach not being "Pythonic", there are equally valid arguments on the other side, meaning to have a sdk-storage package, maybe one for networking etc. I think the argument about being "Pythonic" mostly comes into play after install. Meaning as long as I can

from azure.storage import ....
from azure.networking import ...

as a Python developer I really do not care whether site-packages/azure/storage and site-packages/azure/networking were installed by 2 distro packages or 1. Having multiple packages may be a bit more cumbersome for the developer when setting up the system, meaning the developer potentially has to install many packages, but that can easily be done with a one-liner:

pckgmgr --search python-azure | grep sdk | xargs pckgmgr install

or a meta-package, i.e. we can easily create a python-azure-sdk-all package or create a pattern that pulls all the other packages.

I think it is equally valid to look at each service Azure provides as a separate target and have a separate SDK for that target as it is to look at the API as a whole.

That it was decided at the origination of boto, and now carried into botocore, that all of the AWS API should be in one SDK, one Python package, is just as valid a decision as the decision made here that each service should have its own SDK managed by separate teams.

So long story short, from my perspective either way is fine. If from a development perspective things are managed more easily at Msft to have multiple Python packages we are game to follow that route.

derekbekoe commented 7 years ago

Referencing related issue created for Python CLI which also has separate packages - https://github.com/Azure/azure-cli/issues/1055

irl commented 7 years ago

From the Debian perspective, I am ignoring PyPI and packaging from the Git repo. The idea that in the future there will be changes to the individual packages which will then be released via PyPI and not git tags breaks this. If the releases of all the core modules (i.e. the ones in this repo) are synchronised it makes everything a lot easier for me, and I guess for other distros too.

lmazuel commented 7 years ago

Thanks @rjschwei and @irl! So, in summary, what I understand is that as long as I create checkpoints as tags in the repo, you're good to package the current GitHub state with the tag as its version number. I can publish new packages for a specific service on PyPI when available, but they will be synced as a Linux package only when I release a new version of the "azure" meta-package (with a new associated tag on GitHub). That seems fair to me. Anyway, I plan to release the "azure" meta-package more often once the core ARM modules (Storage/Compute/Network/Resource) are officially stable. @Fale are we fine with that plan?

Fale commented 7 years ago

It will be long and painful, but I can make it work with the policies. Thanks

schaefi commented 7 years ago

So, in summary, what I understand is that as long as I create checkpoints as tags in the repo, you're good to package the current GitHub state with the tag as its version number.

Yes, this makes packaging work much easier. The same code base referencing that release tag should also exist on PyPI. In my projects that happens automatically; see:

https://docs.travis-ci.com/user/deployment/pypi

I can publish new packages for a specific service on PyPI when available, but they will be synced as a Linux package only when I release a new version of the "azure" meta-package (with a new associated tag on GitHub).

Yes, and it should be possible to make this an automatic step

Regards, Marcus

irl commented 7 years ago

@Fale fwiw - "do what upstream does" doesn't have to mean PyPI as PyPI is a downstream distribution of the upstream, there's no reason not to package up the sources from Git (possibly my Debian-oriented frame of reference). I have one source package, but plan to build one binary package for each of the logical PyPI packages within that.

@lmazuel That's perfect for me (:

@derekbekoe This would work great for the azure-cli package also.

Fale commented 7 years ago

@irl: I think I'll go the github way too. We do this for many situations :). I'll go for one src.rpm and multiple rpms, but even having a single src.rpm package, it will be a fairly complex spec to be able to generate all the various sub packages properly considering files and versions etc

irl commented 7 years ago

@Fale I will just be shipping everything with the version number of the metapackage. Subcomponents may have differing versions, but they're all part of the larger "unified" release.

Fale commented 7 years ago

@irl How do you manage package dependencies that depend on the subcomponents? e.g.: https://github.com/Azure/azure-cli/blob/master/src/azure-cli-core/setup.py#L49

irl commented 7 years ago

@Fale If there's a tagged unified release in git and the dependencies don't line up, then Microsoft has done a terrible job at release management. I don't anticipate this happening often.

Fale commented 7 years ago

@irl Microsoft's point is that they want different version numbers so that the various parts of the codebase can have different development cycles, so I anticipate this happening often going forward

irl commented 7 years ago

@Fale yes, between releases it may be broken and not all lined up, but the metapackage needs to be released with everything lined up otherwise it will never be installable, so the idea would be to package in distributions when, and only when, the metapackage sees a release and the git repo is tagged.

Fale commented 7 years ago

Also:

irl commented 7 years ago

@Fale not through the package management system they can't, and it's not a bug in your system if they've done something to break it. I fully anticipate other packages depending on the sdk, I have vagrant-azure in Debian depending on the Ruby SDK, and I'm quite happy to continue supporting this. I've had to patch the crap out of it to get it to work with the latest SDK, but as a distribution packager I expect to have to do some work occasionally.

From what I can see, Microsoft are new to this and I'd rather compromise and accept a little extra work than make demands that they change their entire project management workflow and put them off engaging in open source communities. This situation is no different from any other situation where you have a library and it has dependencies, some external. As a distribution packager, you should be performing QA to catch these problems and working with upstream to find resolutions, or patching locally within your distribution to ensure all your packages line up.

Fale commented 7 years ago

A couple of points and then I'll stop with this since we are going OT:

Microsoft are new to this and I'd rather compromise and accept a little extra work than make demands that they change their entire project management workflow and put them off engaging in open source communities.

  1. Since Microsoft is new, it is even more important to discuss "how open source works" with them, so that they can understand it before making mistakes
  2. I can compromise and accept extra work, but I will not compromise and accept breaking Fedora dependencies

irl commented 7 years ago

@Fale I think we're aggressively agreeing with each other perhaps. The important thing is that there are releases of the metapackage that have all the dependencies working together nicely.

To summarise my view:

There is a great article on the topic of packages vs. pip here: https://notes.pault.ag/debian-python/

lmazuel commented 7 years ago

Thank you for your contributions, it really matters :). I agree with latest @irl comment.

In full transparency, the next possible breaking change might be for Azure Stack support in the SDK in a few months. I will bump the major version of every package at the same time, and keep it in a parallel branch in preview until I'm sure that Ansible and the other folks I'm in contact with are ready (FYI, I wrote the Ansible plugin with a workmate at MS, so I really care about making it work). We don't plan a major change, but the way we have to create the client and authenticate against Azure might change enough to justify a version bump. For now it's subtle enough that we can provide one line at the start (like Python 2/3 compat) to support both versions at the same time. We'll try to keep it that way.

Do not hesitate to contact me by Github or direct email (@microsoft.com) if you have any questions or concerns.

Fale commented 7 years ago

I was trying to package this repo as suggested by @irl and @rjschwei and I noticed a problem with a circular dependency with this approach:

azure (aka this repo) depends on azure-storage which depends on azure-common (aka this repo).

How have Debian and Suse managed to make this work?
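
The circularity above only shows up once packages are grouped by the repository they are built from. A minimal sketch, assuming the three packages and two repositories named in this thread (the helper names `repo_graph` and `find_cycle` are hypothetical):

```python
# Which source repository provides each Python package.
PROVIDED_BY = {
    "azure": "azure-sdk-for-python",
    "azure-common": "azure-sdk-for-python",
    "azure-storage": "azure-storage-python",
}

# Package-level dependencies as described in the comment above.
DEPENDS_ON = {
    "azure": ["azure-storage"],
    "azure-storage": ["azure-common"],
    "azure-common": [],
}


def repo_graph(provided_by, depends_on):
    """Collapse package dependencies into repository-to-repository edges."""
    graph = {repo: set() for repo in provided_by.values()}
    for package, requirements in depends_on.items():
        for requirement in requirements:
            source, target = provided_by[package], provided_by[requirement]
            if source != target:
                graph[source].add(target)
    return graph


def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in visiting:
            # Found a back edge: slice out the loop and close it.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for neighbour in sorted(graph[node]):
            cycle = visit(neighbour)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


print(find_cycle(repo_graph(PROVIDED_BY, DEPENDS_ON)))
```

At the package level there is no loop (azure, then azure-storage, then azure-common); the loop appears only because azure and azure-common come from the same source repository.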

irl commented 7 years ago

In Debian, we have an unstable distribution where we can have things that have broken dependencies. In practice, building python-azure (as we call this repo) does not depend on python-azure-storage, only running it. We can upload python-azure and python-azure-storage and our tools will automatically move these to the testing distribution (currently stretch) when all the dependencies line up.

rjschwei commented 7 years ago

You can see our spec file here:

https://build.opensuse.org/package/view_file/Cloud:Tools/python-azure-sdk/python-azure-sdk.spec?expand=1

and as mentioned we do have a separate package for storage, the specfile is here:

https://build.opensuse.org/package/view_file/Cloud:Tools/python-azure-sdk-storage/python-azure-sdk-storage.spec?expand=1

Fale commented 7 years ago

@irl thanks

@rjschwei do you have some kind of "auto-generating" dependencies? Because I don't see any dependency declaration on the other package in those two spec files

schaefi commented 7 years ago

@rjschwei do you have some kind of "auto-generating" dependencies? Because I don't see any dependency declaration on the other package in those two spec files

The Azure storage API does not depend on the Azure service management API. Thus python-azure-sdk and python-azure-sdk-storage do not have a dependency on each other. Our python-azurectl utility, however, depends on a specific min. version of the storage and service management APIs in order to work correctly with the SDK features azurectl uses

lmazuel commented 7 years ago

Back in the day, we (the Python team) made the first azure-storage package, so it was in this repo. When it became obvious that we had some performance issues and that storage drove a lot of specific questions, the storage team took ownership directly to improve it drastically and be more responsive. This means that the "azure-storage" package is close to our package structure and uses our meta-package, but lives in another repo (unlike DocDB, for instance). I didn't realize that this could be a problem here :(. It seems to me that azure-storage should be in the Linux package "as if it were in the azure-sdk-for-python repo". This solves the dependency problem, but I agree it makes the situation a little more complicated :(. It's the only exception, for historical reasons. We do not plan to include more data stuff that is not in this repo, like DocDB, in the azure meta-package. We are working on azure-keyvault, but this will be in this repo as well.

Fale commented 7 years ago

@schaefi it seems like azure-sdk-storage depends on azure-common (https://github.com/Azure/azure-storage-python/blob/master/setup.py#L68) which is in this repo.

@lmazuel the majority of distributions do not accept multiple git repos as the source of a single source package, so azure-storage cannot be provided by the same source package as the python-sdk. As @irl suggested, it would be possible to build both packages and then push them together to stable, but this would mean that no tests can be performed during the build, since at build time you cannot install the other package (it would depend on the package you are building at that moment), and for Fedora tests should be performed at build time (it's not mandatory but it's highly recommended). Also, the "push together" approach could cause other problems during the life of the package.

lmazuel commented 7 years ago

Ok, so I guess the best approach is to leave azure-storage in its own package. azure-storage needs azure, but we can cut the opposite link (azure does not really need azure-storage; no code is using it in any package). This saves you from a circular link. I think in the long term the dependency of azure-storage on azure-common will disappear. azure-common is now used only for legacy code in this repository. This will simplify the situation.

bear454 commented 7 years ago

On Mon, 2016-10-17 at 09:45 -0700, Laurent Mazuel wrote:

I think in the long-term the dependency of azure-storage on azure-common will disappear. azure-common is used only for legacy code now in this repository.

Is "legacy code" here used in the generic way, or are you specifically referring to ASM ?

Fale commented 7 years ago

@lmazuel the dependency dropping would solve this circularity problem :)

lmazuel commented 7 years ago

@bear454 I mean "no ARM", so ASM + azure-servicebus. All common libraries for ARM are in msrest/msrestazure packages.

optlink commented 7 years ago

Is there a chance we could at least see another bundled release? I'm trying to maintain packages for azure-cli on Arch but I've run into a problem where the current 2.0.0rc6 release is too old and results in module errors and the git builds are too new resulting in a different set of errors.

I've tried building each module in this repo separately but there's a ridiculous amount of them and several of them seem to be unable to install independently as needed for Arch packaging.

I can see the argument for this because of independent release cycles but as a user I don't care. I need something that just works. The easiest solution, I think, would be to package major releases every so often that contain stable versions of each module.

lmazuel commented 7 years ago

Hi @optlink

Yes, the rc7 is planned. The problem is still that for a meta-package to be tagged as "stable", I need all sub-dependencies as stable. And it's not the case currently.

However, if you do packages for the CLI, the CLI uses the same behavior: it is cut into services and sub-packages. For instance, this is the Network implementation of the CLI. And this package directly follows a specific version of Network (this 2.0.1 is linked to 0.30.0). So even if I do an rc7, I can't guarantee that all packages will match all the sub-dependencies of the CLI. And even if I guarantee it today, this can change tomorrow with an update of Network or something else.

I'm not sure I clearly understand your constraints (I'm an Ubuntu user; I only know Arch by name, sorry :-( ), but send me an email at MS (<githubalias> at microsoft.com) and we will discuss with the CLI team how to help, or at least brainstorm something.

FYI @derekbekoe @johanste

rjschwei commented 7 years ago

On 03/30/2017 11:03 AM, Kelsey Maes wrote:

Is there a chance we could at least see another bundled release? I'm trying to maintain packages for azure-cli on Arch but I've run into a problem where the current 2.0.0rc6 release is too old and results in module errors and the git builds are too new resulting in a different set of errors.

I've tried building each module in this repo separately but there's a ridiculous amount of them and several of them seem to be unable to install independently as needed for Arch packaging.

I can see the argument for this because of independent release cycles but as a user I don't care. I need something that just works. The easiest solution, I think, would be to package major releases every so often that contain stable versions of each module.

For what it's worth: so far we have also created a bundled package for openSUSE and SUSE Linux Enterprise. However, we are starting to package the az tools (azure-cli), and it is also split into many pieces. Those in turn depend on the pieces of the SDK rather than the SDK as a whole. Thus as a packager you end up having to maintain either a large number of packages or a large number of Provides directives in order to sort out the proper version dependencies. We are going down the road of many packages.

lmazuel commented 7 years ago

Thank you @rjschwei for your message. We will still continue to release one package per service, but do you have a suggestion of zip/tar.gz/package/tools or something that we can do to simplify your process?

rjschwei commented 7 years ago

On 04/03/2017 12:19 PM, Laurent Mazuel wrote:

Thank you @rjschwei for your message. We will still continue to release one package per service, but do you have a suggestion of zip/tar.gz/package/tools or something that we can do to simplify your process?

I don't think there is anything else to do. The only simplification would be to release everything as one, but we've already had that discussion, and then have the cli package depend on that SDK version. Anyway, since the cli gets released as Python packages based on service components, which in turn depend on Python packages that are released per service component, that is really the model we have to follow at the packaging level.

glaubitz commented 7 years ago

Hi!

I have recently picked up the task of packaging the Azure SDK in openSUSE. For openSUSE, the current plan is to use the packages from the PyPI repository. However, while working through the various azure-mgmt-* packages I noticed that many packages are either outdated on PyPI (those are commerce, compute, network, powerbiembedded, resource, servicebus and storage from the mgmt packages, and the meta packages azure-mgmt and azure-nspkg) or are missing the __init__.py files, so that setuptools fails to install them properly (those are eventhub, media, network, resource, search, servermanager, servicebus and storage).

On the other hand, I could also use the tarballs generated by the git tags in the github repository. However, it's not clear to me which of these tags should be used when packaging the whole SDK while ensuring all modules are compatible with each other. If I read the discussion correctly, some of the modules can be too new so that they won't work with certain other modules anymore and can only be used individually. And if one wants to deploy the whole SDK, all modules must have the version belonging to a particular version of the whole SDK.

So, my question now is: How do I get release tarballs with the proper versions for each module so that I get a complete and working SDK in the end? PyPI is apparently not the best source at the moment for the aforementioned reasons, and neither are the releases created through the git tags on GitHub.

Thanks!
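
A quick way to spot the missing `__init__.py` problem in an unpacked source archive is to walk the package tree. This is a minimal sketch with a hypothetical helper name; it only reports directories that contain `.py` files but no `__init__.py`, and says nothing about why setuptools fails in any particular case.

```python
import os


def dirs_missing_init(root):
    """Return directories under root that hold .py files but no __init__.py."""
    missing = []
    for current, _subdirs, files in os.walk(root):
        has_python = any(name.endswith(".py") for name in files)
        if has_python and "__init__.py" not in files:
            # Report paths relative to the directory that was scanned.
            missing.append(os.path.relpath(current, root))
    return sorted(missing)
```

Pointing it at the top-level package directory of an unpacked archive (for example the `azure` folder, not the folder holding `setup.py`) lists the sub-packages that would need attention.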

glaubitz commented 7 years ago

I just figured out that the SDK releases are available as single tarballs generated from the git tags, they follow this pattern:

https://github.com/Azure/azure-sdk-for-python/archive/v?(\d.*)[A-Z,a-z,0-9]*\.zip

e.g.:

https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz

So, I suggest just pulling the tarball from there and using this as a base for the packaging.
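
As a sketch of how a packaging script could pull the version out of such a URL: the regular expression below is an adaptation, not the pattern quoted above. It assumes both `.zip` and `.tar.gz` archives should be accepted (the quoted pattern ends in `.zip` while the example URL ends in `.tar.gz`), and `release_version` is a hypothetical helper name.

```python
import re

# Assumption: accept both archive suffixes seen in this comment.
ARCHIVE_RE = re.compile(
    r"^https://github\.com/Azure/azure-sdk-for-python/archive/"
    r"v?(?P<version>\d[0-9A-Za-z.]*?)\.(?:zip|tar\.gz)$"
)


def release_version(url):
    """Return the version embedded in a tag archive URL, or None."""
    match = ARCHIVE_RE.match(url)
    return match.group("version") if match else None


print(release_version(
    "https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz"
))
```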

rjschwei commented 7 years ago

Hi,

On 05/10/2017 05:02 AM, John Paul Adrian Glaubitz wrote:

I just figured out that the SDK releases are available as single tarballs generated from the git tags, they follow this pattern:

https://github.com/Azure/azure-sdk-for-python/archive/v?(\d.*)[A-Z,a-z,0-9]*\.zip

e.g.:

https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz

So, I suggest just pulling the tarball from there and using this as a base for the packaging.

That doesn't work because the azure-cli releases depend on the individual components of the SDK, not on the SDK as a whole.

So as packager there are two choices:

a.) Create 1 package for the SDK, as we pretty much do in openSUSE right now, and then have a very long list of Provides: statements where each Provides lists a component. This list is going to be a PITA to maintain and will inevitably be wrong and cause headaches

b.) Package each individual component of the SDK, the approach we are now taking.
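
To give a feel for option (a), here is a minimal sketch that generates the `Provides:` lines for a bundled spec file from a component-to-version mapping. The helper name, the `python-` prefix and the two component versions are illustrative assumptions (the versions are borrowed from the example earlier in the thread); a real list would have one entry per SDK component and would need regenerating for every release.

```python
def provides_lines(components, prefix="python-"):
    """Build RPM 'Provides:' lines, one per bundled component."""
    return [
        f"Provides: {prefix}{name} = {version}"
        for name, version in sorted(components.items())
    ]


# Hypothetical component versions for illustration only.
bundled = {
    "azure-mgmt-resource": "0.40.0",
    "azure-mgmt-compute": "0.35.0",
}

for line in provides_lines(bundled):
    print(line)
```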

johanste commented 7 years ago

@derekbekoe, do you have any suggestions?

lmazuel commented 7 years ago

Hi @glaubitz, sorry I didn't answer earlier, it was busy with //build/ this week and PyCon next, and I wanted to take the time to answer you correctly. I just want you to be sure I don't ignore you, I'll be back with my full brain soon :)