andreasscherbaum / gpdb-ansible

Ansible scripts for Greenplum Database
Apache License 2.0
Add support for staging RPM or binary installed from Pivotal Network #4

Open kdunn-pivotal opened 7 years ago

kdunn-pivotal commented 7 years ago

Here's an example rest call, once you've enabled an API token for your account in the profile page:

curl -o greenplum-db-5.0.0-beta.9-rhel6-x86_64.rpm -d "" -L -H "Authorization: Token <PIVNET_TOKEN>" https://network.pivotal.io/api/v2/products/pivotal-gpdb/releases/6810/product_files/28864/download

kdunn-pivotal commented 7 years ago

This would also be useful for installing GPCC, but we may actually run GPCC elsewhere on its own.

andreasscherbaum commented 7 years ago

Are "releases/6810" and "product_files/28864" stable, or do they change with package versions? This can be incorporated into the Playbook: if the API token is defined, it can download the package from Pivnet.

kdunn-pivotal commented 7 years ago

@Reckhardt-pivotal - any idea how stable the URLs are for Pivnet API-based downloads? I've had no issue storing these invocations in a doc and copy-pasting from them months later.

Reckhardt-pivotal commented 7 years ago

I have no idea. I assume very stable but I've been very wrong before.

-- Rob

andreasscherbaum commented 7 years ago

Who can verify this?

Reckhardt-pivotal commented 7 years ago

Jim Thompson. I'll send an email.

-- Rob

ascherbaum-pivotal commented 7 years ago

Just looked into this, and it will not work:

4.3.16.0: https://network.pivotal.io/products/pivotal-gpdb#/releases/6342
4.3.16.1: https://network.pivotal.io/products/pivotal-gpdb#/releases/6680

Although these are stable URLs, how can I find the URL for a given version number?

kdunn-pivotal commented 7 years ago

This might be the very issue @ysung-pivotal was mentioning.

It's quick n' dirty but something like this can help:

curl -H "Authorization: Token <TOKEN>" https://network.pivotal.io/api/v2/products/pivotal-gpdb/releases \
| python -c "import sys, json; print([(k['version'], k['_links']['product_files']['href']) for k in json.load(sys.stdin)['releases']])"

We could easily append an if k['version'] == <DESIRED RELEASE> at the end of the Python list comprehension.
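As a rough sketch of that filter, written as a small function rather than a one-liner: it assumes the releases response has already been parsed from JSON and has the same shape the one-liner above relies on (a releases array whose entries carry version and _links.product_files.href). The function name and the sample hrefs are made up for illustration.

```python
def product_files_href(releases_doc, version):
    """Return the product_files link for one exact version, or None if absent."""
    for release in releases_doc["releases"]:
        if release["version"] == version:
            return release["_links"]["product_files"]["href"]
    return None


# Hypothetical, trimmed-down response; the hrefs are placeholders, not real Pivnet URLs.
sample = {
    "releases": [
        {"version": "4.3.16.0",
         "_links": {"product_files": {"href": "https://example.invalid/a/product_files"}}},
        {"version": "4.3.16.1",
         "_links": {"product_files": {"href": "https://example.invalid/b/product_files"}}},
    ]
}

print(product_files_href(sample, "4.3.16.1"))
```

Returning None for an unknown version lets the caller fail loudly instead of silently downloading something else.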

andreasscherbaum commented 7 years ago

You are not seriously suggesting this as a solution ...

When I check the "_links" href link, I end up on another JSON page, which contains yet another link with another file ID, and that is the download link. But this second JSON page, for example, does not even contain the full filename, just the download link. I potentially end up with a filename which is different from what the webserver suggests in the end. You know, in automation you can't just accept what the webserver suggests; you need to provide the output filename beforehand. This is not idempotent.

Therefore in order to make this somehow work it needs 2 curl calls with additional Python parsing, and then a third curl call for the download, plus guess my own filename.

This is wrong on too many levels ...

kdunn-pivotal commented 7 years ago

Well, I wanted to defend myself but you've inadvertently teased out my very pervasive hack nature. ☺️ I'll regress to my original suggestion of a lookup table of URLs, given the eventual cloudfront static URL we were promised in the email thread.

No matter what we do, my preference is still to have the nodes fetch things on their own rather than depend on workstation-local files. I'm more than willing to discuss the merits of this, though.

andreasscherbaum commented 7 years ago

I'm in favor of this approach as well. Given that Pivotal changed the filename format quite a few times in the past (see examples here, probably not even catching all of them):

https://github.com/andreasscherbaum/gpdb-ansible/blob/master/roles/gpdb4/vars/main.yml

I really would like to have a way to specify a version and get back a filename which the Playbook should download. Not just a link, I need to know the final name in order to make this idempotent.

jchesterpivotal commented 7 years ago

Hi folks, I'm in the Pivnet team (today pairing with @mbildner).

Our understanding of @andreasscherbaum's concerns is:

  1. Getting hold of the latest release easily.
  2. Stability of the Pivnet-provided download URL.
  3. Knowing the final on-disk filename before GETting the Cloudfront URL.

Some background: Pivnet cannot behave like a static repository (as most OSS distributions do) for legal reasons. We have to be able to show who downloaded which files, when, from where, and that they agreed to a relevant EULA. A lot of the gymnastics imposed on API consumers and downloaders comes from those constraints.

Finding the latest release

This is a common request, so we added an endpoint for it. Unfortunately, it only works for products following the semver scheme. GPDB uses a different scheme.

So, as you are probably already aware, you will need to use the /api/v2/products/:product_slug/releases endpoint and filter the returned releases array to find the one you want. Some folks filter by release_type and then sort by release_date and/or id. We defer to the GPDB team for guidance on how best to identify their most recent releases.
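A minimal sketch of that filter-then-sort approach, not an official recipe. It assumes release_date is an ISO-style YYYY-MM-DD string (so that plain string comparison orders it correctly) and that id is an integer; the release_type value and the dates in the sample are invented for illustration.

```python
def latest_release(releases, allowed_types):
    """Pick the newest release among those whose release_type is allowed."""
    candidates = [r for r in releases if r["release_type"] in allowed_types]
    if not candidates:
        return None
    # Sort key: date first, then id as a tie-breaker for same-day releases.
    return max(candidates, key=lambda r: (r["release_date"], r["id"]))


# Hypothetical entries; only the field names come from the discussion above.
releases = [
    {"id": 1, "version": "4.3.16.0", "release_type": "Example Type", "release_date": "2000-01-01"},
    {"id": 2, "version": "4.3.16.1", "release_type": "Example Type", "release_date": "2000-02-01"},
    {"id": 3, "version": "5.0.0-beta.9", "release_type": "Other Type", "release_date": "2000-03-01"},
]

newest = latest_release(releases, {"Example Type"})
print(newest["version"])
```

Which release types count as "latest" is exactly the question deferred to the GPDB team, so the allowed set is left as a parameter.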

Stability of the download URL provided by Pivnet

Once you've identified the release you want, you'd follow _links.product_files to find the files for that release. This takes you to the /api/v2/products/:product_slug/releases/:release_id/product_files endpoint. In there you see the product_files array.

In _links.download, you can see the canonical download URL for this product release file. This URL is intended to be unchanging. You should be able to use it at any time to begin the download process.

As you know, the process works by redirecting you to a generated Cloudfront URL. This URL is not stable and will expire. The signed-URL scheme is relied on to prevent unauthenticated users from downloading product release files.

If you need to store URLs, always store the one you find in _links.download.
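To illustrate, here is a small sketch that collects the storable URLs from a parsed product_files response. It assumes each link object exposes an href key, mirroring the _links.product_files.href shape seen earlier in this thread; that detail and the function name are assumptions, so check them against a real response.

```python
def stable_download_urls(product_files_doc):
    """Return the canonical _links.download URL of every file in a release."""
    return [
        product_file["_links"]["download"]["href"]
        for product_file in product_files_doc["product_files"]
    ]


# Trimmed-down, hypothetical response reusing the URL from the first comment.
sample = {
    "product_files": [
        {"_links": {"download": {"href":
            "https://network.pivotal.io/api/v2/products/pivotal-gpdb"
            "/releases/6810/product_files/28864/download"}}},
    ]
}

for url in stable_download_urls(sample):
    print(url)
```

These are the URLs worth keeping in a lookup table; the Cloudfront URL returned by the redirect should never be stored.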

Knowing the final on-disk filename before following the Cloudfront redirect

Looking again at /api/v2/products/:product_slug/releases/:release_id/product_files, you will see that for each product release file, there is an aws_object_key field.

The aws_object_key is the "path" that Pivnet will eventually use to identify where in S3 to fetch the file from. The "basename" will be the filename you will receive. You can use this to identify the final filename before following the Cloudfront redirect.

For example, if aws_object_key is product_files/Pivotal-ExampleProduct/example-server-pkg-3.3.91.pivotal, the final filename after download will be example-server-pkg-3.3.91.pivotal.
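A short sketch of how a playbook helper might use this to stay idempotent: derive the filename from aws_object_key, then skip the download if that file is already on disk. The function names and the destination directory are hypothetical; only the aws_object_key field and the example key come from the explanation above.

```python
import os
import posixpath


def final_filename(aws_object_key):
    """The basename of the S3 object key is the name of the downloaded file."""
    return posixpath.basename(aws_object_key)


def needs_download(aws_object_key, dest_dir):
    """True if the file named by the key is not yet present in dest_dir."""
    target = os.path.join(dest_dir, final_filename(aws_object_key))
    return not os.path.exists(target)


key = "product_files/Pivotal-ExampleProduct/example-server-pkg-3.3.91.pivotal"
print(final_filename(key))  # example-server-pkg-3.3.91.pivotal
```

posixpath is used instead of os.path for the key because object keys use forward slashes regardless of the local operating system.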

What about stable Cloudfront URLs?

We've used "stable" or "static" loosely with regards to Cloudfront URLs. Our intention is that the hostname will be stable. Right now you can see hostnames like d13k9s5899twdr.cloudfront.net. The name maps to the Cloudfront "distribution" containing settings for source buckets, download policies and so on.

We don't anticipate having to replace these distributions; in this respect the hostnames are already stable. However, we plan to introduce CNAME mappings for each of our environments (e.g. download.network.pivotal.io for d13k9s5899twdr.cloudfront.net). This means that we can, if we need to, replace the distribution. It will also help us with debugging and improve the fit-and-finish for API consumers inspecting how it works.

To reiterate: the URL provided in the redirect is not stable. It will expire. Only the URL provided in _links.download is stable. In any case, if you want to be sure, navigating the API ensures that you have Pivnet's most recent opinion about the state of the world.

Some additional notes

API chattiness

You mentioned sequentially following multiple links provided in API responses.

This chattiness is by design. The API is intended to have self-describing, symbolic names for the components of the data model. As a consumer, once you begin to navigate from a high-level endpoint, you should never need to manually construct a URL yourself. You can always rely on the API to tell you the correct URLs for related endpoints.

EULAs

Pivnet users outside of Pivotal need to manually accept EULAs for major and minor releases, but not for other kinds. By "manually accept", we mean that someone will need to log into the Pivnet website and accept the EULA there. That someone will need to be identified by the token being used for upgrades.

This is also a legal requirement. We'd like to automate it, but cannot safely do so.

Tokens

In future, Pivnet-provided API tokens will be replaced with UAA tokens. There will be a deprecation period, and we will provide guidance on switching when the time comes.

andreasscherbaum commented 7 years ago

Hi @jchesterpivotal, thanks for the detailed explanation.

Let me summarize: in order to download a specific version (not just the latest version) of the product, I still need to jump through all the websites, and figure out my download link. And that is only after someone manually clicked "OK" on the website? And I need to look into S3 if I need the filename.

Also, I had better not depend on any "stable" name if it's not on the Pivnet website. That's ok.

That is a large number of steps required in order to download a file.

I'm more and more inclined to say that this is a job for a generic Pivnet Ansible module:
http://docs.ansible.com/ansible/latest/modules_by_category.html
http://docs.ansible.com/ansible/latest/list_of_network_modules.html
http://docs.ansible.com/ansible/latest/dev_guide/developing_modules.html

This would not only benefit @kdunn-pivotal but anyone using Ansible to automate deployments. The module should be written in a generic way which lets one specify the product name, version (or "latest") and API key.

What do you think?

jchesterpivotal commented 7 years ago

> I still need to jump through all the websites, and figure out my download link.

Yes, the API works by navigating from top-level endpoints.

> And that is only after someone manually clicked "OK" on the website?

Correct. Upon download, if no accepted EULA is found for that user, they'll receive a 451 Unavailable For Legal Reasons error code and instructions on what to do.
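For automation, it may help to turn that 451 into an explicit, readable failure. The sketch below wraps any download callable that raises urllib.error.HTTPError (as urllib.request.urlopen does); the exception class, function name, and message are hypothetical, and the actual download call is left to the caller.

```python
import urllib.error


class EulaNotAccepted(Exception):
    """Raised when the server answers 451 Unavailable For Legal Reasons."""


def fetch_checking_eula(fetch, url):
    """Call fetch(url) and translate an HTTP 451 into EulaNotAccepted."""
    try:
        return fetch(url)
    except urllib.error.HTTPError as err:
        if err.code == 451:
            raise EulaNotAccepted(
                f"EULA not accepted for {url}; accept it on the Pivnet "
                "website with the account that owns the API token"
            ) from err
        raise  # any other HTTP error is passed through unchanged
```

A playbook task could catch this and stop with a clear message instead of retrying a download that cannot succeed until someone accepts the EULA.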

> And I need to look into S3 if I need the filename.

No. To be clear, we mean interpreting the value of the aws_object_key key given in the API response. Nobody can read the distribution bucket directly; everyone must go through a signed Cloudfront URL.

> this is a job for a generic Pivnet Ansible module ... What do you think?

I'm not deeply familiar with Ansible, but that does sound reasonable to me. pivnet-resource provides a similar level of isolation from the API. Rather than requiring Concourse users to write scripts to navigate the API, the resource provides the standard check, get and put operations. It might be worth referring to as an example.