alphagov / asset-manager

Manages uploaded assets (images, PDFs etc.) for applications on GOV.UK
https://docs.publishing.service.gov.uk/apps/asset-manager.html
MIT License

Serve Whitehall's Number 10 assets from Asset Manager #268

Closed chrisroos closed 6 years ago

chrisroos commented 7 years ago

The Number 10 assets are currently served by the PublicUploadsController in Whitehall.

These are used by the virtual tour on the History of 10 Downing Street page. There appears to be an XML file for each "tour" and these contain links to the assets in the /uploaded/number10 directory on assets.publishing.service.gov.uk (e.g. image_001.xml.erb).

Example number10 asset: https://assets.publishing.service.gov.uk/government/uploads/uploaded/number10/image_001.tiles/preview.jpg

Tasks

chrisroos commented 6 years ago

@andrewgarner has just run rake asset_manager:migrate_assets[uploaded/number10] in production. It's queued up 3186 files.

I suspect these jobs might need to wait until the uploads in https://github.com/alphagov/asset-manager/issues/404 have finished.

chrisroos commented 6 years ago

I've asked 2ndline to compare the number of these assets on the filesystem to the number that have been created in the Asset Manager database.

chrisroos commented 6 years ago

@h-lame ran the following commands in production to compare the assets on the filesystem to those in the Asset Manager database:

# Number 10 assets
$ find /data/uploads/whitehall/clean/uploaded/number10 -type f | wc -l
3186

$ govuk_app_console asset-manager
Loading production environment (Rails 5.1.4)
irb(main):001:0> WhitehallAsset.where(legacy_url_path: %r(/government/uploads/uploaded/number10/)).count
=> 3186
irb(main):002:0> WhitehallAsset.deleted.where(legacy_url_path: %r(/government/uploads/uploaded/number10/)).count
=> 0

The number of assets in the database matches the number on the filesystem, so we're all good to open a PR updating the nginx config to serve these assets from Asset Manager.
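
For anyone repeating this check for another directory, the filesystem half could be wrapped up roughly as follows. This is only a sketch: `count_files` and `check_count` are hypothetical helper names (not part of asset-manager or Whitehall), and the expected count still has to be copied by hand from the Rails console.

```shell
# Hypothetical helpers, not part of asset-manager or Whitehall.

# Count regular files beneath a directory (the same find | wc -l as above,
# with the padding that some wc implementations add stripped out).
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Compare the filesystem count with a count copied from the Rails console.
# $1: directory, $2: expected number of files
# Prints a one-line summary and returns non-zero on a mismatch.
check_count() {
  dir=$1
  expected=$2
  actual=$(count_files "$dir")
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual files in $dir"
  else
    echo "MISMATCH: $actual files in $dir, expected $expected"
    return 1
  fi
}
```

With the figures from this issue, the call would be `check_count /data/uploads/whitehall/clean/uploaded/number10 3186`.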

chrisroos commented 6 years ago

I've opened https://github.com/alphagov/govuk-puppet/pull/7130 to update the nginx config to start serving these assets from Asset Manager.

chrisroos commented 6 years ago

For reference, I requested the example asset in the description to confirm that it's being served by Whitehall in production:

$ curl -v "https://assets.publishing.service.gov.uk/government/uploads/uploaded/number10/image_001.tiles/preview.jpg?CJR$RANDOM" > /dev/null

> GET /government/uploads/uploaded/number10/image_001.tiles/preview.jpg?CJR32305 HTTP/1.1
> Host: assets.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: image/jpeg
< Last-Modified: Tue, 26 Mar 2013 17:20:09 GMT
< ETag: "5151d8c9-103a2"
< Expires: Wed, 24 Jan 2018 01:02:31 GMT
< Cache-Control: max-age=43200, public
< Strict-Transport-Security: max-age=31536000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: origin, authorization
< Fastly-Backend-Name: origin
< Content-Length: 66466
< Accept-Ranges: bytes
< Date: Tue, 23 Jan 2018 13:02:31 GMT
< Via: 1.1 varnish
< Age: 0
< Connection: keep-alive
< X-Served-By: cache-lhr6343-LHR
< X-Cache: MISS
< X-Cache-Hits: 0
< X-Timer: S1516712551.003034,VS0,VE65

# Kibana search results for CJR32305
January 23rd 2018, 13:02:31.000  -   -  whitehall-admin.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:02:31.000  -   -  whitehall-admin.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:02:31.000  -   -  assets-origin.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:02:31.000  -   -  whitehall-frontend.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:02:31.000  -   -  whitehall-frontend.publishing.service.gov.uk-json.event.access
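
The `?CJR$RANDOM` suffix on the request above is what makes this trace possible: a query string that has presumably not been requested before results in a cache miss (note `X-Cache: MISS`), and the same string can then be searched for in Kibana. A minimal sketch of that pattern, assuming a hypothetical helper named `tagged_url` that is not part of any GOV.UK tooling:

```shell
# Hypothetical helper: append a unique marker to a URL as its query string.
# $1: URL, $2: short prefix that makes the marker easy to recognise (e.g. initials)
tagged_url() {
  url=$1
  prefix=$2
  # $RANDOM is a bash/zsh feature; fall back to the shell's PID elsewhere.
  marker="${prefix}${RANDOM:-$$}"
  case $url in
    *\?*) printf '%s&%s\n' "$url" "$marker" ;;  # URL already has a query string
    *)    printf '%s?%s\n' "$url" "$marker" ;;
  esac
}

# Example (commented out so that sourcing this file makes no request):
# url=$(tagged_url "https://assets.publishing.service.gov.uk/government/uploads/uploaded/number10/image_001.tiles/preview.jpg" CJR)
# echo "Search the logs for: ${url##*[?&]}"
# curl -sv "$url" -o /dev/null
```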

chrisroos commented 6 years ago

I've tested the effect of this PR in integration and used Kibana to confirm that these assets are now being served by Asset Manager.

Note: we don't currently have a realistic set of assets or asset-manager data in integration, so I've had to create a Whitehall asset to mirror the example asset in the description.

# Create asset
$ export BEARER_TOKEN=`cat /etc/govuk/manuals-publisher/env.d/ASSET_MANAGER_BEARER_TOKEN`

$ echo `date` > tmp.txt
$ curl \
  -H"Authorization: Bearer $BEARER_TOKEN" \
  -H"Accept: application/json" \
  https://asset-manager.integration.govuk-internal.digital/whitehall_assets \
  --form "asset[file]=@tmp.txt" \
  --form "asset[legacy_url_path]=/government/uploads/uploaded/number10/image_001.tiles/preview.jpg"

# Request the asset in integration
$ curl -v  "https://assets-origin.integration.publishing.service.gov.uk/government/uploads/uploaded/number10/image_001.tiles/preview.jpg?CJR$RANDOM" > /dev/null

> GET /government/uploads/uploaded/number10/image_001.tiles/preview.jpg?CJR9100 HTTP/2
> Host: assets-origin.integration.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< date: Tue, 23 Jan 2018 14:54:42 GMT
< content-type: text/plain
< content-length: 29
< server: nginx
< vary: Accept-Encoding
< accept-ranges: bytes
< cache-control: max-age=14400, public
< content-disposition: inline; filename="tmp.txt"
< etag: "5a674c8e-1d"
< last-modified: Tue, 23 Jan 2018 14:54:06 GMT
< strict-transport-security: max-age=31536000
< vary: Accept-Encoding
< vary: Accept-Encoding
< x-frame-options: SAMEORIGIN
< access-control-allow-origin: *
< access-control-allow-methods: GET, OPTIONS
< access-control-allow-headers: origin, authorization

# Search Kibana for CJR9100
January 23rd 2018, 14:54:42.193  -   -  asset-manager
January 23rd 2018, 14:54:42.000  -   -  asset-manager-json.event.access
January 23rd 2018, 14:54:42.000  -   -  assets-origin-json.event.access
January 23rd 2018, 14:54:42.000  -   -  static-json.event.access

chrislo commented 6 years ago

These assets are now being served by Asset Manager in production. I made the following request:

$ curl -v "https://assets.publishing.service.gov.uk/government/uploads/uploaded/number10/image_001.tiles/preview.jpg?CRL$RANDOM" > /dev/null

> GET /government/uploads/uploaded/number10/image_001.tiles/preview.jpg?CRL6587 HTTP/1.1
> Host: assets.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: image/jpeg
< Content-Disposition: inline; filename="preview.jpg"
< Cache-Control: max-age=14400, public
< ETag: "5151d8c9-103a2"
< Last-Modified: Tue, 26 Mar 2013 17:20:09 GMT
< X-Frame-Options: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: origin, authorization
< Fastly-Backend-Name: origin
< Content-Length: 66466
< Accept-Ranges: bytes
< Date: Wed, 24 Jan 2018 11:40:33 GMT
< Via: 1.1 varnish
< Age: 0
< Connection: keep-alive
< X-Served-By: cache-lhr6345-LHR
< X-Cache: MISS
< X-Cache-Hits: 0
< X-Timer: S1516794033.938863,VS0,VE202
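
Compared with the earlier production response served by Whitehall, these headers differ in a few visible ways: `Cache-Control` has `max-age=14400` rather than `max-age=43200`, and `Content-Disposition` and `X-Frame-Options` are now present, as they were in the integration response from Asset Manager. A rough sketch for pulling a single header out of saved `curl -v` output follows. `response_header` and `headers.txt` are hypothetical names, and it assumes the verbose output was captured with `2> headers.txt`.

```shell
# Hypothetical helper: print the value(s) of one response header from saved
# `curl -v` output, where response header lines start with "< ".
# $1: file holding the verbose output, $2: header name (case-insensitive)
response_header() {
  grep -i "^< $2:" "$1" | sed 's/^[^:]*:[[:space:]]*//' | tr -d '\r'
}

# Example (commented out so that sourcing this file makes no request):
# curl -sv "$url" -o /dev/null 2> headers.txt
# response_header headers.txt cache-control
```

A header that appears more than once (such as the repeated `vary` lines in the integration response) is printed once per occurrence.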

I can see in Kibana that the request was ultimately served by Asset Manager:

[Screenshot: Kibana search results, "screen shot 2018-01-24 at 06 41 13"]

chrislo commented 6 years ago

I've moved the task to delete these assets to #405 so that we can close this issue.