alphagov / asset-manager

Manages uploaded assets (images, PDFs etc.) for applications on GOV.UK
https://docs.publishing.service.gov.uk/apps/asset-manager.html
MIT License

Serve Whitehall's assets stored in government/uploads/system/uploads/ from Asset Manager #420

Closed chrisroos closed 6 years ago

chrisroos commented 6 years ago

Example asset: https://assets.publishing.service.gov.uk/government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg

Tasks

chrisroos commented 6 years ago

@andrewgarner has just run rake asset_manager:migrate_assets[government/uploads/system/uploads] in production and reported the following output:

14:32:49 Migrating 677 files
14:32:49 Finished: SUCCESS

I suspect these jobs might need to wait until the uploads in #404 have finished.

chrisroos commented 6 years ago

I've asked 2ndline to compare the number of these assets on the filesystem to the number that have been created in the Asset Manager database.

chrisroos commented 6 years ago

@h-lame ran the following commands in production to compare the assets on the filesystem to those in the Asset Manager database:

# Legacy images
$ find /data/uploads/whitehall/clean/government/uploads/system/uploads -type f | wc -l
677

$ govuk_app_console asset-manager
Loading production environment (Rails 5.1.4)
irb(main):001:0> WhitehallAsset.where(legacy_url_path: %r(/government/uploads/government/uploads/system/uploads/)).count
=> 677
irb(main):002:0> WhitehallAsset.deleted.where(legacy_url_path: %r(/government/uploads/government/uploads/system/uploads/)).count
=> 0

The number of assets in the database matches the number on the filesystem, and none are marked as deleted, so we can open a PR to update the nginx config to serve these assets from Asset Manager.
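
The filesystem half of that check can be wrapped in a small script. This is only a sketch, not the command that was run: `count_files` and `counts_match` are hypothetical helper names, and the database count is assumed to be read from the Rails console and passed in by hand.

```shell
# Hypothetical helpers; not part of asset-manager or the GOV.UK tooling.

# Count regular files beneath a directory, as `find DIR -type f | wc -l` does,
# stripping the padding that some wc implementations add.
count_files() {
  find "$1" -type f | wc -l | tr -d '[:space:]'
}

# Compare the filesystem count for a directory ($1) with an expected
# database count ($2). Prints a summary and returns non-zero on a mismatch.
counts_match() {
  fs_count=$(count_files "$1")
  if [ "$fs_count" -eq "$2" ]; then
    echo "OK: $fs_count files on disk, $2 records in the database"
  else
    echo "MISMATCH: $fs_count files on disk, $2 records in the database"
    return 1
  fi
}
```

For the directory above this would be called as `counts_match /data/uploads/whitehall/clean/government/uploads/system/uploads 677`.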

chrisroos commented 6 years ago

I've opened https://github.com/alphagov/govuk-puppet/pull/7129 to update the nginx config to start serving these assets from Asset Manager.

chrisroos commented 6 years ago

For reference, I requested the example asset in the description to confirm that it's being served by Whitehall in production:

$ curl -v "https://assets.publishing.service.gov.uk/government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg?CJR$RANDOM" > /dev/null

> GET /government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg?CJR669 HTTP/1.1
> Host: assets.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: image/jpeg
< Last-Modified: Wed, 14 Nov 2012 18:36:06 GMT
< Content-Disposition: inline; filename="Love-Your-Local-Market-960x640.jpg"
< Cache-Control: max-age=14400, public
< ETag: "50a3e496-15f47"
< X-Frame-Options: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: origin, authorization
< Fastly-Backend-Name: origin
< Content-Length: 89927
< Accept-Ranges: bytes
< Date: Tue, 23 Jan 2018 13:01:13 GMT
< Via: 1.1 varnish
< Age: 0
< Connection: keep-alive
< X-Served-By: cache-lhr6322-LHR
< X-Cache: MISS
< X-Cache-Hits: 0
< X-Timer: S1516712474.505003,VS0,VE114

# Kibana search results for CJR669
January 23rd 2018, 13:01:13.591  -   -  whitehall
January 23rd 2018, 13:01:13.000  -   -  whitehall-frontend.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:01:13.000  -   -  assets-origin.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:01:13.000  -   -  whitehall-admin.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:01:13.000  -   -  whitehall-frontend.publishing.service.gov.uk-json.event.access
January 23rd 2018, 13:01:13.000  -   -  whitehall-admin.publishing.service.gov.uk-json.event.access
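
The `?CJR$RANDOM` suffix does two jobs: it makes the URL unique, so the request is unlikely to be answered from cache (note the `X-Cache: MISS` above), and it gives a string to search for in Kibana. A minimal sketch of that pattern follows; `tagged_url` is a hypothetical helper name, and the fallback to the shell's process ID is an assumption for shells that don't provide `$RANDOM`.

```shell
# Hypothetical helper: append a unique, searchable tag to a URL.
# $RANDOM is a bash feature; where it is unset, fall back to the shell's PID.
tagged_url() {
  url=$1
  prefix=$2
  printf '%s?%s%s\n' "$url" "$prefix" "${RANDOM:-$$}"
}

# Build the URL once so the same tag is used for the request and for the
# log search afterwards.
url=$(tagged_url "https://assets.publishing.service.gov.uk/government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg" "CJR")
tag=${url##*\?}   # everything after the "?"
echo "Request: curl -v \"$url\" > /dev/null"
echo "Then search the logs for: $tag"
```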

chrisroos commented 6 years ago

I've tested the effect of this PR in integration and used Kibana to confirm that these assets are now being served by Asset Manager.

Note: we don't currently have a realistic set of assets or asset-manager data in integration, so I've had to create a Whitehall asset to mirror the example asset in the description.

# Create asset
$ export BEARER_TOKEN=`cat /etc/govuk/manuals-publisher/env.d/ASSET_MANAGER_BEARER_TOKEN`

$ echo `date` > tmp.txt
$ curl \
  -H"Authorization: Bearer $BEARER_TOKEN" \
  -H"Accept: application/json" \
  https://asset-manager.integration.govuk-internal.digital/whitehall_assets \
  --form "asset[file]=@tmp.txt" \
  --form "asset[legacy_url_path]=/government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg"

# Request the asset in integration
$ curl -v  "https://assets-origin.integration.publishing.service.gov.uk/government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg?CJR$RANDOM" > /dev/null

> GET /government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg?CJR19028 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: assets-origin.integration.publishing.service.gov.uk
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Tue, 23 Jan 2018 14:50:24 GMT
< Content-Type: text/plain
< Content-Length: 29
< Connection: keep-alive
* Server nginx is not blacklisted
< Server: nginx
< Vary: Accept-Encoding
< Accept-Ranges: bytes
< Cache-Control: max-age=14400, public
< Content-Disposition: inline; filename="tmp.txt"
< ETag: "5a674ba5-1d"
< Last-Modified: Tue, 23 Jan 2018 14:50:13 GMT
< Strict-Transport-Security: max-age=31536000
< Vary: Accept-Encoding
< Vary: Accept-Encoding
< X-Frame-Options: SAMEORIGIN
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: origin, authorization

# Search Kibana for CJR19028
January 23rd 2018, 14:50:24.698  -   -  asset-manager
January 23rd 2018, 14:50:24.000  -   -  static-json.event.access
January 23rd 2018, 14:50:24.000  -   -  assets-origin-json.event.access
January 23rd 2018, 14:50:24.000  -   -  asset-manager-json.event.access
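
Because the stand-in asset was a text file, the response headers also show that the newly created record was served: `Content-Disposition` names `tmp.txt` rather than the JPEG. A small sketch for pulling one header out of `curl -v` output is below. `header_value` is a hypothetical helper, and it assumes curl's verbose format, in which response headers are prefixed with `< `.

```shell
# Hypothetical helper: print the value(s) of one response header from
# `curl -v` output on stdin. The header name is matched case-insensitively.
header_value() {
  name=$1
  tr -d '\r' | awk -v name="$name" '
    /^< / {
      line = substr($0, 3)            # drop the "< " prefix
      colon = index(line, ":")        # first colon separates name and value
      if (colon > 0 && tolower(substr(line, 1, colon - 1)) == tolower(name)) {
        value = substr(line, colon + 1)
        sub(/^[ \t]+/, "", value)     # trim leading whitespace
        print value
      }
    }'
}
```

curl writes its verbose output to stderr, so a live check would look like `curl -v "$url" 2>&1 >/dev/null | header_value Content-Disposition`.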

chrislo commented 6 years ago

These assets are now being served by Asset Manager in production. I made the following request:

$ curl -v "https://assets.publishing.service.gov.uk/government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg?CRL$RANDOM" > /dev/null

> GET /government/uploads/government/uploads/system/uploads/edition_organisation_image_data/file/71/Love-Your-Local-Market-960x640.jpg?CRL104 HTTP/1.1
> Host: assets.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: image/jpeg
< Content-Disposition: inline; filename="Love-Your-Local-Market-960x640.jpg"
< Cache-Control: max-age=14400, public
< ETag: "50a3e496-15f47"
< Last-Modified: Wed, 14 Nov 2012 18:36:06 GMT
< X-Frame-Options: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: origin, authorization
< Fastly-Backend-Name: origin
< Content-Length: 89927
< Accept-Ranges: bytes
< Date: Wed, 24 Jan 2018 11:46:26 GMT
< Via: 1.1 varnish
< Age: 0
< Connection: keep-alive
< X-Served-By: cache-lhr6348-LHR
< X-Cache: MISS
< X-Cache-Hits: 0
< X-Timer: S1516794384.368793,VS0,VE1806
<

I then verified in Kibana that the request was eventually served by Asset Manager:

[Screenshot: Kibana search results, 2018-01-24]
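
Comparing the production response in this comment with the one captured on 23 January, when Whitehall was still serving the file, the `ETag`, `Last-Modified`, `Content-Length` and `Content-Type` values are identical. A sketch of automating that comparison is below. It assumes both `curl -v` outputs were saved to files; `stable_headers`, `same_asset` and the file names are hypothetical.

```shell
# Hypothetical helper: keep only the headers that should not change when the
# backend changes, sorted so that header order does not matter.
stable_headers() {
  tr -d '\r' < "$1" |
    grep -i -E '^< (content-type|content-length|etag|last-modified):' |
    sort
}

# Returns 0 when the two saved responses agree on those headers.
same_asset() {
  [ "$(stable_headers "$1")" = "$(stable_headers "$2")" ]
}
```

Usage would be along the lines of `same_asset before.txt after.txt && echo "headers match"`. Headers such as `Date`, `X-Served-By` and `X-Timer` are deliberately ignored because they differ on every request.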

chrislo commented 6 years ago

I've moved the task to delete these assets to #405 so this issue can be closed.