Currently in the production bucket under `/datagov/wordpress`:

```
Total Objects: 7407
Total Size: 856.6 MiB
```
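(Those totals look like the summary output of `aws s3 ls`; a likely way to reproduce them, with the bucket name variable as an assumption:)

```bash
# Summarize object count and total size under the WordPress prefix.
# BUCKET_NAME is assumed; substitute the production bucket's name.
aws s3 ls "s3://${BUCKET_NAME}/datagov/wordpress/" --recursive --summarize --human-readable | tail -2
```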
We should make sure any usage of the current bucket in the static site is migrated to the new s3 bucket...
Chatted with @robert-bryson about this. Ideally we'd keep assets in the repo because it's not easy for editors to upload directly to an S3 bucket in cloud.gov. We're going to run some build tests to make sure that adding 856MB won't slow down the build too much.
OR we can keep only the old assets in S3 and newer assets in the repo.
Groomed, thanks @jbrown-xentity
I moved this to In Progress because Aaron started copying out the entire FCS S3 bucket, and will soon have the WordPress assets available in another bucket accessible to the static site. At that point I think we would just need to search and replace the bucket name in the crawled pages to be done here... Is that right? (Leaving "should we get rid of the bucket and just include the assets in the static site" for another time.)
The production S3 copy finished. It took 24 hours:
```
(venv) ubuntu@wordpressweb1p (production) ~/datagov-s3-migrate$ time python migrate.py --use-ec2
...

real    1444m43.290s
user    128m45.062s
sys     70m9.456s
```
Using creds from `cf service-key fcs-lifeboat s3-migration`, I copied the files to a local branch in GSA/datagov-website with `aws s3 cp s3://${BUCKET_NAME}/datagov/wordpress/ . --recursive`.
All together the files are only about 1 GB, but 217 MB of that is one large MOV file, which throws an error:
```
remote: error: File www.data.gov/datagov/wordpress/2016/09/Scott_Smith_Message_Open_Data_Innovation_Summit.mov is 217.06 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
```
This will have to be addressed, but for now, as a proof of concept, I just deleted it.
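If we do end up keeping the MOV in the repo, Git LFS (which the error message suggests) would be one option; a minimal sketch, assuming LFS is acceptable for a Federalist site (worth confirming):

```bash
# One-time setup: install the LFS hooks for this clone.
git lfs install

# Track MOV files via LFS so only small pointer files live in git history.
git lfs track "*.mov"

# Commit the tracking rule along with the large file.
git add .gitattributes www.data.gov/datagov/wordpress/2016/09/Scott_Smith_Message_Open_Data_Innovation_Summit.mov
git commit -m "Store large MOV via Git LFS"
```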
I also changed references and links to the old S3 bucket to point to the files now in `datagov/wordpress`, with the hope that the build on Federalist will work well with those links (a search-and-replace sketch is below). Pushed to trigger a test build.
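A minimal sketch of that search-and-replace over the crawled pages, assuming GNU sed and a hypothetical old bucket URL (the real hostname would differ):

```bash
# OLD_BUCKET_URL is hypothetical; substitute the real FCS bucket URL.
OLD_BUCKET_URL='https://s3-us-gov-west-1.amazonaws.com/example-fcs-bucket'

# Rewrite absolute S3 asset URLs to the in-repo path in every crawled page.
grep -rlZ "$OLD_BUCKET_URL" www.data.gov/ \
  | xargs -0 sed -i "s|${OLD_BUCKET_URL}/datagov/wordpress|/datagov/wordpress|g"
```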
fake edit: GitHub sure is thinking a lot about this push, but that might be a tomorrow problem.

real edit: push worked now for some reason.
@robert-bryson I checked inventory files on the fcs-lifeboat. The file count is much less than the count on original fcs s3. Doing some debugging now.
The count for just `${BUCKET_NAME}/datagov/wordpress/` is off? I just grabbed that subset; I know the entire bucket is much larger. I will look into it in the morning.
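One way to pin down the discrepancy is to compare object counts under the same prefix in both buckets; a sketch, assuming credentials for each bucket are loaded in turn and the bucket variable names are placeholders:

```bash
# Count objects under the prefix in the original FCS bucket...
aws s3 ls "s3://${FCS_BUCKET}/datagov/wordpress/" --recursive | wc -l

# ...and in the lifeboat bucket, then compare the two numbers.
aws s3 ls "s3://${LIFEBOAT_BUCKET}/datagov/wordpress/" --recursive | wc -l
```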
Last operation was done on an unknown bucket, not `fcs-lifeboat`. We have the connection details on the unknown bucket but not its service name. At this point we should find out the service name of the unknown bucket and rename it to `fcs-lifeboat`, or do an `aws s3 sync` from it to `fcs-lifeboat`. `aws s3 sync` should be faster than re-running the `s3-migrate` script, which would take another 1444m (24h).
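A sketch of that bucket-to-bucket sync, assuming a single set of credentials can read the unknown bucket and write to `fcs-lifeboat` (both bucket variables are assumptions; cross-account setups need more care):

```bash
# Copy between buckets; sync only transfers missing or changed objects,
# so re-runs after an interruption are cheap.
aws s3 sync "s3://${UNKNOWN_BUCKET}/datagov/wordpress/" \
            "s3://${FCS_LIFEBOAT_BUCKET}/datagov/wordpress/"
```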
[UPDATE] Located the unknown bucket: it is `fcs-lifeboat` in the prod space. We renamed staging:`fcs-lifeboat` to `fcs-lifeboat-staging`, then shared prod:`fcs-lifeboat` to staging.
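For the record, the cloud.gov side of that probably looks like the following cf CLI calls (space names assumed):

```bash
# In the staging space: free up the instance name.
cf target -s staging
cf rename-service fcs-lifeboat fcs-lifeboat-staging

# In the prod space: share the prod instance into staging.
cf target -s prod
cf share-service fcs-lifeboat -s staging
```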
The Federalist build failed with a disk quota error:
```
2021-12-14 16:20:47 INFO [build-jekyll] /usr/local/rvm/rubies/ruby-2.7.4/lib/ruby/2.7.0/fileutils.rb:1415:in `initialize': Disk quota exceeded @ rb_sysopen - /tmp/work/site_repo/_site/datagov/wordpress/2014/04/bkodhq8caaaoumn.png (Errno::EDQUOT)
```
Looks like we might not be able to have all the assets local to Federalist. Will look into it further, but likely will go with the other S3 bucket route.
The kind folks at #federalist-support were able to increase our disk quota. We're back on the Federalist 🚂!
And a new build (site) where I didn't quite do the relative links correctly.
I wrestled `ruby` and `gem` and `jekyll` for an embarrassing amount of time today. Once they finally yielded and I was able to have `bundle exec jekyll serve` run correctly to do local debugging of why certain assets were 404-ing, it became obvious that they were 404-ing because they weren't there.
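For anyone reproducing the local debugging, a sketch of the setup; the `--baseurl` flag is optional, but can help mimic the prefix Federalist adds to preview builds:

```bash
# Install the site's gems, then serve the site locally.
bundle install
bundle exec jekyll serve

# To mimic a Federalist preview URL prefix, serve under a baseurl:
bundle exec jekyll serve --baseurl /preview/gsa/datagov-website/feature/3541-migrate-wordpress-s3-assets
```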
`aws s3 sync` shows 541 missing files (out of 7,407 total):
```
$ aws s3 sync s3://${BUCKET}/datagov/wordpress/ . --dryrun | wc -l
541
```
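(Dropping `--dryrun` performs the actual copy; a sketch, with the same assumed `BUCKET` variable:)

```bash
# Pull down only the 541 objects missing from the working tree.
aws s3 sync "s3://${BUCKET}/datagov/wordpress/" .
```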
A quick sync and new build, but images that should load don't, because Federalist prepends `/preview/gsa/datagov-website/feature/3541-migrate-wordpress-s3-assets/` to the URLs.
And a new build, and things still aren't working. Federalist is doing something weird with the prefix. I dunno. It works locally.
I was missing cases like `srcset="datagov/wordpress/2019/04/IENC-example-300x243.jpg 300w, /datagov/wordpress/2019/04/IENC-example-768x621.jpg 768w, /datagov/wordpress/2019/04/IENC-example.jpg 806w"`, which results in the smallest images loading, but not any of the larger responsive images: [screenshots: the 300w image renders, but the larger sizes do not].
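A sketch of one way to catch those `srcset` entries too, assuming GNU sed and that the entries missing their leading slash all start with `datagov/wordpress`:

```bash
# Add the leading slash to srcset entries that start at "datagov/wordpress".
# Handles both the first entry (right after the opening quote) and later
# entries (after a comma and optional space).
grep -rlZ 'srcset=' www.data.gov/ \
  | xargs -0 sed -i -E 's#(srcset="|, ?)datagov/wordpress#\1/datagov/wordpress#g'
```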
Another new build and huzzah! Looks like it is working correctly.
Successful Federalist build on main branch and demo site. 🎉 🎉
A demo example with an image from the front page:
Before this work, the WordPress assets were being hosted in an AWS S3 bucket.
After this work, the assets are included in the Federalist repository and build.
## User Story
In order to prevent images and other static assets from breaking once FCS resources are deleted, data.gov team wants the WordPress S3 assets migrated out of FCS before the buckets are deleted.
## Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
## Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
## Security Considerations (required)
If assets are required to be stored in S3, then the S3 bucket should be provisioned with cloud.gov which satisfies our compliance requirements.
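(If we go that route, provisioning is a one-liner against the cloud.gov marketplace; a sketch, where the plan and instance names are assumptions to verify with `cf marketplace`:)

```bash
# Hypothetical: provision a compliant S3 bucket brokered by cloud.gov.
cf create-service s3 basic-public assets-bucket
```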
## Sketch
[Notes or a checklist reflecting our understanding of the selected approach]