Closed chrisroos closed 6 years ago
I've just generated more up-to-date figures as follows:
$ aws s3 ls s3://govuk-assets-integration > integration-assets.txt
$ wc -l integration-assets.txt
65087 integration-assets.txt
irb> Asset.count
=> 63649
irb> Asset.unscoped.count
=> 65529
irb> Asset.where(:deleted_at.ne => nil).count
=> 1880
$ find /data/uploads/asset-manager/assets/ -type f | wc -l
65527
I believe I have identified two reasons for the discrepancies. Here are some quick notes:
There are approx 1579 assets in the database which have no S3 object matching their uuid. All of these assets (and a couple of hundred others) are marked as deleted. We believe these were already marked as deleted when we did the initial upload to S3 and therefore were not uploaded. We should upload the files to S3 for these deleted assets.
There are approx 1137 objects on S3 whose uuid does not match any asset in the database. All of these S3 objects have a key of length 24 (e.g. 58cba21ee5274a16e8000030) rather than 36 (e.g. 00172d97-73b1-42dc-8c3e-7b90083f497b). We believe the former are a remnant from when we used the database ID rather than a separate UUID as the S3 key. We think they can safely be deleted.
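The above seems to add up, because 65087 (no. of S3 objects) + 1579 (assets missing from S3) - 1137 (S3 objects with no matching asset) = 65529 (no. of assets in db including deleted ones, i.e. Asset.unscoped.count).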
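For anyone wanting to repeat this comparison, here is a rough sketch of the reconciliation in Ruby. It is not the script used to produce the figures above. It assumes the S3 keys have already been extracted from the `aws s3 ls` output into an array of strings and that the asset uuids have been pulled from the database into another array. The names `reconcile`, `counts_add_up?`, `OBJECT_ID_KEY` and the hash keys are made up for illustration.

```ruby
require 'set'

# A 24-character hex string, i.e. the shape of the legacy database-ID keys.
OBJECT_ID_KEY = /\A[0-9a-f]{24}\z/

# Compare the set of S3 keys with the set of asset uuids from the database.
# Both arguments are assumed to be arrays of strings.
def reconcile(s3_keys, db_uuids)
  s3 = s3_keys.to_set
  db = db_uuids.to_set
  orphaned = s3 - db

  {
    # Assets in the database with no S3 object matching their uuid.
    missing_from_s3: (db - s3).to_a,
    # S3 objects whose key matches no asset uuid.
    orphaned_on_s3: orphaned.to_a,
    # The subset of orphaned keys that look like legacy database IDs.
    orphaned_legacy_ids: orphaned.select { |key| key.match?(OBJECT_ID_KEY) }.to_a
  }
end

# Check that the counts are consistent:
# S3 objects + assets missing from S3 - orphaned S3 objects == assets in db.
def counts_add_up?(s3_count:, missing_from_s3:, orphaned_on_s3:, db_count:)
  s3_count + missing_from_s3 - orphaned_on_s3 == db_count
end

# The figures quoted in this comment.
counts_add_up?(s3_count: 65087, missing_from_s3: 1579,
               orphaned_on_s3: 1137, db_count: 65529) # => true
```

If `orphaned_on_s3` and `orphaned_legacy_ids` come out the same size, that would support the theory that every unmatched S3 object is a leftover from the database-ID key scheme.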
So I've created the following issues to fix these problems:
Given the above, I'm now happy to close this issue.
I noticed some discrepancies in the number of assets while checking whether the overnight sync of assets worked (as part of issue #145).
We should work out why the number of files on S3 doesn't match the number of files on NFS.
Assets on S3
Assets in the database on integration
Assets on NFS in integration