jtherrmann closed 9 months ago
I noticed that its-live-open contained extra prefixes, from when we were attempting to sync the entirety of its-live-data to its-live-open before we had narrowed down the list of prefixes that should be transferred. I removed the extra prefixes:
aws --profile opendata-its-live s3 rm --recursive --only-show-errors s3://its-live-open/L7_PV_fix/
aws --profile opendata-its-live s3 rm --recursive --only-show-errors s3://its-live-open/NSIDC/
aws --profile opendata-its-live s3 rm --recursive --only-show-errors s3://its-live-open/Test/
aws --profile opendata-its-live s3 rm --recursive --only-show-errors s3://its-live-open/catalog_geojson_latest/
aws --profile opendata-its-live s3 rm --recursive --only-show-errors s3://its-live-open/catalog_geojson_original/
I then confirmed that its-live-open contains only the expected prefixes:
$ aws s3 ls s3://its-live-open/
PRE autorift_parameters/
PRE catalog_geojson/
PRE composites/
PRE datacubes/
PRE mosaics/
PRE rgb_mosaics/
PRE vel_web_tiles/
PRE velocity_image_pair/
$ aws s3 ls s3://its-live-open/velocity_image_pair/
PRE landsatOLI/
PRE sentinel1/
PRE sentinel2/
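The same prefix check could also be run offline against a key listing rather than the live bucket. The following is a minimal sketch, assuming the keys are available as plain strings (for example, one per line in a listing file); the function names `top_level_prefixes` and `unexpected_prefixes` are hypothetical and not part of any existing script.

```python
def top_level_prefixes(keys):
    """Return the sorted top-level prefixes (text up to and including the first '/')."""
    prefixes = set()
    for key in keys:
        head, sep, _ = key.partition("/")
        if sep:  # keys without a '/' sit at the bucket root and have no prefix
            prefixes.add(head + "/")
    return sorted(prefixes)


def unexpected_prefixes(keys, expected):
    """Return the top-level prefixes found in keys that are not in the expected list."""
    expected = set(expected)
    return [p for p in top_level_prefixes(keys) if p not in expected]
```

An empty result from `unexpected_prefixes` would correspond to the `aws s3 ls` output above showing only the expected prefixes.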
I created a list of the objects we should expect to see in its-live-open based on the list of prefixes under "(2) user data" at https://github.com/ASFHyP3/OpenData/issues/10#issuecomment-1850890480:
grep -e '^autorift_parameters/' -e '^catalog_geojson/' -e '^composites/' -e '^datacubes/' -e '^mosaics/' -e '^rgb_mosaics/' -e '^vel_web_tiles/' -e '^velocity_image_pair/landsatOLI/' -e '^velocity_image_pair/sentinel1/' -e '^velocity_image_pair/sentinel2/' data_keys_sorted_20231218T0100.txt > expected_open_keys_from_data_keys.txt
I also created a list of the actual contents of its-live-open, filtering out the extra prefixes that were deleted (see above):
grep -v -e '^L7_PV_fix/' -e '^NSIDC/' -e '^Test/' -e '^catalog_geojson_latest/' -e '^catalog_geojson_original/' open_keys_sorted_20231217T0100.txt > open_keys_filtered.txt
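For reference, the two `grep` filters can be mirrored in Python. This is only a sketch of the same include/exclude logic, assuming the keys have already been read into a list of strings; `filter_keys` is a hypothetical helper, not something from the repository.

```python
def filter_keys(keys, prefixes, invert=False):
    """Keep keys that start with any of the given prefixes.

    With invert=True, keep the keys that match none of them instead
    (the equivalent of grep -v with anchored patterns).
    """
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of candidates
    return [key for key in keys if key.startswith(prefixes) != invert]
```

Calling it with the expected prefixes corresponds to the first `grep`, and calling it with the deleted prefixes and `invert=True` corresponds to the `grep -v`. Input order is preserved, so sorted input stays sorted.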
I confirmed that the expected contents match the actual contents:
$ du expected_open_keys_from_data_keys.txt open_keys_filtered.txt
52164724 expected_open_keys_from_data_keys.txt
52164724 open_keys_filtered.txt
$ sha256sum expected_open_keys_from_data_keys.txt open_keys_filtered.txt
1fcc1c297301eaface0c2c93db9e1879ae828ee99d0165ca57374591c2d5ce08 expected_open_keys_from_data_keys.txt
1fcc1c297301eaface0c2c93db9e1879ae828ee99d0165ca57374591c2d5ce08 open_keys_filtered.txt
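The size and checksum comparison can also be scripted. Note that `du` reports disk usage in blocks rather than an exact byte count, so the matching SHA-256 digests are the stronger evidence here. The sketch below uses only the standard library; the helper names are hypothetical.

```python
import hashlib
import os


def size_and_sha256(path, chunk_size=1 << 20):
    """Return (size in bytes, hex SHA-256 digest) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def files_match(path_a, path_b):
    """True when both files have the same byte size and the same SHA-256 digest."""
    return size_and_sha256(path_a) == size_and_sha256(path_b)
```

Reading in chunks keeps memory use flat, which matters for key listings that run to tens of millions of lines.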
I ran https://github.com/ASFHyP3/OpenData/blob/batch-transfer/batch-transfer/check_sizes.py to calculate the total size of the s3://its-live-open contents, as well as the expected total size (from the s3://its-live-data contents that were transferred into its-live-open). The two values exactly match:
Final output line from python check_sizes.py expected-open:
Total size: 97618572102677
Final output line from python check_sizes.py actual-open:
Total size: 97618572102677
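The idea behind the size check can be illustrated as follows. This is not the actual check_sizes.py; it is a minimal sketch that assumes a hypothetical two-column `key,size` CSV layout (real S3 inventory reports have a configurable set of columns), and the function names are made up for illustration.

```python
import csv
import io


def read_rows(csv_text):
    """Parse 'key,size' lines into (key, size) tuples. The layout is an assumption."""
    return [(row[0], int(row[1])) for row in csv.reader(io.StringIO(csv_text)) if row]


def total_size(rows, prefixes):
    """Sum the sizes of all objects whose key starts with one of the given prefixes."""
    prefixes = tuple(prefixes)
    return sum(size for key, size in rows if key.startswith(prefixes))
```

Computing this total once over the source bucket's rows (restricted to the transferred prefixes) and once over the destination bucket's rows gives the two numbers being compared above.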
We moved some stuff around today, per https://github.com/ASFHyP3/OpenData/issues/10#issuecomment-1850890480. Since the checklist of prefixes in that comment is getting a bit complex, I copied it into a text editor and re-grouped all of the prefixes into project prefixes, user prefixes, and prefixes to be deleted.
I then listed the its-live-project and its-live-open buckets and confirmed that the prefixes that are currently present in each bucket (as of today, 2023-12-21) exactly match the prefix lists that I generated based on the checklist from https://github.com/ASFHyP3/OpenData/issues/10#issuecomment-1850890480.
This gives me greater confidence that we've transferred everything correctly. We should get someone from ITS_LIVE to approve these lists of prefixes, and then do a final verification of keys and total bucket size (using S3 inventory reports), similar to what we did above.
Here are the lists:
Contents of its-live-project bucket:
L7_PV_fix/
Test/
elevation/
isce_autoRIFT/
month-data-logs/
s3-inventory/
test/
test_datacubes/
test_datacubes/forAlex/
test_datacubes/mosaics/
test_datacubes/s1_correction/
test_datacubes/validate_v2_granule_crop/
velocity_image_pair/
velocity_image_pair/landsatOLI-latest/
velocity_image_pair/sentinel1-backup/
velocity_image_pair/sentinel1-corrected-8granules/
velocity_image_pair/sentinel1-corrected/
velocity_image_pair/sentinel1-latest/
velocity_image_pair/sentinel2-latest/
Contents of its-live-open bucket:
autorift_parameters/
catalog_geojson/
composites/
datacubes/
documentation/
height_change/
ice_masks/
mosaics/
qgis_project/
rgb_mosaics/
vel_web_tiles/
velocity_image_pair/
velocity_image_pair/landsatOLI/
velocity_image_pair/sentinel1/
velocity_image_pair/sentinel2/
velocity_mosaic/
Will be deleted, along with anything else in the its-live-data bucket that was not transferred to one of the other two buckets:
NSIDC/
ice_shelf/
TODO:
ITS_LIVE project
its-live-project and its-live-open bucket contents (the lists above can be used to filter the its-live-data inventory report; s3-inventory/ will need to be filtered out of the its-live-project report, unless it's already excluded from the report?)
its-live-data and NOT in either of the other two buckets (these prefixes will be deleted)
I updated https://github.com/ASFHyP3/OpenData/blob/batch-transfer/batch-transfer/check_sizes.py and re-ran it to compare the total size of the its-live-open bucket against the transferred prefixes from the its-live-data bucket using the latest inventory reports. The output shows an exact match, with a total size of 98131541328013 for both buckets, for the relevant prefixes.
@asjohnston-asf provided text files at s3://asj-dev/opendata/ containing sorted project keys from the its-live-data, its-live-open, and its-live-project bucket inventory reports.
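With key listings for all three buckets in hand, the keys that exist only in its-live-data (and would therefore be deleted) can be found with a set difference. The sketch below assumes that object keys are unchanged by the transfer and that each listing has been read into a list of strings; `keys_only_in_data` is a hypothetical name.

```python
def keys_only_in_data(data_keys, open_keys, project_keys):
    """Return the sorted keys present in the data listing but in neither other listing.

    Assumes keys are identical across buckets, i.e. the transfer did not rename them.
    """
    transferred = set(open_keys) | set(project_keys)
    return sorted(key for key in set(data_keys) if key not in transferred)
```

Summarizing the result by top-level prefix would give a list that can be compared against the "will be deleted" prefixes above.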