broadinstitute / imaging-backup-scripts

Scripts to back up data for the Imaging Platform
MIT License

Document process to restore data #7

Closed: shntnu closed this issue 5 years ago

shntnu commented 5 years ago

Via this PR: #3

shntnu commented 5 years ago

@hkhawar will do this by cleaning up the notes here: https://github.com/broadinstitute/2016_03_14_TargetID_Wagner_Schenone_new/issues/2#issuecomment-524643531

shntnu commented 5 years ago

@hkhawar Please ping @MarziehHaghighi once you are done, so that she can use this for the repurposing retrieval.

hkhawar commented 5 years ago

@shntnu Could you please guide me on how to add that document? I shared it with Marzieh yesterday, and she has already started restoring the illum files.

shntnu commented 5 years ago (Wed, Aug 28, 2019, 10:57 AM)

@hkhawar Sure thing. Given that this is text, you can edit it directly in the browser, but do so in the correct branch.

  • Go to the main page of this repo,
  • switch to the branch glacier_restore_instructions,
  • then open the file glacier_restore.md and edit it in place.
  • Save your edits to that branch directly ("Commit directly to the glacier_restore_instructions branch").

A direct path to this is via this link, but I suggest the process above so you get familiar with it: https://github.com/broadinstitute/imaging-backup-scripts/edit/glacier_restore_instructions/glacier_restore.md

hkhawar commented 5 years ago

Great, thanks!

MarziehHaghighi commented 5 years ago

@shntnu There seems to be a problem in glacier_restore.sh: it doesn't send the request for unarchiving the images, although it works perfectly for the backend files. Here is an example for one plate that you can check:

./glacier_restore.sh --project_name ${PROJECT_NAME} --batch_id ${BATCH_ID} --plate_id SQ00015201 --get_images --check_status

As a test, I sent the request for unarchiving the tar file manually (through the web page). If you run the command now, you can see that the request was sent for that file but not for the md5 file.
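For reference, a manual request like the one made through the web page can also be issued from the AWS CLI with `aws s3api restore-object`. This is a minimal sketch under stated assumptions, not what glacier_restore.sh necessarily does internally: the function name `request_restore` is hypothetical, the bucket `imaging-platform-cold` is taken from the script's Download line in this thread, and the 7-day restore window and `Standard` retrieval tier are illustrative defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from glacier_restore.sh): ask S3 to restore one
# Glacier object. The default of 7 days and the Standard tier are assumptions.
request_restore() {
  local bucket=$1 key=$2 days=${3:-7}
  aws s3api restore-object \
    --bucket "$bucket" \
    --key "$key" \
    --restore-request "{\"Days\":${days},\"GlacierJobParameters\":{\"Tier\":\"Standard\"}}"
}
```

Usage would look like `request_restore imaging-platform-cold "<key of the tar or md5 file>"`, once per object, since the md5 file and the tar file are separate objects that each need their own request.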

shntnu commented 5 years ago

@MarziehHaghighi Can you paste the stdout file for any one such case here?

hkhawar commented 5 years ago

@shntnu Here is the stdout output:

Get images ...
{
    "AcceptRanges": "bytes",
    "ContentType": "application/x-md5",
    "LastModified": "Sat, 16 Jun 2018 20:20:55 GMT",
    "ContentLength": 163,
    "VersionId": "t8cLhQFORgaKexGoU87Q2wabr9Q3mb2t",
    "ETag": "\"ef22bc99f867196bc4a9117e608b25ea\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
{
    "AcceptRanges": "bytes",
    "ContentType": "application/x-tar",
    "LastModified": "Sat, 16 Jun 2018 19:22:48 GMT",
    "ContentLength": 161905350765,
    "VersionId": "ERWlwmpNgD5re2HgXKSFpQlowX_AuM9n",
    "ETag": "\"66b0d36cd3a3ea1bf4bb75abb87c372b-9651\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}

shntnu commented 5 years ago

@hkhawar Is it possible that you weren't using the current version of the script?

I don't see the output of this line: echo Download:s3://${cold_bucket}/${tarball_1}

hkhawar commented 5 years ago

Yup, this line is somehow missing from the script. I am adding it and will initiate the restoring process again.

MarziehHaghighi commented 5 years ago

@shntnu But that's not the source of the problem. Here is an example:

Get images ...
{
    "AcceptRanges": "bytes",
    "ContentType": "application/x-md5",
    "LastModified": "Sat, 16 Jun 2018 20:20:55 GMT",
    "ContentLength": 163,
    "VersionId": "t8cLhQFORgaKexGoU87Q2wabr9Q3mb2t",
    "ETag": "\"ef22bc99f867196bc4a9117e608b25ea\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
{
    "Restore": "ongoing-request=\"true\"",
    "AcceptRanges": "bytes",
    "ContentType": "application/x-tar",
    "LastModified": "Sat, 16 Jun 2018 19:22:48 GMT",
    "ContentLength": 161905350765,
    "VersionId": "ERWlwmpNgD5re2HgXKSFpQlowX_AuM9n",
    "ETag": "\"66b0d36cd3a3ea1bf4bb75abb87c372b-9651\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}

The first file was requested using the script and the second through manual retrieval. As you can see, the first file doesn't have the "Restore" status line.
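The distinguishing signal here is the `Restore` field. Assuming each JSON object printed by the script is the output of `aws s3api head-object` (the field names match), S3 omits `Restore` when no request is pending and no restored copy exists, reports `ongoing-request="true"` while retrieval is in progress, and `ongoing-request="false"` (with an expiry date) once the temporary copy is available. The sketch below classifies one object on that basis; `restore_state` is a hypothetical name, and it uses `grep` rather than a real JSON parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the restore state of ONE object from the JSON
# that `aws s3api head-object` prints (read on stdin). Grep-based sketch only.
restore_state() {
  local json
  json=$(cat)
  if ! grep -q '"Restore"' <<<"$json"; then
    # No Restore field: nothing pending and no restored copy available
    echo "not-requested"
  elif grep -q 'ongoing-request=\\"true' <<<"$json"; then
    # Request accepted by S3, object still being retrieved from Glacier
    echo "in-progress"
  else
    # ongoing-request="false": a temporary restored copy is available
    echo "restored"
  fi
}
```

For example, `aws s3api head-object --bucket imaging-platform-cold --key "<key>" | restore_state` would print `not-requested` for the md5 file above and `in-progress` for the manually requested tar file.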

shntnu commented 5 years ago

> @shntnu But that's not the source of the problem. Here is an example:

@MarziehHaghighi Correct, but the fact that that line was not present made me wonder whether the script had been executed correctly.

Can you point me to a file that was not restored?

MarziehHaghighi commented 5 years ago

@shntnu All the plates in the current list of plates, except for the one file of the first plate (SQ00015201) that I requested manually. An example is the second plate, SQ00015142, with:

PROJECT_NAME=2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad
BATCH_ID=2016_04_01_a549_48hr_batch1

$ ./glacier_restore.sh --project_name ${PROJECT_NAME} --batch_id ${BATCH_ID} --plate_id SQ00015142 --get_images --check_status
Get images ...
{
    "AcceptRanges": "bytes",
    "ContentType": "application/x-md5",
    "LastModified": "Sat, 16 Jun 2018 13:52:11 GMT",
    "ContentLength": 163,
    "VersionId": "0OU7pO1F5yT9I9ubbhcJ78rxWr3dT8R.",
    "ETag": "\"3617bfa2a8a98d9806d9b5e5197f52d7\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
{
    "AcceptRanges": "bytes",
    "ContentType": "application/x-tar",
    "LastModified": "Sat, 16 Jun 2018 12:50:51 GMT",
    "ContentLength": 171520920772,
    "VersionId": "YRZvkX9yGC09vHiD_7xYtr_yEDwTdBdI",
    "ETag": "\"1c5dace003332427fbbb566729e443a3-5112\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}

shntnu commented 5 years ago

> $ ./glacier_restore.sh --project_name ${PROJECT_NAME} --batch_id ${BATCH_ID} --plate_id SQ00015142 --get_images --check_status

Have PROJECT_NAME and BATCH_ID been defined? Try running this:

echo ./glacier_restore.sh --project_name ${PROJECT_NAME} --batch_id ${BATCH_ID} --plate_id SQ00015142 --get_images --check_status

What's the output?
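The `echo` trick shows whether the variables expand. A slightly more defensive variant is to stop before calling the script if either one is empty. This is a minimal sketch; `require_vars` is a hypothetical helper that relies on bash indirect expansion (`${!name}`), and it is not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail early if any of the named variables is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name
    if [[ -z "${!name:-}" ]]; then
      echo "error: ${name} is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_vars PROJECT_NAME BATCH_ID && ./glacier_restore.sh --project_name "${PROJECT_NAME}" --batch_id "${BATCH_ID}" --plate_id SQ00015142 --get_images --check_status`.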

MarziehHaghighi commented 5 years ago

@shntnu Yes, they are defined. I can send the request for the backend without any problem, so there must be some difference between requesting the images and requesting the backend, although in glacier_restore.sh everything looks correct and similar for both except the file names. Here is the echo output:

./glacier_restore.sh --project_name 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad --batch_id 2016_04_01_a549_48hr_batch1 --plate_id SQ00015142 --get_images --check_status

shntnu commented 5 years ago

@MarziehHaghighi Here's what I got when I tried the restore (it seems to work). Is what you get different? You'd need to try it on a different plate to verify.

First, check the status (there's no ongoing request):

$ ./glacier_restore.sh --project_name 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad --batch_id 2016_04_01_a549_48hr_batch1 --plate_id SQ00015142 --get_images --check_status
Get images ...
{
    "AcceptRanges": "bytes",
    "ContentType": "application/x-md5",
    "LastModified": "Sat, 16 Jun 2018 13:52:11 GMT",
    "ContentLength": 163,
    "VersionId": "0OU7pO1F5yT9I9ubbhcJ78rxWr3dT8R.",
    "ETag": "\"3617bfa2a8a98d9806d9b5e5197f52d7\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
{
    "AcceptRanges": "bytes",
    "ContentType": "application/x-tar",
    "LastModified": "Sat, 16 Jun 2018 12:50:51 GMT",
    "ContentLength": 171520920772,
    "VersionId": "YRZvkX9yGC09vHiD_7xYtr_yEDwTdBdI",
    "ETag": "\"1c5dace003332427fbbb566729e443a3-5112\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
Download:s3://imaging-platform-cold/imaging_analysis/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/plates/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015142_images_illum_analysis.tar.gz

Then run the restore (and ongoing-request appears):

$ ./glacier_restore.sh --project_name 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad --batch_id 2016_04_01_a549_48hr_batch1 --plate_id SQ00015142 --get_images
Get images ...
{
    "Restore": "ongoing-request=\"true\"",
    "AcceptRanges": "bytes",
    "ContentType": "application/x-md5",
    "LastModified": "Sat, 16 Jun 2018 13:52:11 GMT",
    "ContentLength": 163,
    "VersionId": "0OU7pO1F5yT9I9ubbhcJ78rxWr3dT8R.",
    "ETag": "\"3617bfa2a8a98d9806d9b5e5197f52d7\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
{
    "Restore": "ongoing-request=\"true\"",
    "AcceptRanges": "bytes",
    "ContentType": "application/x-tar",
    "LastModified": "Sat, 16 Jun 2018 12:50:51 GMT",
    "ContentLength": 171520920772,
    "VersionId": "YRZvkX9yGC09vHiD_7xYtr_yEDwTdBdI",
    "ETag": "\"1c5dace003332427fbbb566729e443a3-5112\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
Download:s3://imaging-platform-cold/imaging_analysis/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/plates/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015142_images_illum_analysis.tar.gz

Then check the status again (this verifies that there is an ongoing request):

$ ./glacier_restore.sh --project_name 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad --batch_id 2016_04_01_a549_48hr_batch1 --plate_id SQ00015142 --get_images --check_status
Get images ...
{
    "Restore": "ongoing-request=\"true\"",
    "AcceptRanges": "bytes",
    "ContentType": "application/x-md5",
    "LastModified": "Sat, 16 Jun 2018 13:52:11 GMT",
    "ContentLength": 163,
    "VersionId": "0OU7pO1F5yT9I9ubbhcJ78rxWr3dT8R.",
    "ETag": "\"3617bfa2a8a98d9806d9b5e5197f52d7\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
{
    "Restore": "ongoing-request=\"true\"",
    "AcceptRanges": "bytes",
    "ContentType": "application/x-tar",
    "LastModified": "Sat, 16 Jun 2018 12:50:51 GMT",
    "ContentLength": 171520920772,
    "VersionId": "YRZvkX9yGC09vHiD_7xYtr_yEDwTdBdI",
    "ETag": "\"1c5dace003332427fbbb566729e443a3-5112\"",
    "StorageClass": "GLACIER",
    "Metadata": {}
}
Download:s3://imaging-platform-cold/imaging_analysis/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/plates/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015142_images_illum_analysis.tar.gz
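
The session above runs the same command three times, and only the middle run (without --check_status) submits the restore request. The sketch below wraps that check, request, check sequence for one plate. `restore_plate_images` and the `GLACIER_RESTORE` override are hypothetical names; the flags are exactly the ones used in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper reproducing the check -> request -> check sequence above.
# GLACIER_RESTORE defaults to the repo script and can point elsewhere if needed.
restore_plate_images() {
  local project=$1 batch=$2 plate=$3
  local script=${GLACIER_RESTORE:-./glacier_restore.sh}
  local args=(--project_name "$project" --batch_id "$batch" --plate_id "$plate" --get_images)

  echo "== status before request"
  "$script" "${args[@]}" --check_status
  echo "== submitting restore request (without the status flag)"
  "$script" "${args[@]}"
  echo "== status after request"
  "$script" "${args[@]}" --check_status
}
```

Usage: `restore_plate_images "${PROJECT_NAME}" "${BATCH_ID}" SQ00015142`. After the request, the status output should show `"Restore": "ongoing-request=\"true\""` for both the md5 and the tar file, as above.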

MarziehHaghighi commented 5 years ago

@shntnu Great! It worked! So the conclusion is that we have to remove --check_status to send a request? @hkhawar We can update the instruction file accordingly.

shntnu commented 5 years ago

> we have to remove --check_status to send a request

Yes

hkhawar commented 5 years ago

@MarziehHaghighi The following line was missing from glacier_restore.sh, which I corrected yesterday:

echo Download:s3://${cold_bucket}/${tarball_1}

And yes, to send a request for restoring files, --check_status should be removed. I told you this verbally, but it should also be added to the document.
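The restored `Download:` line prints the S3 location of the plate's tarball. Judging only from the single example in this thread, the object key seems to follow the pattern reconstructed below. This is a sketch under that assumption: `images_tarball_url` and the `COLD_BUCKET` override are hypothetical, and glacier_restore.sh may build `${cold_bucket}` and `${tarball_1}` differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild the tarball URL printed on the "Download:" line.
# The layout is inferred from one example in this thread and may not match
# how glacier_restore.sh actually builds ${cold_bucket}/${tarball_1}.
images_tarball_url() {
  local project=$1 batch=$2 plate=$3
  local cold_bucket=${COLD_BUCKET:-imaging-platform-cold}
  echo "s3://${cold_bucket}/imaging_analysis/${project}/plates/${project}_${batch}_${plate}_images_illum_analysis.tar.gz"
}
```

For the second plate discussed above, `images_tarball_url "${PROJECT_NAME}" "${BATCH_ID}" SQ00015142` reproduces the Download line printed by the script.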

MarziehHaghighi commented 5 years ago

@hkhawar You can remove it from step 4 and add the last line below.

4: This step will first unlock the backend files so that they later become available for download from Glacier.

parallel \
  --results restore \
  -a list_of_plates.txt \
  ./glacier_restore.sh \
  --project_name ${PROJECT_NAME} \
  --batch_id ${BATCH_ID} \
  --plate_id {1} \
  --get_backend \
  --check_status

(The final --check_status line needs to be removed.)

You can check the status of the request by adding --check_status to the above command.
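If GNU parallel is not available, the same two modes can be run one plate at a time. This is a minimal sequential sketch, not part of the repository: `restore_backend_for_plates` and `GLACIER_RESTORE` are hypothetical names, and it assumes PROJECT_NAME and BATCH_ID are already set as in the instructions. As concluded above, leaving out --check_status submits the requests, and passing `check` as the second argument only reports status.

```bash
#!/usr/bin/env bash
# Hypothetical sequential alternative to the GNU parallel command in step 4.
# mode=request submits restore requests; mode=check only reports status.
restore_backend_for_plates() {
  local plate_list=$1 mode=${2:-request}
  local script=${GLACIER_RESTORE:-./glacier_restore.sh}
  local extra=()
  if [[ "$mode" == "check" ]]; then
    extra=(--check_status)
  fi

  local plate
  # "|| [[ -n $plate ]]" also handles a last line without a trailing newline
  while IFS= read -r plate || [[ -n "$plate" ]]; do
    [[ -z "$plate" ]] && continue   # skip blank lines
    # </dev/null keeps the script from consuming the rest of the plate list
    "$script" --project_name "$PROJECT_NAME" --batch_id "$BATCH_ID" \
      --plate_id "$plate" --get_backend "${extra[@]}" </dev/null
  done <"$plate_list"
}
```

Usage: `restore_backend_for_plates list_of_plates.txt request` to submit, then later `restore_backend_for_plates list_of_plates.txt check`. The parallel version is faster for long plate lists; this loop trades speed for having no extra dependency.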

hkhawar commented 5 years ago

Sure

shntnu commented 5 years ago

@hkhawar Please see the edited version of the instructions here https://github.com/broadinstitute/imaging-backup-scripts/blob/master/glacier_restore.md and note the details.

Once you are done, go ahead and close this issue.