Closed gwaybio closed 4 years ago
Maybe try again? This has happened before and, IIRC, it just worked the second time you tried it: https://github.com/broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/issues/5#issuecomment-570281677
Thanks for linking these errors and for reminding me that they are relatively common.
I have restarted the pipeline many times, and I manually checked the database integrity with sqlalchemy. The same error persists. Also, I double-checked the analysis in broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad#3 and that plate is the only one that did not process. I did not perform a sufficient check in the previous analysis.
One thing that I did not do was try loading with dbplyr. Profiles processed with cytominer do exist in S3, indicating that the sqlite file was working at one point. Maybe the database is somehow only throwing errors with sqlalchemy? 🤷♂️
I manually checked the database integrity with sqlalchemy. The same error persists.
You manually checked integrity with sqlalchemy and it shows up as bad or good?
bad - same error persists. Checking with dbplyr now and getting a similar error:
> list.files()
[1] "SQ00015049_augmented.csv"
[2] "SQ00015049.csv"
[3] "SQ00015049_normalized.csv"
[4] "SQ00015049_normalized_variable_selected.csv"
[5] "SQ00015049_normalized_variable_selected.gct"
[6] "SQ00015049.sqlite"
> sqlfile = "SQ00015049.sqlite"
> db <- DBI::dbConnect(RSQLite::SQLite(), sqlfile)
Warning message:
Couldn't set synchronous mode: database disk image is malformed
Use `synchronous` = NULL to turn off this warning.
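The same kind of corruption can also be checked without sqlalchemy or dbplyr by asking SQLite directly via `PRAGMA integrity_check`. Below is a minimal Python sketch that uses only the standard-library `sqlite3` module. The function name `check_sqlite_integrity` is hypothetical and is not part of the pipeline.

```python
import sqlite3
from pathlib import Path


def check_sqlite_integrity(path):
    """Return (is_ok, messages) for the SQLite file at `path`.

    Hypothetical helper for illustration only. The file is opened
    read-only so the check cannot modify or create anything.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        # For example, the file does not exist or cannot be opened.
        return False, [str(exc)]
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # A badly damaged file fails here, e.g. with
        # "database disk image is malformed".
        return False, [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    # A healthy database reports a single row containing "ok".
    return messages == ["ok"], messages


# Example call (file name taken from the listing above):
# ok, messages = check_sqlite_integrity("SQ00015049.sqlite")
```

If the check fails, the returned messages hold either the rows reported by the pragma or the text of the exception that SQLite raised.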
Got it. Let's check the md5: https://github.com/broadinstitute/imaging-backup-scripts/issues/10
(this is a stub comment)
Ok, so it might be easier to just retrieve that file again, following the instructions here. These new instructions allow you to check the md5.
You'd do this:
echo "SQ00015049" > list_of_plates.txt
The rest should just work
Got it. Let's check the md5
md5sum of SQ00015049 in S3 is 0c03ae889932e1609bea7dfc0137c916
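To compare a local copy against a checksum like the one above, the md5 can be computed in chunks so that a large file is never loaded into memory at once. This is a minimal Python sketch using the standard-library `hashlib`. The helper name `md5_of_file` and the chunk size are assumptions for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of the file at `path`.

    Hypothetical helper: reads the file in 1 MiB chunks (an arbitrary
    choice) instead of loading it whole.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example comparison (checksum value quoted from the comment above):
# md5_of_file("SQ00015049.sqlite") == "0c03ae889932e1609bea7dfc0137c916"
```

The result is the same 32-character hex string that `md5sum` prints, so the two can be compared directly.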
Ok, so it might be easier to just retrieve that file again, following the instructions here. These new instructions allow you to check the md5. The rest should just work
Cool, I'll try it out
"Restore": "ongoing-request=\"true\"",
Seems to be working great!
The retrieval may take several hours. Check status again in a few hours and ensure that all files are available.
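The `"Restore"` value shown above is a small key="value" string, so a status check can be scripted by parsing it. This is a minimal Python sketch, assuming the string has already been fetched (for example from the output that produced the line above). The function names and the state labels are hypothetical.

```python
import re

# Matches pairs such as ongoing-request="true" or expiry-date="...".
_FIELD = re.compile(r'([\w-]+)="([^"]*)"')


def parse_restore_value(value):
    """Split a Restore string into a dict of its key="value" fields."""
    return dict(_FIELD.findall(value))


def restore_state(value):
    """Classify a Restore string (labels are illustrative assumptions).

    "absent"      : no Restore value was reported for the object
    "in-progress" : ongoing-request="true", retrieval still running
    "restored"    : ongoing-request="false", a restored copy is available
    "unknown"     : anything else
    """
    if not value:
        return "absent"
    fields = parse_restore_value(value)
    ongoing = fields.get("ongoing-request")
    if ongoing == "true":
        return "in-progress"
    if ongoing == "false":
        return "restored"
    return "unknown"


# Example with the value from the output above:
# restore_state('ongoing-request="true"')
```

Rerunning the status check every so often and passing the new value through `restore_state` shows when the retrieval has finished.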
Is this something I should have tmux'd? Or will it continue to run after exiting?
Is this something I should have tmux'd? Or will it continue to run after exiting?
No need to tmux
🎉
(base) ubuntu@ip-10-0-9-22:~/ebs_tmp$ cat 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.md5
a07f32c03b6a4f9d8fa016b9216ed235 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.tar.gz
(base) ubuntu@ip-10-0-9-22:~/ebs_tmp$ cat 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.md5.local
a07f32c03b6a4f9d8fa016b9216ed235 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.tar.gz
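The two files printed above can also be compared programmatically instead of by eye. This is a minimal Python sketch that assumes both files use the usual `md5sum` layout of a checksum followed by a file name on each line. The helper names `parse_md5sum_file` and `checksums_match` are hypothetical.

```python
def parse_md5sum_file(path):
    """Read an md5sum-style file into a {file_name: checksum} dict."""
    entries = {}
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            parts = line.strip().split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            checksum, name = parts
            # md5sum marks binary mode with a leading "*" on the name.
            if name.startswith("*"):
                name = name[1:]
            entries[name] = checksum.lower()
    return entries


def checksums_match(expected_path, local_path):
    """True only if both files list the same names with the same md5s."""
    expected = parse_md5sum_file(expected_path)
    local = parse_md5sum_file(local_path)
    # An empty expected file is treated as a failure, not a match.
    return bool(expected) and expected == local


# Example, where "<prefix>" stands for the long backend file name above:
# checksums_match("<prefix>_backend.md5", "<prefix>_backend.md5.local")
```

Comparing parsed entries instead of raw text keeps spacing differences between the two files from causing a false mismatch.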
Great news! It looks like the md5sum of the restored SQ00015049.sqlite is a35c28f7e96a9757d83b0f79c3130eba
In https://github.com/broadinstitute/lincs-cell-painting/issues/25#issuecomment-626082068 the corrupted SQ00015049.sqlite md5sum was 0c03ae889932e1609bea7dfc0137c916.
This is promising!
It is currently processing! The imaging backup scripts solution seems to have worked splendidly.
Something is wrong with plate SQ00015049. We successfully processed all other plates except this one. Below is the error: