broadinstitute / lincs-cell-painting

Processed Cell Painting Data for the LINCS Drug Repurposing Project
BSD 3-Clause "New" or "Revised" License

Plate SQ00015049 is not processing #25

Closed: gwaybio closed this issue 4 years ago

gwaybio commented 4 years ago

Something is wrong with plate SQ00015049. We successfully processed all other plates except this one. Below is the error:

Now processing... Plate: SQ00015049
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    cursor, statement, parameters, context
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
    cursor.execute(statement, parameters)
sqlite3.DatabaseError: database disk image is malformed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "profile.py", line 56, in <module>
    ap = AggregateProfiles(sql_file=sql_file, strata=strata, operation=aggregate_method)
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/pycytominer/aggregate.py", line 86, in __init__
    self.load_image()
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/pycytominer/aggregate.py", line 118, in load_image
    self.image_df = pd.read_sql(sql=image_query, con=self.conn)
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/pandas/io/sql.py", line 438, in read_sql
    chunksize=chunksize,
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/pandas/io/sql.py", line 1218, in read_query
    result = self.execute(*args)
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/pandas/io/sql.py", line 1087, in execute
    return self.connectable.execute(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 976, in execute
    return self._execute_text(object_, multiparams, params)
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1151, in _execute_text
    parameters,
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    cursor, statement, parameters, context
  File "/home/ubuntu/miniconda3/envs/lincs/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed
[SQL: select TableNumber, ImageNumber, Image_Metadata_Plate, Image_Metadata_Well from image]
(Background on this error at: http://sqlalche.me/e/4xp6)

shntnu commented 4 years ago

Maybe try again. IIRC this happened before, and it just worked the second time you tried it: https://github.com/broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/issues/5#issuecomment-570281677

gwaybio commented 4 years ago

Thanks for linking these errors and for reminding me that they are relatively common.

I have restarted the pipeline many times, and I manually checked the database integrity with sqlalchemy. The same error persists. Also, I double-checked the analysis in broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad#3, and that plate is the only one that did not process. I did not perform a sufficient check in the previous analysis.

One thing that I did not do was try loading with dbplyr. Profiles processed with cytominer do exist in S3, indicating that the sqlite file was working at one point. Maybe the database is somehow only throwing errors with sqlalchemy? 🤷‍♂️
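For anyone wanting to repeat the integrity check without going through sqlalchemy, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the check that was run in this thread, and the helper name `check_sqlite_integrity` is hypothetical. It opens the file read-only, runs SQLite's `PRAGMA integrity_check`, and treats any `sqlite3.DatabaseError` (for example "database disk image is malformed") as a failed check.

```python
import sqlite3
from pathlib import Path


def check_sqlite_integrity(path):
    """Return (ok, messages) for the SQLite file at `path`.

    Hypothetical helper for illustration; not part of pycytominer or cytominer.
    """
    # Open read-only so that a missing file raises instead of being created empty.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Missing, unreadable, or badly corrupted files end up here.
        return False, [str(exc)]
    messages = [row[0] for row in rows]
    # A healthy database reports a single row containing "ok".
    return messages == ["ok"], messages


# Example call (the file name is taken from the directory listing in this thread):
# ok, messages = check_sqlite_integrity("SQ00015049.sqlite")
```

A failed check here only says that the local copy is unreadable; it does not say whether the copy in S3 is also damaged, which is what the md5 comparison later in the thread addresses.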

shntnu commented 4 years ago

I manually checked the database integrity with sqlalchemy. The same error persists.

You manually checked integrity with sqlalchemy and it shows up as bad or good?

gwaybio commented 4 years ago

Bad: the same error persists. Checking with dbplyr now and getting a similar error:

> list.files()
[1] "SQ00015049_augmented.csv"
[2] "SQ00015049.csv"
[3] "SQ00015049_normalized.csv"
[4] "SQ00015049_normalized_variable_selected.csv"
[5] "SQ00015049_normalized_variable_selected.gct"
[6] "SQ00015049.sqlite"
> sqlfile = "SQ00015049.sqlite"
> db <- DBI::dbConnect(RSQLite::SQLite(), sqlfile)
Warning message:
Couldn't set synchronous mode: database disk image is malformed
Use `synchronous` = NULL to turn off this warning.

shntnu commented 4 years ago

Got it. Let's check the md5: https://github.com/broadinstitute/imaging-backup-scripts/issues/10

(this is a stub comment)

shntnu commented 4 years ago

OK, so it might be easier to just retrieve that file again, following the instructions here. These new instructions allow you to check the md5.

You'd do this:

echo "SQ00015049" > list_of_plates.txt

The rest should just work.

gwaybio commented 4 years ago

Got it. Let's check the md5

md5sum of SQ00015049 in S3 is 0c03ae889932e1609bea7dfc0137c916

Ok so it might be easier to just retrieve that file again, following the instructions here. These new instructions allow you to check md5 [...] The rest should just work

Cool, I'll try it out
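For reference, a digest like the one quoted above can be computed locally with Python's standard `hashlib`. This is only a sketch: the helper name `md5_of_file` and the chunk size are illustrative choices, not part of the imaging backup scripts. Reading in chunks keeps memory use flat for large sqlite backends.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example call (file name from the directory listing earlier in this thread):
# print(md5_of_file("SQ00015049.sqlite"))
```

The result is the same 32-character hex string that `md5sum` prints, so the two can be compared directly.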

gwaybio commented 4 years ago

"Restore": "ongoing-request=\"true\"",

Seems to be working great!

The retrieval may take several hours. Check status again in a few hours and ensure that all files are available.

Is this something I should have tmux'd? Or will it continue to run after exiting?

shntnu commented 4 years ago

Is this something I should have tmux'd? Or will it continue to run after exiting?

No need to tmux
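When checking the status again later, the `"Restore"` value quoted above is the field to look at. The sketch below parses such a value in plain Python. It assumes the value is a comma-separated list of `key="value"` pairs, with `ongoing-request` as shown in this thread and an `expiry-date` pair appearing once the restore has finished; the `expiry-date` key and the helper name `parse_restore_status` are assumptions not taken from this thread.

```python
import re


def parse_restore_status(value):
    """Parse a restore status string into a small dict (hypothetical helper).

    Assumes input shaped like: ongoing-request="true"
    or: ongoing-request="false", expiry-date="<date string>"
    """
    # Collect every key="value" pair found in the string.
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', value))
    return {
        "in_progress": fields.get("ongoing-request") == "true",
        "expiry_date": fields.get("expiry-date"),
    }


# The value reported in this thread:
status = parse_restore_status('ongoing-request="true"')
print(status["in_progress"])  # True while the retrieval is still running
```

Since the status is read back from the stored object rather than from a local process, this fits the answer above that no tmux session is needed.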

gwaybio commented 4 years ago

🎉

(base) ubuntu@ip-10-0-9-22:~/ebs_tmp$ cat 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.md5
a07f32c03b6a4f9d8fa016b9216ed235  2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.tar.gz
(base) ubuntu@ip-10-0-9-22:~/ebs_tmp$ cat 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.md5.local
a07f32c03b6a4f9d8fa016b9216ed235  2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad_2016_04_01_a549_48hr_batch1_SQ00015049_backend.tar.gz

gwaybio commented 4 years ago

Great news! It looks like the md5sum of the restored SQ00015049.sqlite is a35c28f7e96a9757d83b0f79c3130eba

In https://github.com/broadinstitute/lincs-cell-painting/issues/25#issuecomment-626082068 the corrupted SQ00015049.sqlite md5sum was 0c03ae889932e1609bea7dfc0137c916.

This is promising!
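The comparison in the previous comment, where the `.md5` file and the `.md5.local` file list the same digest for the same archive, can also be done programmatically. The following is a minimal sketch that assumes both files use the usual `md5sum` layout of `<digest>  <filename>` per line; the helper names `read_md5_file` and `md5_files_match` are hypothetical and not part of the imaging backup scripts.

```python
def read_md5_file(path):
    """Parse md5sum-style lines into a {filename: digest} dict (hypothetical helper)."""
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # md5sum writes the digest, whitespace, then the file name.
            digest, name = line.split(None, 1)
            # A leading "*" marks binary mode in md5sum output; drop it.
            entries[name.lstrip("*")] = digest.lower()
    return entries


def md5_files_match(expected_path, local_path):
    """Return True when both files list identical, non-empty digest sets."""
    expected = read_md5_file(expected_path)
    local = read_md5_file(local_path)
    return bool(expected) and expected == local


# Example call with placeholder file names:
# md5_files_match("backend.md5", "backend.md5.local")
```

Requiring a non-empty result guards against two empty or missing-content files being reported as a match.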

gwaybio commented 4 years ago

It is currently processing! The imaging backup scripts solution seems to have worked splendidly.