broadinstitute / CP257-HeLa-WG

BSD 3-Clause "New" or "Revised" License

Weld errored at image-qc #2

Status: Open. gwaybio opened this issue 3 years ago

gwaybio commented 3 years ago

CP257 errored as described in broadinstitute/pooled-cell-painting-profiling-recipe#73:

All sites complete.
Summarizing 4684 sites in batch: 20210422_6W_CP257.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/facets/facet.py:399: PlotnineWarning: If you need more space for the y-axis tick text use ... + theme(subplots_adjust={'hspace': 0.25}). Choose an appropriate value for 'hspace'
There are a total of 46371251 cells in 20210422_6W_CP257
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/plotnine/layer.py:401: PlotnineWarning: geom_text : Removed 360 rows containing missing values.
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3081, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'level_3'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "recipe/0.preprocess-sites/4.image-and-segmentation-qc.py", line 511, in <module>
    cp_sat_df[["cat", "type", "Ch"]] = cp_sat_df["level_3"].str.split(
  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3083, in get_loc
    raise KeyError(key) from err
KeyError: 'level_3'
Building single file for dataset ALLBATCHES___ALLPLATES___ALLWELLS; combining single cells from site: CP257A-Well2-59...

cc @ErinWeisbart

Looks like this error happens pretty deep into the script (4.image-and-segmentation-qc.py, line 511), so you will likely retain most of your QC figures, but not anything produced after line 511.

P.S. I'm writing these issues over the weekend, since the weld just failed (see #1), so I don't forget on Monday!
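For context, the failing statement at line 511 splits a stacked feature name into three new columns (cat, type, Ch). The traceback cuts off the `str.split(` arguments, so the sketch below only assumes an underscore separator and feature names shaped like `ImageQuality_PercentMaximal_<channel>` (the saturated_col_prefix used later in this thread); the channel names and values are toy data, not output from CP257.

```python
import pandas as pd

# Toy stand-in for the stacked saturation table that line 511 operates on.
# The column named 0 mimics the values column that reset_index() creates
# for an unnamed Series.
cp_sat_df = pd.DataFrame(
    {
        "level_3": [
            "ImageQuality_PercentMaximal_DNA",
            "ImageQuality_PercentMaximal_Phalloidin",
        ],
        0: [0.01, 0.25],
    }
)

# Assumed separator "_"; n=2 keeps any extra underscores inside the channel name
parts = cp_sat_df["level_3"].str.split("_", n=2, expand=True)
parts.columns = ["cat", "type", "Ch"]
cp_sat_df = pd.concat([cp_sat_df, parts], axis=1)

print(cp_sat_df[["cat", "type", "Ch"]])
```

If the stacked column is not actually called level_3, the `cp_sat_df["level_3"]` lookup is what raises the KeyError in the log above.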

ErinWeisbart commented 3 years ago

Once you push what has been processed, I can try to replicate this locally to see if I can fix it.

gwaybio commented 3 years ago

Sounds good. I'll tag you once it's pushed.

gwaybio commented 3 years ago

@ErinWeisbart #3 adds the image_metadata.tsv file. I think this is all you need to address this? LMK if you need anything else.

ErinWeisbart commented 3 years ago

I don't know what's going on, since a local test runs fine for me.

To test locally, I set the following variables:

    input_image_file = "data/0.site-qc/20210422_6W_CP257/data/image_metadata.tsv"
    intensity_col_prefix = "ImageQuality_StdIntensity_"
    saturated_col_prefix = "ImageQuality_PercentMaximal_"
    platelist = ["CP257A", "CP257B"]
    sites_per_image_grid_side = 10
    image_cols = {"well": "Metadata_Well", "site": "Metadata_Site", "plate": "Metadata_Plate"}
    barcoding_cycles = 12

I then run lines 69-87 to make loc_df, and lines 257-265 to load the image file and add loc_df to it (we don't use the columns coming from loc_df for the saturation plots, but it's the only time image_df is modified after loading, so I included it).

Then I run lines 555 to the end, and it works and my plots save (I simplified the file-saving step at the end to the following, but that shouldn't matter):

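        # Simplified save step for the local test; well, plate and
        # bc_saturation_gg are defined earlier in the script
        output_file = pathlib.Path(f"bc_saturation_{well}_{plate}.png")
        bc_saturation_gg.save(
            output_file,
            dpi=300,
            width=5,
            # figure height scales with the number of barcoding cycles
            height=(barcoding_cycles + 2),
            verbose=False,
        )
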
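A minimal sketch of the loading side of this local test, under stated assumptions: it assumes `image_meta_col_list` is simply `list(image_cols.values())` (the thread only says it is built from image_cols at line 259) and that saturation features are the columns starting with saturated_col_prefix. The tiny in-memory TSV is a hypothetical stand-in for image_metadata.tsv, the name `sat_df` is illustrative, and the loc_df step is omitted because those columns are not used for the saturation plots.

```python
import io

import pandas as pd

image_cols = {"well": "Metadata_Well", "site": "Metadata_Site", "plate": "Metadata_Plate"}
saturated_col_prefix = "ImageQuality_PercentMaximal_"

# Hypothetical stand-in for data/0.site-qc/20210422_6W_CP257/data/image_metadata.tsv
toy_tsv = (
    "Metadata_Plate\tMetadata_Well\tMetadata_Site\t"
    "ImageQuality_PercentMaximal_DNA\tImageQuality_StdIntensity_DNA\n"
    "CP257A\tWell2\t1\t0.01\t12.5\n"
    "CP257A\tWell2\t2\t0.02\t13.0\n"
)
image_df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Assumed construction of the identifier column list (see lead-in)
image_meta_col_list = list(image_cols.values())

# Keep the identifiers plus only the saturation (PercentMaximal) features
sat_cols = [c for c in image_df.columns if c.startswith(saturated_col_prefix)]
sat_df = image_df.loc[:, image_meta_col_list + sat_cols]
print(sat_df.columns.tolist())
```

Printing `image_meta_col_list` at this point on both the weld machine and locally is a cheap way to confirm the two runs see the same identifier columns.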
ErinWeisbart commented 3 years ago

Taking a stab at what's going on: since it's a KeyError: 'level_3', the line before it, bc_sat_df = bc_sat_df.set_index(image_meta_col_list).stack().reset_index(), must not be working as expected, because that is where the column name level_3 comes from. image_meta_col_list is made from image_cols at line 259, but I don't see anywhere that image_meta_col_list is modified between its creation and the code I tested.
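The level_3 name is generated by pandas, not by the script: `stack()` moves the remaining column labels into a new innermost index level, and because the column axis has no name, `reset_index()` falls back to `level_<position>` for it. The position equals the number of identifier columns passed to `set_index`, so three entries in image_cols give level_3, while a different count at runtime (or a named column axis, whose name would be used instead) gives some other column name and exactly this KeyError. The sketch below shows that on toy data; the fourth identifier `Metadata_Extra` is hypothetical, and the `melt` variant is one possible way to get a fixed column name, not the recipe's code.

```python
import pandas as pd


def stack_features(df, id_cols):
    """Wide-to-long reshape in the same form as the line discussed above."""
    return df.set_index(id_cols).stack().reset_index()


# Toy wide table: identifier columns plus two saturation features
wide = pd.DataFrame(
    {
        "Metadata_Well": ["Well2", "Well2"],
        "Metadata_Site": [1, 2],
        "Metadata_Plate": ["CP257A", "CP257A"],
        "Metadata_Extra": ["x", "y"],  # hypothetical extra identifier
        "ImageQuality_PercentMaximal_DNA": [0.01, 0.02],
        "ImageQuality_PercentMaximal_Phalloidin": [0.10, 0.20],
    }
)

three_ids = ["Metadata_Well", "Metadata_Site", "Metadata_Plate"]
four_ids = three_ids + ["Metadata_Extra"]

# Three index levels: the unnamed stacked level sits at position 3 -> "level_3"
long3 = stack_features(wide.drop(columns="Metadata_Extra"), three_ids)
print(long3.columns.tolist())

# Four index levels: the same code yields "level_4", so ["level_3"] would raise KeyError
long4 = stack_features(wide, four_ids)
print(long4.columns.tolist())

# Possible alternative: melt names the feature column explicitly,
# independent of how many identifier columns are passed
melted = wide.melt(id_vars=four_ids, var_name="feature", value_name="value")
print(melted.columns.tolist())
```

Naming the feature column explicitly (via `melt`, or by naming the column axis before stacking) would make the downstream split independent of the length of image_meta_col_list.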