WayScience / pyroptosis_signature_image_profiling

This repository contains the pipelines used to analyze the data sets from the Interstellar collaboration.
Creative Commons Zero v1.0 Universal

Power surge causing CellProfiler to stop mid-run #25

Open jenna-tomkinson opened 1 year ago

jenna-tomkinson commented 1 year ago

Due to a power surge/outage during the SH-SY5Y run, and because the computer was not plugged in properly to the UPS (uninterruptible power supply), the SQLite file is incomplete.

Since CellProfiler cannot resume a run where it left off, we can avoid spending more compute and time rerunning the same images by splitting the LoadData CSV for this cell type into two parts:

a. Part A: the images that were already run
b. Part B: the images that still need to be processed

We know the image set that CellProfiler stopped at from the log file of the first run. Since the LoadData CSV has one row per image set, we can split the data frame by row index as seen below:

import pathlib

import pandas as pd


def split_loaddata_csv_by_row(
    path_to_loadata: pathlib.Path,
    output_dir: pathlib.Path,
    row_index_val: int,
    first_csv_name: str,
    second_csv_name: str,
) -> None:
    """
    Split a LoadData CSV into two CSVs at a given row index.
    This can be used to resume an interrupted CellProfiler run: the first CSV holds the
    image sets before the split point and the second CSV holds the image sets from it onward.

    Parameters
    ----------
    path_to_loadata : pathlib.Path
        path to the LoadData CSV with IC functions to be split
    output_dir : pathlib.Path
        path to directory where new LoadData CSVs will be saved to
    row_index_val : int
        row index to split at; rows before this index go to the first CSV and rows from
        this index onward go to the second CSV
    first_csv_name : str
        name of the LoadData CSV for the first group of rows, without the .csv extension
        (name should include loaddata and state that there are IC functions)
        Example: loaddata_PBMC_with_ic
    second_csv_name : str
        name of the LoadData CSV for the second group of rows (see example above)
    """
    # load in LoadData CSV as pandas dataframe
    loaddata_df = pd.read_csv(path_to_loadata)

    # splitting dataframe by row index
    df_1 = loaddata_df.iloc[:row_index_val,:]
    df_2 = loaddata_df.iloc[row_index_val:,:]

    # save new LoadData CSVs based on given name
    df_1.to_csv(pathlib.Path(output_dir) / f"{first_csv_name}.csv", index=False)
    df_2.to_csv(pathlib.Path(output_dir) / f"{second_csv_name}.csv", index=False)
    print(f"{path_to_loadata.name} has been split into {first_csv_name}.csv and {second_csv_name}.csv!")

This lets CellProfiler effectively pick up where it left off: only the second CSV (Part B) needs to be run.
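To make the row-index split concrete, here is a minimal sketch on a tiny made-up table. The column names (`Metadata_Well`, `Metadata_FOV`), the helper `split_by_row`, and the split point are all hypothetical stand-ins, not values from the actual SH-SY5Y LoadData CSV. It only shows that slicing with `iloc` puts the row at the split index into the second part and that the two parts together reproduce the original table.

```python
import pandas as pd

# Hypothetical miniature LoadData table; the real CSV has different columns and many more rows.
loaddata_df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A01", "A02", "A02", "A03"],
        "Metadata_FOV": [1, 2, 1, 2, 1],
    }
)


def split_by_row(df: pd.DataFrame, row_index_val: int):
    """Return (rows before row_index_val, rows from row_index_val onward)."""
    return df.iloc[:row_index_val, :], df.iloc[row_index_val:, :]


# Assumed scenario: the run was interrupted on the row with index 3,
# so that row is placed in the second part and gets reprocessed.
part_a, part_b = split_by_row(loaddata_df, 3)

# Sanity checks: nothing is lost or duplicated by the split.
assert len(part_a) + len(part_b) == len(loaddata_df)
assert pd.concat([part_a, part_b]).equals(loaddata_df)

print(f"Part A rows: {len(part_a)}, Part B rows: {len(part_b)}")
```

Whether the interrupted image set belongs in Part A or Part B depends on whether its measurements were fully written before the outage, which this sketch does not address.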

jenna-tomkinson commented 1 year ago

The log file was accidentally cleared when the processes were rerun. I have documentation showing that the log ended at MeasureColocalization during the analysis of image set 1218, and I can see that the SQLite file reaches well H23 FOV1, which is row index 1217 in the LoadData CSV (consistent with image set 1218 if image sets are counted from 1 and rows from 0).
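Since the log is gone, one way to double check the stopping point is to ask the SQLite file directly. The sketch below is an assumption-laden illustration, not what was actually run here: it assumes the image table is named `Per_Image` with an `ImageNumber` column (adjust if the export used a table prefix), the helper names are hypothetical, and the database is a synthetic in-memory stand-in rather than the real file.

```python
import sqlite3


def last_image_number(conn: sqlite3.Connection, table: str = "Per_Image",
                      column: str = "ImageNumber"):
    """Return the highest image number stored in the table, or None if it is empty.

    The table and column names are assumptions; they are interpolated into the
    query directly, so only pass trusted values.
    """
    row = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
    return row[0]


def image_set_to_row_index(image_number: int) -> int:
    """Convert a 1-based image set number to a 0-based LoadData row index."""
    return image_number - 1


# Synthetic stand-in for the incomplete SQLite file: image sets 1 through 1218.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Per_Image (ImageNumber INTEGER)")
conn.executemany(
    "INSERT INTO Per_Image VALUES (?)", [(n,) for n in range(1, 1219)]
)

last = last_image_number(conn)
print(f"Last image set in database: {last}")
print(f"Corresponding LoadData row index: {image_set_to_row_index(last)}")
conn.close()
```

For the real file, replace the in-memory connection with `sqlite3.connect(path_to_sqlite_file)`. Note that the highest image number being present does not by itself show that all of its measurements were written before the outage.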