datarail / DrugResponse

Analysis of drug response based on cell staining
MIT License

A few comments on the cell gating with Python #9

Open Farnazmdi opened 1 year ago

Farnazmdi commented 1 year ago

Hi @NicholasClark, I apologize for not writing up this issue sooner; it slipped my mind and I only just remembered. Here are the points we discussed during the meeting:

Here is the function I wrote to center the EdU intensity at 3.5 on the log10 scale.

import os
import numpy as np
import pandas as pd

EDU_COL = 'SingleNuc - Intensity A488 Mean'

def center_data(path_dir: str, out_dir: str = 'mda_filtered'):
    """
    Center the EdU intensity around 3.5 on the log10 scale for the internal 2022 dataset.
    The shift is computed from the values pooled across all files in path_dir, using only
    those strictly between the 1st and 99th percentiles. It is then applied to every cell,
    and each file is saved as tab-separated text in out_dir.
    example path_dir = '../cell_cycle_data/220713_DyeDrop_CellCycle'
    """
    # Pool the log10 EdU intensities from every file in the directory.
    cols = []
    for file in os.listdir(path_dir):
        d = pd.read_csv(os.path.join(path_dir, file), sep='\t')
        cols.append(np.log10(d[EDU_COL].to_numpy()))
    flat_col = np.concatenate(cols)
    # Exclude both tails so that outliers do not bias the mean.
    min_perc = np.percentile(flat_col, 1)
    max_perc = np.percentile(flat_col, 99)
    mask = np.logical_and(flat_col > min_perc, flat_col < max_perc)
    # Offset that moves the trimmed mean to 3.5.
    mn = 3.5 - np.mean(flat_col[mask])

    os.makedirs(out_dir, exist_ok=True)
    for file in os.listdir(path_dir):
        d = pd.read_csv(os.path.join(path_dir, file), sep='\t', index_col=0)
        col = np.log10(d[EDU_COL].to_numpy())
        # Shift in log space, then convert back to linear intensity.
        d[EDU_COL] = np.power(10, col + mn)
        # Note: index_col=0 together with index=False drops the first column from the output.
        d.to_csv(os.path.join(out_dir, file), sep='\t', index=False)
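
As a quick sanity check of the centering arithmetic, here is a minimal sketch on synthetic data. The helper name `center_log_intensity`, its parameters, and the log-normal test values are hypothetical and only for illustration; they are not part of the repository. It applies the same steps as `center_data` (log10, 1st/99th percentile mask, shift by `3.5 - trimmed mean`) to an in-memory array instead of files.

```python
import numpy as np

def center_log_intensity(values, target=3.5, lower=1, upper=99):
    """Shift log10 intensities so that their trimmed mean equals `target`.

    Hypothetical helper: mirrors the arithmetic of center_data on an array.
    """
    log_vals = np.log10(np.asarray(values, dtype=float))
    lo, hi = np.percentile(log_vals, [lower, upper])
    # Keep only values strictly inside the percentile window.
    mask = (log_vals > lo) & (log_vals < hi)
    shift = target - log_vals[mask].mean()
    # Apply the shift to every value, then return to linear scale.
    return np.power(10, log_vals + shift), shift

# Synthetic, strictly positive intensities (assumed values, not real data).
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=np.log(500), sigma=0.6, size=10_000)
centered, shift = center_log_intensity(raw)

# Recompute the mask on the raw data and check the trimmed mean after shifting.
log_raw = np.log10(raw)
lo, hi = np.percentile(log_raw, [1, 99])
mask = (log_raw > lo) & (log_raw < hi)
trimmed_mean_after = np.log10(centered)[mask].mean()
print(round(trimmed_mean_after, 6))
```

The printed trimmed mean should equal 3.5 up to floating-point rounding. Because the shift is a constant added in log space, it rescales every intensity by the same factor and leaves the shape of the distribution unchanged.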

NicholasClark commented 1 year ago

Farnaz, no worries, I am just returning from vacation anyway. I will start working on these changes soon. I'll message you if I need clarification on anything.

-Nick