ACCLAB / DABEST-python

Data Analysis with Bootstrapped ESTimation
https://acclab.github.io/DABEST-python/
Apache License 2.0

Delta delta #119

Closed LI-Yixuan closed 3 years ago

LI-Yixuan commented 3 years ago

Some explanations of the new parameters/changes in dabest.load() (updated after removing the sorting of the delta bootstraps):

delta2: boolean indicator of delta-delta plots
x: can be declared as a list of 2 column names when delta2 is True; the first element determines the x-axis of the raw data plots and the second element determines the dot colors of the raw data
x1_level: a list specifying the values of the first variable in x as well as the order of the plots; if not declared, a default order is used
experiment: the name of the column that contains the experiment labels
experiment_label: a list specifying the values of experiment as well as the order of the plots; if not declared, a default order is used
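To make the quantity behind delta2 concrete, here is a minimal sketch of the arithmetic using plain pandas rather than dabest itself. It assumes the usual definition of a delta-delta (the mean difference in the test group minus the mean difference in the control group); this thread does not spell the formula out, so treat both the definition and the On-minus-Off direction as assumptions. The numbers are made up, and the Genotype column is left out for brevity.

```python
import pandas as pd

# Toy 2x2 data. Column names mirror the TrhCsCh example; the values are invented.
toy = pd.DataFrame({
    "Status":    ["Control"] * 4 + ["Test"] * 4,
    "Light":     ["Red Light Off", "Red Light Off", "Red Light On", "Red Light On"] * 2,
    "FeedCount": [10.0, 12.0, 11.0, 13.0, 10.0, 14.0, 20.0, 22.0],
})

# Mean of y in each (experiment, first x variable) cell.
cell_means = toy.groupby(["Status", "Light"])["FeedCount"].mean().unstack("Light")

# Mean difference within each experiment group (assumed direction: On minus Off).
deltas = cell_means["Red Light On"] - cell_means["Red Light Off"]

# Assumed definition: delta-delta = delta(Test) - delta(Control).
delta_delta = deltas["Test"] - deltas["Control"]
print(deltas.to_dict(), delta_delta)
```

The bootstrapped confidence intervals that dabest draws around these values are not reproduced here; the sketch only shows which point estimate the delta-delta panel summarises.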

Sample plots:

For unpaired data (using the TrhCsCh data):

import numpy as np
import pandas as pd
import dabest
expresso = pd.read_csv("TrhCsCh.csv")
temp = dabest.load(data = expresso, x = ["Light", "Genotype"], y = "FeedCount", delta2 = True, experiment = "Status").mean_diff
temp.plot(fig_size=(12, 8), raw_marker_size=4)


Demo of x1_level and experiment_label:

# The following code generates the same plot as the one above.
dabest.load(data = expresso, x = ["Light", "Genotype"], y = "FeedCount", delta2 = True, experiment = "Status",
           experiment_label = ["Control", "Test"], x1_level = ["Red Light On","Red Light Off"]).mean_diff.plot(fig_size=(15, 10), raw_marker_size=3)

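Since x1_level and experiment_label have to name values that actually occur in the data, it can help to check them before calling dabest.load(). The sketch below is one way to do that; check_levels is a hypothetical helper, not part of dabest, and the small demo frame stands in for the TrhCsCh data.

```python
import pandas as pd

def check_levels(data, column, levels):
    """Hypothetical helper (not a dabest function): confirm that `levels`
    lists exactly the distinct values found in `column`."""
    present = set(data[column].unique())
    missing = [lvl for lvl in levels if lvl not in present]
    unlisted = sorted(present - set(levels))
    if missing or unlisted:
        raise ValueError(f"{column}: missing={missing}, unlisted={unlisted}")
    return list(levels)

# Stand-in frame; the real data would be the `expresso` DataFrame.
demo = pd.DataFrame({
    "Light":  ["Red Light On", "Red Light Off", "Red Light On", "Red Light Off"],
    "Status": ["Control", "Control", "Test", "Test"],
})

x1_level = check_levels(demo, "Light", ["Red Light On", "Red Light Off"])
experiment_label = check_levels(demo, "Status", ["Control", "Test"])
print(x1_level, experiment_label)
```

A misspelled level (for example "Red Light Of") raises a ValueError here instead of surfacing later inside the plotting call.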

Demo of plots for paired data:

from scipy.stats import norm # Used in generation of populations.

np.random.seed(9999) # Fix the seed so the results are replicable.
# pop_size = 10000 # Size of each population.
Ns = 20 # The number of samples taken from each population

# Create samples
y = norm.rvs(loc=3, scale=0.4, size=Ns*2)

# Add experiment column
e1 = np.repeat('Control', Ns).tolist()
e2 = np.repeat('Test', Ns).tolist()
experiment = e1 + e2 

# Add a `Light` column as the first variable
light = []
for i in range(Ns):
    light.append('L1')
    light.append('L2')

# Add a `Genotype` column as the second variable.
# Integer division is needed because np.repeat requires an integer repeat count.
g1 = np.repeat('G1', Ns // 2).tolist()
g2 = np.repeat('G2', Ns // 2).tolist()
g3 = np.repeat('G3', Ns).tolist()
genotype = g1 + g2 + g3

# Add an `id` column for paired data plotting.
id_col = []
for i in range(Ns):
    id_col.append(i)
    id_col.append(i)

# Combine the columns into a DataFrame.
df = pd.DataFrame({'ID'  : id_col,
                   'Light'     : light,
                   'Genotype'  : genotype, 
                   'Experiment': experiment,
                   'Y'  : y
                    })

dabest.load(data = df, x = ["Light", "Genotype"], y = "Y", delta2 = True, experiment = "Experiment",
           paired="sequential", id_col="ID").mean_diff.plot()

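A paired plot links observations through id_col, so it is worth confirming that the synthetic data really is paired the way the demo intends. The sketch below rebuilds the same layout with pandas and NumPy only (no dabest call, and no Y column since only the design is inspected) and checks that every ID has one row per Light level. That this is the structure dabest expects for paired="sequential" is an assumption on my part.

```python
import numpy as np
import pandas as pd

Ns = 20  # Same sample count as in the demo above.

# Same layout as the demo DataFrame, built without loops.
df = pd.DataFrame({
    "ID":         np.repeat(np.arange(Ns), 2),
    "Light":      np.tile(["L1", "L2"], Ns),
    "Genotype":   ["G1"] * (Ns // 2) + ["G2"] * (Ns // 2) + ["G3"] * Ns,
    "Experiment": ["Control"] * Ns + ["Test"] * Ns,
})

# Each ID should have exactly two rows, one per Light level.
rows_per_id = df.groupby("ID").size()
levels_per_id = df.groupby("ID")["Light"].nunique()
assert (rows_per_id == 2).all() and (levels_per_id == 2).all()

# Each ID should belong to a single experiment group and a single genotype.
assert (df.groupby("ID")["Experiment"].nunique() == 1).all()
assert (df.groupby("ID")["Genotype"].nunique() == 1).all()

# Observation counts per (Experiment, Genotype) x Light cell.
counts = pd.crosstab([df["Experiment"], df["Genotype"]], df["Light"])
print(counts)
```

The cross-tabulation also shows that in this synthetic design the Control rows carry genotypes G1 and G2 while the Test rows carry G3.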