Feature: Implement Statistical Analyses for Simulation Insights

This issue focuses on implementing statistical analyses for deeper insights into the simulation results. The goal is to analyze agent behavior, rewards, population dynamics, and other metrics to uncover patterns and optimize the simulation. Each analysis will be modular and reusable for further simulations.

Tasks

Action Type Distribution Analysis
- Perform correlation analysis between action frequencies, agent lifespans, and rewards.
- Conduct a Chi-Square test to determine if certain actions are disproportionately associated with successful agents.
- Function: analyze_action_distribution(actions_df)
Reward Efficiency Analysis
- Calculate the average reward per action type and normalize by action frequency.
- Compare efficiency across different groups of agents.
- Function: calculate_reward_efficiency(actions_df)
Health and Resource Dynamics
- Perform cross-correlation analysis between health and resource levels to understand dependencies.
- Calculate autocorrelation to identify recurring patterns in health levels.
- Function: analyze_health_resource_dynamics(health_series, resource_series)
Lifespan Analysis
- Implement Kaplan-Meier survival analysis to estimate survival probabilities.
- Identify factors influencing lifespan variability.
- Function: analyze_lifespan_distribution(lifespan_series)
Population Dynamics Analysis
- Decompose population dynamics into trend, seasonal, and residual components.
- Analyze fluctuations and their impact on the system.
- Function: decompose_population_dynamics(population_series)
Reward Inequality
- Calculate the Gini coefficient to measure inequality in rewards among agents.
- Visualize the Lorenz curve for reward distribution.
- Function: calculate_gini_coefficient(rewards)
Behavioral Clustering
- Use clustering algorithms to identify common behavioral strategies among agents.
- Reduce data dimensionality using PCA for better visualization.
- Function: cluster_agent_behaviors(actions_df)
Health vs. Age Interaction
- Fit a nonlinear regression model to determine the relationship between health and age.
- Identify any outliers or anomalies in health trends.
- Function: fit_health_age_model(age_series, health_series)
Strategy Evolution
- Create a Markov chain of action transitions to analyze strategy evolution.
- Calculate transition probabilities between actions.
- Function: analyze_strategy_transitions(actions_df)
Cohort-Based Analysis
- Divide agents into cohorts based on initial conditions.
- Compare metrics like rewards, lifespan, and health between cohorts using t-tests or ANOVA.
- Function: compare_cohorts(df, cohort_column, metric_column)

Acceptance Criteria

[ ] All statistical functions are implemented with proper documentation.
[ ] Each function is unit-tested with mock data.
[ ] Analyses produce visualizations or data summaries for easy interpretation.
[ ] Results are stored in a reusable format (e.g., CSV or JSON).
[ ] Functions are modular and can be integrated into the simulation pipeline.
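The storage criterion is easiest to meet if every function's output can be written straight to JSON. NumPy scalars, NumPy arrays, and pandas objects are not JSON-serializable by default, so they need converting first. Below is a minimal sketch of two hypothetical helpers that do this, `to_serializable` and `save_results_json`. The helper names and the column-oriented layout used for DataFrames are assumptions, not part of the plan above.

```python
import json

import numpy as np
import pandas as pd


def to_serializable(obj):
    """Recursively convert NumPy/pandas objects into plain Python types for json.dump."""
    if isinstance(obj, dict):
        # JSON keys must be strings; tuple or integer keys are stringified
        return {str(k): to_serializable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_serializable(v) for v in obj]
    if isinstance(obj, pd.DataFrame):
        # Column-oriented layout: {column: {index: value}}
        return to_serializable(obj.to_dict())
    if isinstance(obj, pd.Series):
        return to_serializable(obj.to_dict())
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        # NumPy scalars (np.float64, np.int64, np.bool_) become Python scalars
        return obj.item()
    return obj


def save_results_json(results, path):
    """Write an analysis result dict to a JSON file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(to_serializable(results), f, indent=2)
```

For example, `save_results_json(analyze_action_distribution(actions_df), "action_distribution.json")` would store the correlation table and the Chi-Square tuple together.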

Additional Notes

Use Pandas and NumPy for data manipulation.
Use Matplotlib or Seaborn for visualizations where applicable.
Leverage SciPy, statsmodels (autocorrelation and seasonal decomposition), lifelines, and scikit-learn for advanced statistical computations.
Consider optimizing functions for large datasets.

1. Action Type Distribution Analysis

import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

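def analyze_action_distribution(actions_df):
    """
    Perform correlation and Chi-Square analysis on actions.
    :param actions_df: DataFrame with columns ['agent_id', 'action', 'reward', 'lifespan']
    :return: dict with per-action correlations and Chi-Square test results
    """
    # Per-agent action counts (rows: agents, columns: action types)
    action_counts = actions_df.groupby(['agent_id', 'action']).size().unstack(fill_value=0)
    # Per-agent outcomes; lifespan is assumed constant across an agent's rows
    outcomes = actions_df.groupby('agent_id').agg(total_reward=('reward', 'sum'), lifespan=('lifespan', 'first'))
    correlation = pd.DataFrame({
        'total_reward': action_counts.corrwith(outcomes['total_reward']),
        'lifespan': action_counts.corrwith(outcomes['lifespan']),
    })
    # Action type vs. whether the individual reward is above the median
    action_contingency_table = pd.crosstab(actions_df['action'], actions_df['reward'] > actions_df['reward'].median())
    chi2_stat, p_val, _, _ = chi2_contingency(action_contingency_table)
    return {"correlation": correlation, "chi_square": (chi2_stat, p_val)}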

2. Reward Efficiency Analysis

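def calculate_reward_efficiency(actions_df):
    """
    Calculate reward efficiency per action type.
    :param actions_df: DataFrame with columns ['action', 'reward']
    :return: DataFrame indexed by action with mean reward, frequency, and efficiency
    """
    summary = actions_df.groupby('action')['reward'].agg(mean_reward='mean', frequency='count')
    # Efficiency as specified: average reward normalized by action frequency
    summary['efficiency'] = summary['mean_reward'] / summary['frequency']
    return summary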

3. Health and Resource Dynamics

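from scipy.signal import correlate, correlation_lags
from statsmodels.tsa.stattools import acf

def analyze_health_resource_dynamics(health_series, resource_series, nlags=50):
    """
    Cross-correlation analysis of health and resource levels.
    :param health_series: Series of health values over time
    :param resource_series: Series of resource levels over time
    :param nlags: maximum lag for the health autocorrelation
    :return: dict with lags, normalized cross-correlation, and ACF of health
    """
    health = np.asarray(health_series, dtype=float)
    resource = np.asarray(resource_series, dtype=float)
    # Center and scale both series so the lag-0 value equals Pearson's r
    h = (health - health.mean()) / (health.std() * len(health))
    r = (resource - resource.mean()) / resource.std()
    cross_corr = correlate(h, r, mode='full')
    lags = correlation_lags(len(h), len(r), mode='full')
    acf_results = acf(health, nlags=min(nlags, len(health) - 1))
    return {"lags": lags, "cross_correlation": cross_corr, "acf": acf_results}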

4. Lifespan Analysis

from lifelines import KaplanMeierFitter

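def analyze_lifespan_distribution(lifespan_series, event_observed=None):
    """
    Perform Kaplan-Meier survival analysis.
    :param lifespan_series: Series of agent lifespans
    :param event_observed: optional boolean Series, False for agents still alive when the
        simulation ends (censored); if None, every lifespan is treated as an observed death
    :return: Kaplan-Meier survival function
    """
    kmf = KaplanMeierFitter()
    kmf.fit(lifespan_series, event_observed=event_observed)
    return kmf.survival_function_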

5. Population Dynamics Analysis

from statsmodels.tsa.seasonal import seasonal_decompose

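def decompose_population_dynamics(population_series, period=None):
    """
    Decompose population time series into trend, seasonal, and residual components.
    :param population_series: Series of total population counts over time
    :param period: length of one seasonal cycle in steps; required unless the series
        has a DatetimeIndex with a set frequency (simulation step indexes do not)
    :return: DecomposeResult with .trend, .seasonal, and .resid attributes
    """
    decomposition = seasonal_decompose(population_series, model='additive', period=period)
    return decomposition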

6. Reward Inequality

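def calculate_gini_coefficient(rewards):
    """
    Calculate the Gini coefficient for reward distribution.
    :param rewards: List or array of non-negative rewards with a positive total
    :return: Gini coefficient (0 = perfect equality, approaching 1 = maximal inequality)
    """
    rewards = np.sort(np.asarray(rewards, dtype=float))
    n = len(rewards)
    ranks = np.arange(1, n + 1)  # 1-based rank of each reward in ascending order
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    gini = 2 * np.sum(ranks * rewards) / (n * np.sum(rewards)) - (n + 1) / n
    return gini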

7. Behavioral Clustering

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

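def cluster_agent_behaviors(actions_df, n_clusters=3):
    """
    Cluster agent behaviors based on action frequencies.
    :param actions_df: DataFrame with columns ['agent_id', 'action']
    :param n_clusters: number of k-means clusters
    :return: (cluster labels indexed by agent_id, 2-D PCA coordinates for plotting)
    """
    # Rows: agents, columns: action types, values: how often each agent took each action
    action_frequencies = actions_df.groupby(['agent_id', 'action']).size().unstack(fill_value=0)
    # Cluster on the full frequency vectors; a fixed seed and n_init make results reproducible
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    clusters = pd.Series(kmeans.fit_predict(action_frequencies), index=action_frequencies.index, name='cluster')
    # PCA only projects agents to 2-D for visualization (needs at least two action types)
    reduced_data = PCA(n_components=2).fit_transform(action_frequencies)
    return clusters, reduced_data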

8. Health vs. Age Interaction

from scipy.optimize import curve_fit

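def fit_health_age_model(age_series, health_series):
    """
    Fit a nonlinear regression model to health vs. age data.
    :param age_series: Series of agent ages
    :param health_series: Series of agent health
    :return: Fitted parameters (a, b, c) of health = a * exp(-b * age) + c
    """
    age = np.asarray(age_series, dtype=float)
    health = np.asarray(health_series, dtype=float)

    def health_decay(age, a, b, c):
        # Exponential decline from a + c at age 0 toward a floor of c
        return a * np.exp(-b * age) + c

    # curve_fit starts from all ones by default, which may not converge at these scales,
    # so start from the observed health range and mean age instead
    p0 = (health.max() - health.min(), 1.0 / max(age.mean(), 1e-9), health.min())
    params, _ = curve_fit(health_decay, age, health, p0=p0, maxfev=10000)
    return params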

9. Strategy Evolution

def analyze_strategy_transitions(actions_df):
    """
    Estimate first-order Markov transition probabilities between actions.
    :param actions_df: DataFrame with columns ['agent_id', 'action', 'step']
    :return: DataFrame where row a, column b is P(next action = b | current action = a)
    """
    ordered = actions_df.sort_values(['agent_id', 'step'])
    # Pair each action with the same agent's next action; each agent's last action has no successor
    next_action = ordered.groupby('agent_id')['action'].shift(-1)
    has_next = next_action.notna()
    counts = pd.crosstab(ordered.loc[has_next, 'action'], next_action[has_next],
                         rownames=['action'], colnames=['next_action'])
    # Normalize each row of transition counts so it sums to 1
    return counts.div(counts.sum(axis=1), axis=0)

10. Cohort-Based Analysis

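from scipy.stats import ttest_ind, f_oneway

def compare_cohorts(df, cohort_column, metric_column):
    """
    Compare a performance metric between cohorts.
    :param df: DataFrame with cohort and metric data
    :param cohort_column: Column representing cohorts
    :param metric_column: Column representing the performance metric
    :return: dict with the test used, its statistic, and the p-value
    """
    groups = [group[metric_column].dropna() for _, group in df.groupby(cohort_column)]
    if len(groups) < 2:
        raise ValueError("compare_cohorts needs at least two cohorts")
    if len(groups) == 2:
        # Welch's t-test (does not assume equal variances)
        stat, p_val = ttest_ind(groups[0], groups[1], equal_var=False)
        return {"test": "welch_t_test", "statistic": stat, "p_val": p_val}
    # One-way ANOVA when there are three or more cohorts
    stat, p_val = f_oneway(*groups)
    return {"test": "anova", "statistic": stat, "p_val": p_val}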

1. Action Type Distribution Analysis

Objective: Understand the dynamics behind the frequency and impact of each action.
Methodology:
- Perform a correlation analysis between action frequencies and agent lifespans/rewards.
- Use Chi-Square tests to evaluate if certain actions are disproportionately associated with successful agents (longer lifespans or higher rewards).
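The Chi-Square test in `analyze_action_distribution` works on individual actions. Testing "successful agents" directly means moving to one row per agent. Below is a minimal sketch that labels each agent by its most frequent action and defines success as a lifespan above the median. Both of those choices, and the function name `agent_level_chi_square`, are assumptions made for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def agent_level_chi_square(actions_df):
    """
    Chi-Square test of dominant action vs. agent success (sketch).
    Assumes columns ['agent_id', 'action', 'lifespan'], with lifespan constant per agent.
    """
    per_agent = actions_df.groupby('agent_id').agg(
        # Most frequent action for each agent (ties go to the first mode)
        dominant_action=('action', lambda s: s.mode().iloc[0]),
        lifespan=('lifespan', 'first'),
    )
    # "Successful" is assumed here to mean a lifespan above the median
    successful = per_agent['lifespan'] > per_agent['lifespan'].median()
    table = pd.crosstab(per_agent['dominant_action'], successful)
    chi2, p_value, dof, _ = chi2_contingency(table)
    return {"table": table, "chi2": chi2, "p_value": p_value, "dof": dof}
```

A reward-based definition of success would only change the `successful` line, for example by using each agent's summed reward.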

2. Reward Efficiency Analysis

Objective: Quantify the efficiency of actions in yielding rewards.
Methodology:
- Calculate reward per action (average reward divided by frequency for each action type).
- Compare the efficiency of actions between different agent groups (e.g., system vs. independent agents).
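To compare groups, the efficiency definition from `calculate_reward_efficiency` can be applied within each group. The sketch below assumes a hypothetical `agent_type` column holding labels such as "system" and "independent". The real column name depends on how the simulation logs agents.

```python
import pandas as pd


def reward_efficiency_by_group(actions_df, group_column='agent_type'):
    """
    Reward efficiency per action type, split by agent group (sketch).
    Assumes columns ['action', 'reward'] plus a group column; 'agent_type' is hypothetical.
    :return: DataFrame with one row per action and one column per group
    """
    stats = actions_df.groupby([group_column, 'action'])['reward'].agg(['mean', 'count'])
    # Same definition as above: average reward divided by action frequency
    efficiency = stats['mean'] / stats['count']
    # Rows: action types, columns: agent groups (NaN where a group never took an action)
    return efficiency.unstack(group_column)
```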

3. Health and Resource Dynamics

Objective: Identify relationships between health, resources, and agent behavior.
Methodology:
- Conduct a cross-correlation analysis to determine if fluctuations in resource availability drive changes in health levels.
- Use Fourier or wavelet analysis to check for periodicity in resource levels or health over time.
- Build a multivariate regression model to predict agent health based on age, resource levels, and recent actions.
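The Fourier option can start from a simple periodogram. The sketch below removes the mean, takes the FFT power spectrum, and reports the period of the strongest non-zero frequency. It assumes evenly spaced samples (one per simulation step by default) and does not cover the wavelet variant. The function name `dominant_period` is hypothetical.

```python
import numpy as np


def dominant_period(series, sample_spacing=1.0):
    """
    Estimate the dominant period of a time series with a Fourier periodogram (sketch).
    :param series: 1-D sequence sampled at regular steps
    :param sample_spacing: time between samples (1.0 = one simulation step)
    :return: period, in units of sample_spacing, of the strongest non-zero frequency
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()  # remove the mean so the zero-frequency term does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=sample_spacing)
    peak = np.argmax(power[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[peak]
```

A strong trend will leak power into low frequencies. Detrending first, for example by using the residual from the population decomposition, may give a cleaner peak.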

4. Lifespan Analysis

Objective: Explore factors contributing to lifespan variability.
Methodology:
- Use survival analysis (e.g., Kaplan-Meier curves) to estimate the survival probability over time for different groups of agents.
- Perform Cox proportional hazards modeling to identify predictors of lifespan (e.g., health trends, actions taken, interactions).
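To make the Kaplan-Meier estimate concrete, here is a from-scratch sketch of S(t) = ∏ (1 − dᵢ/nᵢ) over event times tᵢ ≤ t, computed once per group. In `analyze_lifespan_distribution`, lifelines' `KaplanMeierFitter` does this work. The column names `lifespan`, `died`, and the group column are assumptions. The Cox model is not covered here.

```python
import numpy as np
import pandas as pd


def kaplan_meier(durations, observed):
    """
    Kaplan-Meier survival estimate.
    :param durations: lifespans
    :param observed: True if the agent died, False if still alive at the end (censored)
    :return: Series of survival probabilities indexed by event time
    """
    df = pd.DataFrame({"t": np.asarray(durations), "died": np.asarray(observed, dtype=bool)})
    times = np.sort(df.loc[df["died"], "t"].unique())
    survival, s = [], 1.0
    for t in times:
        at_risk = (df["t"] >= t).sum()                 # n_i: agents still alive just before t
        deaths = ((df["t"] == t) & df["died"]).sum()   # d_i: observed deaths at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return pd.Series(survival, index=times, name="survival")


def kaplan_meier_by_group(agents_df, group_column):
    """One survival curve per group; assumes columns ['lifespan', 'died'] plus group_column."""
    return {
        group: kaplan_meier(g["lifespan"], g["died"])
        for group, g in agents_df.groupby(group_column)
    }
```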

5. Population Dynamics Analysis

Objective: Investigate the drivers of fluctuations in population numbers.
Methodology:
- Compute autocorrelation and cross-correlation between agent categories (e.g., system vs. independent agents).
- Use time-series decomposition to identify trends, seasonal patterns, and residual fluctuations in population dynamics.
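The autocorrelation and cross-correlation between agent categories can be tabulated by lag with pandas alone. This is a minimal sketch that assumes two equally long, step-aligned population count series, one per category. The function name is hypothetical.

```python
import pandas as pd


def category_lag_correlations(system_counts, independent_counts, max_lag=10):
    """
    Autocorrelation and lagged cross-correlation of two population series (sketch).
    :return: DataFrame indexed by lag with one column per statistic
    """
    a = pd.Series(system_counts, dtype=float).reset_index(drop=True)
    b = pd.Series(independent_counts, dtype=float).reset_index(drop=True)
    rows = []
    for lag in range(max_lag + 1):
        rows.append({
            "lag": lag,
            "system_acf": a.autocorr(lag),
            "independent_acf": b.autocorr(lag),
            # Correlation of system(t) with independent(t + lag): a high value suggests
            # changes in the system population precede those in the independent one
            "cross_corr": a.corr(b.shift(-lag)),
        })
    return pd.DataFrame(rows).set_index("lag")
```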

6. Reward Inequality

Objective: Quantify inequality in rewards among agents.
Methodology:
- Calculate Gini coefficient to measure reward inequality.
- Perform Lorenz curve analysis to visualize reward distribution among agents.
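The Lorenz curve and the Gini coefficient are two views of the same distribution. The Gini coefficient equals 1 minus twice the area under the Lorenz curve. The sketch below computes the curve's points and recovers the Gini from them with the trapezoid rule. It assumes non-negative rewards with a positive total, and the function names are hypothetical.

```python
import numpy as np


def lorenz_curve(rewards):
    """
    Lorenz curve points for a reward distribution.
    :return: (population_share, reward_share) arrays, both running from 0 to 1
    """
    r = np.sort(np.asarray(rewards, dtype=float))
    # Cumulative share of total reward held by the k lowest-reward agents
    reward_share = np.concatenate([[0.0], np.cumsum(r) / r.sum()])
    population_share = np.linspace(0.0, 1.0, len(r) + 1)
    return population_share, reward_share


def gini_from_lorenz(population_share, reward_share):
    """Gini = 1 - 2 * (area under the Lorenz curve), with the area from the trapezoid rule."""
    area = np.sum((reward_share[1:] + reward_share[:-1]) / 2 * np.diff(population_share))
    return 1.0 - 2.0 * area
```

Plotting `reward_share` against `population_share` next to the diagonal (perfect equality) gives the usual Lorenz chart. For the rewards `[3, 1, 4, 2]`, both this route and the rank formula give a Gini of 0.25.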

7. Behavioral Clustering

Objective: Identify common behavioral strategies among agents.
Methodology:
- Apply unsupervised learning techniques (e.g., k-means clustering or DBSCAN) on agent behaviors (e.g., action frequencies, rewards, lifespans).
- Use principal component analysis (PCA) to reduce dimensionality and visualize behavior clusters.

8. Health vs. Age Interaction

Objective: Investigate the interplay between health decline and age.
Methodology:
- Fit a nonlinear regression model to health vs. age data to determine the functional form of health decline.
- Examine outliers with unusually high health for their age to identify unique strategies or behaviors.
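One way to find agents with unusually high health for their age is to fit the decay curve and then flag large positive residuals. The sketch below reuses the exponential form from `fit_health_age_model`. It assumes that a standardized residual above `z_threshold` (2.5 by default, an arbitrary choice) counts as "unusually high". The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def health_decay(age, a, b, c):
    # Same exponential-decay form as in fit_health_age_model
    return a * np.exp(-b * age) + c


def find_healthy_outliers(age, health, z_threshold=2.5):
    """
    Flag agents whose health is unusually high for their age (sketch).
    :return: boolean array, True where the standardized residual exceeds z_threshold
    """
    age = np.asarray(age, dtype=float)
    health = np.asarray(health, dtype=float)
    p0 = (health.max() - health.min(), 1.0 / max(age.mean(), 1e-9), health.min())
    params, _ = curve_fit(health_decay, age, health, p0=p0, maxfev=10000)
    residuals = health - health_decay(age, *params)
    # Only large positive deviations count as "healthier than expected"
    z = residuals / residuals.std()
    return z > z_threshold
```

The flagged agents' action histories can then be pulled from `actions_df` to look for shared strategies.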

9. Strategy Evolution

Objective: Track how agent behaviors evolve over time.
Methodology:
- Perform a Markov chain analysis to identify transition probabilities between different action types.
- Analyze the diversity of behaviors over time using metrics like Shannon entropy or Simpson diversity index.
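Both diversity metrics can be computed from the share of each action type at each step. Shannon entropy is H = −Σ p log p, and the Simpson diversity index is used here in its 1 − Σ p² (Gini-Simpson) form. This is a minimal sketch assuming `step` and `action` columns. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def behavior_diversity_over_time(actions_df):
    """
    Shannon entropy and Simpson diversity of the action mix at each step (sketch).
    :return: DataFrame indexed by step with 'shannon' (nats) and 'simpson' (1 - sum p^2)
    """
    # Share of each action type within each step (rows sum to 1)
    counts = pd.crosstab(actions_df['step'], actions_df['action'])
    p = counts.div(counts.sum(axis=1), axis=0).to_numpy()
    with np.errstate(divide='ignore', invalid='ignore'):
        # Treat 0 * log(0) as 0 for action types absent at a step
        shannon = -np.sum(np.where(p > 0, p * np.log(p), 0.0), axis=1)
    simpson = 1.0 - np.sum(p ** 2, axis=1)
    return pd.DataFrame({'shannon': shannon, 'simpson': simpson}, index=counts.index)
```

Falling entropy over time would suggest that agents are converging on a few strategies.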

10. Cohort-Based Analysis

Objective: Analyze performance differences among agent groups.
Methodology:
- Divide agents into cohorts based on initial conditions (e.g., initial health or resources) and compare their outcomes.
- Test for statistical significance of differences in lifespan, rewards, or health trends using ANOVA or t-tests.
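Cohorts based on a continuous initial condition can be formed from quantiles and then compared with a one-way ANOVA. In this sketch, the caller supplies the column names (for example a hypothetical `initial_resources` column and `lifespan`), and the choice of equal-sized quantile cohorts is an assumption.

```python
import pandas as pd
from scipy.stats import f_oneway


def cohorts_by_initial_condition(agents_df, condition_column, metric_column, n_cohorts=3):
    """
    Split agents into equal-sized cohorts by an initial condition and run a one-way ANOVA (sketch).
    :return: (cohort labels, F statistic, p-value)
    """
    # Quantile-based bins: cohort 0 has the lowest initial values
    cohorts = pd.qcut(agents_df[condition_column], q=n_cohorts, labels=False)
    groups = [agents_df.loc[cohorts == k, metric_column] for k in range(n_cohorts)]
    f_stat, p_value = f_oneway(*groups)
    return cohorts, f_stat, p_value
```

The labels can also be stored as a column and passed to `compare_cohorts` as its `cohort_column`.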

Action Type Distribution Analysis

Objective

The purpose of Action Type Distribution Analysis is to investigate the relationship between the types of actions agents perform and their overall success or behavior in the simulation. It helps to uncover:

Which actions are most or least common.
Whether certain actions contribute more significantly to agent success (e.g., longer lifespans, higher rewards).
Patterns or biases in agent behavior related to the action types.

Steps Involved

Frequency Analysis
- Count the frequency of each action type to identify which actions dominate the simulation.
- Use this to understand the balance of behaviors in the simulation (e.g., gathering vs. attacking).
Correlation Analysis
- Investigate whether there is a statistical relationship between action frequencies and other metrics like lifespan or reward.
- For example, do agents that perform "gather" more frequently live longer or earn higher rewards?
Chi-Square Test
- Conduct a Chi-Square test to evaluate whether action type and agent success are independent or related.
- Example: Determine whether high rewards occur disproportionately often with "attack" compared with "gather."
Visualization
- Create bar charts or histograms to visualize the distribution of action types.
- Highlight any disparities or trends in the data.

Why It's Important

This analysis provides critical insights into agent behaviors:

Helps to optimize the simulation by adjusting action rewards or probabilities.
Reveals dominant or underutilized strategies, guiding potential improvements in agent design or environment rules.
Identifies correlations that could indicate emergent behavior or unintended consequences in the simulation.

Implementation Details

Here’s how you can break this into manageable steps with Python:

Calculate Frequencies

def calculate_action_frequencies(actions_df):
   """
   Calculate frequencies of each action type.
   :param actions_df: DataFrame with columns ['action']
   :return: Series with action frequencies
   """
   return actions_df['action'].value_counts()

Calculate Correlations

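def calculate_action_correlations(actions_df):
   """
   Calculate correlation between per-agent action frequencies and agent success metrics.
   :param actions_df: DataFrame with columns ['agent_id', 'action', 'reward', 'lifespan']
   :return: DataFrame with one row per action and columns 'reward' and 'lifespan'
   """
   # Count how often each agent performed each action (rows: agents, columns: actions)
   frequencies = actions_df.groupby(['agent_id', 'action']).size().unstack(fill_value=0)
   # One outcome row per agent: total reward and lifespan (assumed constant per agent)
   outcomes = actions_df.groupby('agent_id').agg(reward=('reward', 'sum'), lifespan=('lifespan', 'first'))
   return pd.DataFrame({
       'reward': frequencies.corrwith(outcomes['reward']),
       'lifespan': frequencies.corrwith(outcomes['lifespan']),
   })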

Perform Chi-Square Test

from scipy.stats import chi2_contingency

def chi_square_test(actions_df):
   """
   Perform Chi-Square test between action type and success (e.g., high reward).
   :param actions_df: DataFrame with columns ['action', 'reward']
   :return: Chi-Square test results
   """
   contingency_table = pd.crosstab(actions_df['action'], actions_df['reward'] > actions_df['reward'].median())
   chi2, p, dof, expected = chi2_contingency(contingency_table)
   return {"chi2": chi2, "p_value": p, "degrees_of_freedom": dof, "expected": expected}

Visualize Results

import matplotlib.pyplot as plt

def plot_action_distribution(actions_df):
   """
   Plot the frequency of each action type.
   :param actions_df: DataFrame with columns ['action']
   """
   action_counts = actions_df['action'].value_counts()
   action_counts.plot(kind='bar', color='skyblue', edgecolor='black')
   plt.title("Action Type Distribution")
   plt.xlabel("Action Type")
   plt.ylabel("Frequency")
   plt.show()

Insights to Derive

Dominance of Actions: Identify which actions are performed most frequently and whether this aligns with the simulation design goals.
Success and Action Types: Determine whether actions like "attack" or "share" are strongly associated with agent success.
Optimization Opportunities: Highlight underutilized or overly dominant actions that may need rebalancing in the simulation.

Next Steps

Use this analysis to understand agent behaviors and tune the simulation environment.
Combine this analysis with reward efficiency (e.g., rewards per action) to adjust reward mechanisms or action incentives.
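A simple way to combine the two analyses is to set each action's share of all actions beside its share of total reward. An action whose use far exceeds its reward share is a candidate for rebalancing. This is a minimal sketch assuming `action` and `reward` columns and a positive total reward. The ratio column and the function name are hypothetical.

```python
import pandas as pd


def action_balance_table(actions_df):
    """
    Compare how often each action is taken with how much of the total reward it yields (sketch).
    :return: DataFrame with frequency share, reward share, and their ratio per action
    """
    summary = actions_df.groupby('action')['reward'].agg(count='count', total_reward='sum')
    summary['frequency_share'] = summary['count'] / summary['count'].sum()
    summary['reward_share'] = summary['total_reward'] / summary['total_reward'].sum()
    # Ratio > 1: the action earns more than its share of use; < 1: it is used more than it pays
    summary['reward_to_frequency'] = summary['reward_share'] / summary['frequency_share']
    return summary.sort_values('reward_to_frequency')
```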

Dooders / Experiments