bradbell / at_cascade

Cascading Dismod_at Analysis From Parent To Child Regions
https://at-cascade.readthedocs.io

db2csv command fails because avgint table moves after sample table creation error #12

Closed (garland-culbreth closed this issue 7 months ago)

garland-culbreth commented 8 months ago

Impacted versions

dismod_at version: dismod_at-20231229
at_cascade version: 2023.12.22

Description of issue

When at_cascade.csv.fit() raises an error during sample table creation, the avgint table is still moved to c_shift_avgint after the error occurs. Subsequent db2csv commands then fail, because db2csv expects a non-empty avgint table.
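
One way to check whether a given node database is in the state described above is to look at the two tables directly with Python's sqlite3 module. This is only a minimal sketch: the table names avgint and c_shift_avgint come from this report, while the helper names, the state labels, and the in-memory demo database are hypothetical and not part of at_cascade or dismod_at.

```python
import sqlite3


def table_row_count(connection, table_name):
    """Return the row count of table_name, or None if the table does not exist."""
    found = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if found is None:
        return None
    # Table names cannot be bound as parameters; the name was just matched
    # against sqlite_master, so quoting it here is safe.
    (count,) = connection.execute(
        f'SELECT COUNT(*) FROM "{table_name}"'
    ).fetchone()
    return count


def avgint_state(connection):
    """Classify the avgint table (hypothetical labels, for illustration only)."""
    avgint_rows = table_row_count(connection, "avgint")
    shifted_rows = table_row_count(connection, "c_shift_avgint")
    if avgint_rows:
        return "ok"                # avgint exists and has rows
    if shifted_rows is not None:
        return "moved"             # avgint empty or absent, c_shift_avgint present
    return "empty_or_missing"      # no usable avgint and no c_shift_avgint


# Demo database that mimics the reported state (assumed layout, not real output)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE avgint (avgint_id INTEGER PRIMARY KEY)")
demo.execute("CREATE TABLE c_shift_avgint (c_shift_avgint_id INTEGER PRIMARY KEY)")
demo.execute("INSERT INTO c_shift_avgint VALUES (0)")
print(avgint_state(demo))
```

For a real fit, sqlite3.connect would be pointed at the node's database file instead of ":memory:".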

Screenshot of a log table from a node where this occurred:

[Image: Screenshot 2024-01-29 at 15 49 16]
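
A log table like this one can also be read as text, which is easier to paste into an issue than an image. The sketch below assumes the dismod_at log table has the columns log_id, message_type, and message, and that failures are recorded with message_type 'error'; please check those names against the dismod_at documentation. The rows in the demo database are made up for illustration.

```python
import sqlite3


def error_messages(connection):
    """Return the messages of all log rows whose message_type is 'error'."""
    rows = connection.execute(
        "SELECT message FROM log WHERE message_type = 'error' ORDER BY log_id"
    )
    return [message for (message,) in rows]


# Demo log table with invented rows (assumed column layout)
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE log ("
    "log_id INTEGER PRIMARY KEY, message_type TEXT, table_name TEXT, "
    "row_id INTEGER, unix_time INTEGER, message TEXT)"
)
demo.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?, ?)",
    [
        (0, "command", None, None, 0, "begin example command"),
        (1, "error", None, None, 1, "simulated sample failure"),
        (2, "command", None, None, 2, "end example command"),
    ],
)
for message in error_messages(demo):
    print(message)
```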

This might be preventable by changing the table moving behavior to leave avgint in place if an error occurs during sample table creation.
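
As a rough illustration of that suggestion, the move could be made conditional on the sample step succeeding. This is a hypothetical sketch, not at_cascade's actual code: sample_then_move_avgint, rename_table, and failing_sample_step are invented names, and the real table handling in at_cascade may differ.

```python
import sqlite3


def rename_table(connection, old_name, new_name):
    """Rename a table in place (names are trusted constants in this sketch)."""
    connection.execute(f'ALTER TABLE "{old_name}" RENAME TO "{new_name}"')


def sample_then_move_avgint(connection, create_sample_table):
    """Move avgint to c_shift_avgint only if create_sample_table succeeds."""
    try:
        create_sample_table(connection)
    except Exception as error:
        # Leave avgint where it is so later commands can still use it.
        print(f"sample table creation failed: {error}")
        return False
    rename_table(connection, "avgint", "c_shift_avgint")
    return True


def failing_sample_step(connection):
    """Stand-in for a sample step that fails (hypothetical)."""
    raise RuntimeError("simulated sample failure")


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE avgint (avgint_id INTEGER PRIMARY KEY)")
demo.execute("INSERT INTO avgint VALUES (0)")

moved = sample_then_move_avgint(demo, failing_sample_step)
tables = {
    name
    for (name,) in demo.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
}
print(moved, sorted(tables))
```

With the failing stand-in, avgint stays in place; passing a step that returns normally would perform the rename.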

bradbell commented 8 months ago

When you report an issue, it would help if you created a simple test that executes quickly and demonstrates the problem. That way, once it is fixed, we can run the test to make sure it keeps working in the future. This is called regression testing.

Tests are easier to make than examples because they do not connect to the documentation. Here is an example test that uses db2csv: https://github.com/bradbell/at_cascade/blob/master/test/no_ode_ignore.py

garland-culbreth commented 7 months ago

Closing this issue because the cause doesn't seem to be the error mentioned here. I reproduced that error and was able to run db2csv successfully.

The code I used to create the reproducible example is:

import numpy as np
import pandas as pd

fit_dir = "."
data_integrands = ["Sincidence", "mtexcess", "remission"]
parent_rate_true = {"Sincidence": 0.1, "mtexcess": 0.01, "remission": 0.01}
canada_random_effect = 0.5
age_grid = np.linspace(start=0, stop=100, num=3)
time_grid = np.linspace(start=2000, stop=2020, num=3)

# Create CSV interface files ==========================================

# option_fit ----------------------------------------------------------
df_option_fit = pd.DataFrame([
    {"name": "max_fit", "value": 250},
    {"name": "sample_method", "value": "asymptotic"},
    {"name": "max_num_iter_fixed", "value": 250},
    {"name": "random_seed", "value": 0},
    {"name": "root_node_name", "value": "north_america"},
    {"name": "refit_split", "value": "false"},
    {"name": "max_abs_effect", "value": 4},
    {"name": "minimum_meas_cv", "value": 0.2},
    {"name": "ode_step_size", "value": 5},
    {"name": "quasi_fixed", "value": "true"},
    {"name": "child_prior_std_factor", "value": 2},
    {"name": "ode_method", "value": "iota_pos_rho_pos"},
    {
        "name": "age_avg_split",
        "value": "0.01 0.047945205 0.288356164 0.75 1.5 3.5"
    },
    {"name": "compress_interval", "value": "5 5"},
    {"name": "shared_memory_prefix", "value": "2024-01-30-at-regression-test"}
])
df_option_fit.to_csv(f"{fit_dir}/option_fit.csv", index=False)

# option_predict ------------------------------------------------------
df_option_predict = pd.DataFrame([
    {"name": "db2csv", "value": "false"},
    {"name": "plot", "value": "false"}
])
df_option_predict.to_csv(f"{fit_dir}/option_predict.csv", index=False)

# node ----------------------------------------------------------------
df_node = pd.DataFrame([
    {'node_name': 'north_america', 'parent_name': ''},
    {'node_name': 'canada', 'parent_name': 'north_america'},
    {'node_name': 'usa', 'parent_name': 'north_america'}
])
df_node.to_csv(f"{fit_dir}/node.csv", index=False)

# fit_goal ------------------------------------------------------------
df_fit_goal = pd.DataFrame([
    {"node_name": "north_america"},
    {"node_name": "canada"},
    {"node_name": "usa"}
])
df_fit_goal.to_csv(f"{fit_dir}/fit_goal.csv", index=False)

# covariate -----------------------------------------------------------
covariate_table = []
for node in df_node["node_name"]:
    for sex in ["female", "male"]:
        for a in age_grid:
            for t in time_grid:
                row = {
                    "node_name": node,
                    "age": a,
                    "time": t,
                    "sex": sex,
                    "omega": 0.5,
                    "rand": np.random.lognormal(sigma=1)
                }
                covariate_table.append(row)
df_covariate = pd.DataFrame(covariate_table)
df_covariate.to_csv(f"{fit_dir}/covariate.csv", index=False)

# mulcov --------------------------------------------------------------
df_mulcov = pd.DataFrame([
    {
        "covariate": "rand",
        "type": "rate_value",
        "effected": "iota",
        "value_prior": "",
        "const_value": 1
    },
])
df_mulcov.to_csv(f"{fit_dir}/mulcov.csv", index=False)

# data_in -------------------------------------------------------------
data_table = []
for i in ["Sincidence"]:
    for a in age_grid:
        for t in time_grid:
            for node in ["canada"]:
                for sex in ["female", "male"]:
                    val = parent_rate_true[i] * np.random.lognormal(sigma=1)
                    row = {
                        'integrand_name': i,
                        'density_name': 'gaussian',
                        'node_name': node,
                        'sex': sex,
                        'age_lower': a,
                        'age_upper': a,
                        'time_lower': t,
                        'time_upper': t,
                        'meas_value': val,
                        'meas_std': 1e-1 * val,
                        'eta': '',
                        'nu': '',
                        'sample_size': '',
                        'hold_out': 0,
                    }
                    data_table.append(row)
df_data_in = pd.DataFrame(data_table)
df_data_in['data_id'] = df_data_in.index
df_data_in.to_csv(f"{fit_dir}/data_in.csv", index=False)

# child_rate ----------------------------------------------------------
df_child_rate = pd.DataFrame([
    {"rate_name": "iota", "value_prior": "prior_iota_child"}
])
df_child_rate.to_csv(f"{fit_dir}/child_rate.csv", index=False)

# parent_rate ---------------------------------------------------------
parent_table = []
for r in ["iota", "rho", "chi"]:
    for a in age_grid:
        for t in time_grid:
            row = {
                "rate_name": r,
                "age": a,
                "time": t,
                "value_prior": f"prior_{r}_parent",
                "dage_prior": "prior_parent_diff",
                "dtime_prior": "prior_parent_diff",
                "const_value": ""
            }
            parent_table.append(row)
df_parent_rate = pd.DataFrame(parent_table)
df_parent_rate.to_csv(f"{fit_dir}/parent_rate.csv", index=False)

# prior ---------------------------------------------------------------
df_prior = pd.DataFrame([
    {
        'name': 'prior_iota_parent',
        'density': 'gaussian',
        'lower': 1e-5,
        'upper': 1e+4,
        'mean': 100,
        'std': 0.01,
    }, {
        'name': 'prior_iota_child',
        'density': 'gaussian',
        'mean': 0.0,
        'std': 0.01,
    }, {
        'name': 'prior_parent_diff',
        'density': 'log_gaussian',
        'mean': 0.0,
        'std': 0.00001,
        'eta': 1e-5
    }, {
        'name': 'prior_child_diff',
        'density': 'gaussian',
        'mean': 0.0,
        'std': 0.001,
    }, {
        'name': 'prior_rho_parent',
        'density': 'gaussian',
        'lower': 1e-5,
        'upper': 1e+4,
        'mean': 10,
        'std': 0.001,
    }, {
        'name': 'prior_rho_child',
        'density': 'gaussian',
        'mean': 0.0,
        'std': 0.1,
    }, {
        'name': 'prior_chi_parent',
        'density': 'gaussian',
        'lower': 1e-5,
        'upper': 1e+4,
        'mean': 10,
        'std': 0.01,
    }, {
        'name': 'prior_chi_child',
        'density': 'gaussian',
        'mean': 0.0,
        'std': 0.1,
    }
])
df_prior.to_csv(f"{fit_dir}/prior.csv", index=False)

# predict_integrand ---------------------------------------------------
df_predict_integrand = pd.DataFrame(
    {"integrand_name": ["Sincidence"]}
)
df_predict_integrand.to_csv(f"{fit_dir}/predict_integrand.csv", index=False)

# Run CSV fit =========================================================
import at_cascade
at_cascade.csv.fit(fit_dir)