covid19ABM / comma

An agent-based microsimulation model to study mental health outcomes during covid-19 lockdowns
https://covid19abm.github.io/comma/
Apache License 2.0

Initialize the Bayesian Network(s) given the selected features. #39

Closed jiqicn closed 1 year ago

jiqicn commented 1 year ago

After discussing with the LA, we agreed that the "living with child" feature should be left out.

jiqicn commented 1 year ago

Hypotheses given by the LA:

Common assumptions:

Not sure:

jiqicn commented 1 year ago

Proposed Bayesian Network:

flowchart LR

Gender --> Age
Gender --> Employment
Gender --> Partner
Gender --> Depression
Gender --> Burnout

Age --> Education
Age --> Employment
Age --> Partner
Age --> Depression
Age --> Burnout
Age --> Addiction

Education --> Employment
Education --> Partner
Education --> Depression
Education --> Burnout
Education --> Addiction

Employment --> Depression
Employment --> Burnout
Employment --> Addiction
Employment --> Fatigue
Employment --> Partner

Partner --> Depression
Partner --> Partner-difficulties

SES --> Finance-difficulties
SES --> Housing-difficulties
SES --> Depression
SES --> Burnout
SES --> Addiction

Job-type --> Employment

Area --> SES
Area --> Depression

flowchart LR

Parent --> Single-Parent

jiqicn commented 1 year ago

Hi @n400peanuts, I came up with the above Bayesian network structure based on the given hypotheses and some common assumptions. We can discuss what to add to or remove from this network when you have time.

n400peanuts commented 1 year ago

Hi Ji, I think this makes sense more or less, though I am not sure of this interpretation:

Feelings of loneliness increased on average for all respondents and in particular for those who live alone or have a disadvantaged socioeconomic position (Depression | SES, Partner, Parent)

Specifically for the Parent variable -- not sure being a single parent means that you live alone or you come from a disadvantaged socioeconomic position...But in general I think it makes more sense to ask Kristina about this, she knows her data best.

Also, I am not sure why you would draw the dependencies from the variables from her hypotheses -- these are hypotheses of the effects of the lockdown on the individuals' characteristics, why would you want to draw the proportion of people belonging to each class based on those?

jiqicn commented 1 year ago

Hi @n400peanuts,

Specifically for the Parent variable -- not sure being a single parent means that you live alone or you come from a disadvantaged socioeconomic position...But in general I think it makes more sense to ask Kristina about this, she knows her data best.

Agreed, and we will definitely ask Kristina what she thinks. And you're right: I just realized that parenthood may not add much to the assumption about living alone, so I will remove it.

Also, I am not sure why you would draw the dependencies from the variables from her hypotheses

Lockdown is the premise of those hypotheses, which cover many things, and I collected the ones that seem to indicate correlations between individual features. Besides those hypotheses, I have no other information about the dataset that could be used for building the network. Or do we? If you feel the network should be built on other information, I'm fine with that too; please let me know.

jiqicn commented 1 year ago

Hi @n400peanuts, since Kristina sent us a new cross-tab form that reduces the number of features to 10, the network structure should be updated accordingly. I will make a new network this week if time allows.

n400peanuts commented 1 year ago

@jiqicn sounds good. I was wondering: is it possible to have gender and age independent of each other, with both contributing to the next variable (e.g., education)? I am not sure whether age should depend on gender or vice versa, but if the implementation in Python becomes too convoluted, I understand and it is fine to keep it as it is.

jiqicn commented 1 year ago

@n400peanuts, no, it's not difficult to implement at all, but may I ask why you want to do that? Given Kristina's cross-tab results, it seems that we should reject the null hypothesis that age and gender are independent.

n400peanuts commented 1 year ago

@jiqicn OK, I hadn't checked the stats for the age/gender relationship, because I took for granted that there was no relation between the two variables (why would age be related to gender?). I see now that we have more women than men and that both are unevenly distributed across ages...so yeah no, it can't be done. When we are done with the Bayesian network, I was wondering if it wouldn't be best to write a one-page explanation of the rationale for the variable dependencies + the final schema -- I think it would be helpful when we have to put together a poster or a paper. What do you think?

jiqicn commented 1 year ago

@n400peanuts I agree that in general there's no need to have age and gender depend on each other, but I guess the idea here is to fit our model to Kristina's data as closely as possible, so that our sample dataset looks similar enough to hers (please correct me if I'm wrong). Regarding writing a summary of that work, I think you're right; let's do that after we finalize the network structure.

jiqicn commented 1 year ago

Network structure that includes all variable associations found by the independence tests, with no edge directions decided yet.

graph TD
G[Gender] --- A
G --- P
G --- S[Self-rated health]
G --- CJ[Critical Job]

A[Age] --- E
A --- P
A --- C
A --- H
A --- S
A --- CJ

E[Education] --- P
E --- D
E --- C
E --- H
E --- S
E --- CJ

U[Unemployed] --- P
U --- C
U --- H
U --- S

P[Partner] --- D
P --- C
P --- H
P --- S

D[Depressed] --- S

C[Children] --- H

H[Housing/financial trouble] --- S

Two directions of learning the structure after this:

n400peanuts commented 1 year ago

@jiqicn let's have a chat about this with Kristina on Monday -- I think the way you built it is sensible. We could implement it as it is, based on the cross-tabs Kristina has provided, and later work on learning the network structure from the data. I would treat this latter option as a "nice to have" -- what do you think? Maybe it is something @Astrid-p could help with as well (if she wants, of course).

Astrid-p commented 1 year ago

Hi, Ji @jiqicn !

This is the updated Bayesian model with a direction added for each connection. I also had it reviewed by Kristina. There are a few notes that I want to highlight:

Kind regards, --Astrid--

COMMA Bayesian network 15June

jiqicn commented 1 year ago

Hi @Astrid-p, thanks for the updates! Regarding your questions:

  • You will notice some bi-directional links that were highlighted in yellow. I have checked out some articles that built the Bayesian model of bi-directional connections. But in your opinion, is it too complicated to put into consideration in terms of the used ABM packages? If it is not possible, they can be adjusted into one-directional links as well.

To be honest, I have no idea how bi-directional connections work in Bayesian models, so I don't know what we would gain by adding them. Also, all the Python packages I know for building such models assume that the model is a directed acyclic graph (DAG), which never allows a bidirected edge. So I would suggest that we start with unidirectional models.
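To make the DAG constraint concrete, here is a minimal sketch that flags bi-directional links and checks a proposed edge list for cycles, using only the Python standard library (`graphlib`, Python 3.9+). The helper names `bidirected_pairs` and `is_dag` and the edge list are hypothetical illustrations; they are not taken from the reviewed COMMA network or from any Bayesian network package.

```python
from graphlib import TopologicalSorter, CycleError


def bidirected_pairs(edges):
    """Return the node pairs that are linked in both directions."""
    edge_set = set(edges)
    return sorted(
        {tuple(sorted((a, b))) for a, b in edge_set if (b, a) in edge_set}
    )


def is_dag(edges):
    """Return True if the directed edges contain no cycle."""
    sorter = TopologicalSorter()
    for parent, child in edges:
        sorter.add(child, parent)  # the child depends on its parent
    try:
        sorter.prepare()  # raises CycleError when a cycle exists
    except CycleError:
        return False
    return True


# Hypothetical edges for illustration only; not the reviewed network.
edges = [
    ("Gender", "Age"),
    ("Age", "Education"),
    ("Education", "Depressed"),
    ("Depressed", "Education"),  # a bi-directional link
]

print(bidirected_pairs(edges))  # [('Depressed', 'Education')]
print(is_dag(edges))            # False
```

Dropping one direction of each flagged pair and re-running `is_dag` is one way to confirm that the adjusted structure is acceptable before handing it to a package.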

  • Is it possible to remove some linkage? I didn't grasp how the team developed the previous Bayesian version in the last meeting. I understand all the previous links were decided based on the actual data from Kristina (the data shows the correlation between age and gender), is it correct?

You can prune links based on your own expert knowledge. And if the previous version you mean is this, you are right: it was learned from the real data, or more precisely from the cross-tab results Kristina gave us a couple of weeks ago.

This is how I did it: Kristina ran chi-square independence tests on pairwise features in the cross-tabs. When the p-value of a feature pair is less than or equal to a threshold (commonly 0.05), we reject the null hypothesis of independence for that pair, i.e. we treat the two features as dependent. In this way I identified all the dependencies and built the network.
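The procedure described above can be sketched as follows. This is a standard-library illustration, not the code used for the project: the function names are hypothetical, the counts are made up, and no continuity correction is applied (in practice a library routine such as `scipy.stats.chi2_contingency` would normally be used instead).

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with `dof` degrees of freedom.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2.
    """
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-square test on a contingency table given as a list of rows.

    Returns (statistic, degrees of freedom, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


def dependent_pairs(crosstabs, alpha=0.05):
    """Keep the feature pairs whose independence hypothesis is rejected."""
    return [
        pair
        for pair, table in crosstabs.items()
        if chi2_independence(table)[2] <= alpha
    ]


# Made-up counts for illustration only; not Kristina's cross-tabs.
crosstabs = {
    ("Gender", "Partner"): [[50, 10], [10, 50]],
    ("Gender", "Children"): [[10, 20], [20, 40]],
}

print(dependent_pairs(crosstabs))  # [('Gender', 'Partner')]
```

Each pair that survives the threshold becomes an undirected edge; the test says nothing about direction, which still has to come from expert knowledge or structure learning.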

Astrid-p commented 1 year ago

Hi @jiqicn ,

Thank you for the clarification! I had the same thought about the bi-directional links, but still wanted to double-check with you that my assumption was correct. In this case, this is the final update of the BN:

COMMA Bayesian network 19Jun

n400peanuts commented 1 year ago

@jiqicn Can we proceed with implementing this structure while @Astrid-p tries in parallel to learn the structure from the data?

Astrid-p commented 1 year ago

@jiqicn @n400peanuts, I noticed that the bnlearn package also provides a function to fit data to a custom structure. We can build our own structure with make_DAG and then fit the data to that structure with parameter_learning. Is this what you are looking for to run this BN as well?

jiqicn commented 1 year ago

@n400peanuts, I think the idea is exactly what @Astrid-p said above. I also thought about computing those CPDs by hand before I realized that the parameter_learning function exists, but I guess we don't have to do that now.
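For reference, computing the CPDs "by hand" for a fixed structure amounts to counting. The sketch below estimates each conditional probability table by relative frequency (maximum likelihood) using only the standard library. It does not call bnlearn; `fit_cpds`, the records, and the two-node structure are hypothetical illustrations, and parent configurations that never occur in the data simply get no entry.

```python
from collections import Counter, defaultdict


def fit_cpds(rows, parents):
    """Estimate P(node | parents) by relative frequency.

    rows:    list of dicts, one per respondent
    parents: dict mapping each node to a tuple of its parent nodes
    """
    cpds = {}
    for node, node_parents in parents.items():
        counts = defaultdict(Counter)
        for row in rows:
            config = tuple(row[p] for p in node_parents)
            counts[config][row[node]] += 1
        cpds[node] = {
            config: {value: n / sum(c.values()) for value, n in c.items()}
            for config, c in counts.items()
        }
    return cpds


# Made-up records for illustration only; not the survey data.
rows = [
    {"Gender": "F", "Age": "young"},
    {"Gender": "F", "Age": "old"},
    {"Gender": "F", "Age": "old"},
    {"Gender": "M", "Age": "young"},
]
# Structure: Gender --> Age (Gender has no parents).
parents = {"Gender": (), "Age": ("Gender",)}

cpds = fit_cpds(rows, parents)
print(cpds["Gender"][()])    # {'F': 0.75, 'M': 0.25}
print(cpds["Age"][("F",)])   # P(Age | Gender = F)
```

A Bayesian estimator differs mainly in adding prior pseudo-counts before normalizing, which avoids zero probabilities for rare combinations.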