covid19ABM / comma

An agent-based microsimulation model to study mental health outcomes during covid-19 lockdowns
https://covid19abm.github.io/comma/
Apache License 2.0

Create a mock example of learning BN structure from data #47

Closed jiqicn closed 11 months ago

jiqicn commented 1 year ago

Hi @n400peanuts and @Astrid-p, I created this thread for tracking purposes.

The plan is to write a function that accepts a data frame as input (rows represent agents, columns are the features) and outputs a learned Bayesian Network model by saving it to a file.
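For illustration, here is a minimal sketch of what such a function could look like. It is not the project's implementation. It uses only the Python standard library, represents the data frame as a list of dicts (one per agent), assumes the graph structure is already given as a `parents` mapping, and picks JSON as the file format. The names `fit_cpts` and `learn_and_save` and the `lockdown`/`stress` features are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict


def fit_cpts(rows, parents):
    """Estimate P(node | parents) by relative frequencies (maximum likelihood)."""
    cpts = {}
    for node, pars in parents.items():
        # counts[parent configuration][node value] = number of agents
        counts = defaultdict(Counter)
        for row in rows:
            key = ",".join(str(row[p]) for p in pars)  # "" when there are no parents
            counts[key][str(row[node])] += 1
        cpts[node] = {
            key: {val: c / sum(ctr.values()) for val, c in ctr.items()}
            for key, ctr in counts.items()
        }
    return cpts


def learn_and_save(rows, parents, path):
    """Fit the conditional probability tables and write the model to a JSON file."""
    model = {"parents": parents, "cpts": fit_cpts(rows, parents)}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh, indent=2)
    return model


# Toy data, invented for this sketch: 8 agents, 2 binary features.
rows = (
    [{"lockdown": 1, "stress": 1}] * 3
    + [{"lockdown": 1, "stress": 0}] * 1
    + [{"lockdown": 0, "stress": 0}] * 4
)
# Assumed structure: lockdown -> stress.
parents = {"lockdown": [], "stress": ["lockdown"]}

path = os.path.join(tempfile.mkdtemp(), "bn_model.json")
model = learn_and_save(rows, parents, path)

# Read the file back to check that the saved model round-trips.
with open(path, encoding="utf-8") as fh:
    reloaded = json.load(fh)
```

With a real data frame, `df.to_dict("records")` in pandas would give the same list-of-dicts shape.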

Astrid-p commented 1 year ago

Hi Ji,

As I understand it, this function takes a data frame as input, trains a Bayesian model on that data, and then returns the fine-tuned Bayesian Network. Is that correct?

Astrid

jiqicn commented 1 year ago

@Astrid-p, thanks for asking! I would say that the function works in a different way: the network structure without edge directions (i.e. the undirected graph) should be defined beforehand; on top of that, the function learns the dependencies (the direction of the edges in the graph) from a dataset.

I will try to come up with a notebook today and show how it works with an example.


UPDATE:

Actually, you're right! After searching online, I found another Python package called bnlearn that can do exactly what you said. That will save us the time of computing cross-tabs!

Here is the notebook that shows how to learn the structure and parameters of a BN from data, using a well-known dataset. As you can see, the package is really straightforward to use. The only decision to make is which search algorithm to use for structure learning. To briefly explain: to learn a BN structure from data, the package searches over the possible structures and picks the one that yields the highest score. The score is basically the likelihood of the dataset given a network with that specific structure.
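To make the search-and-score idea concrete, here is a small standard-library sketch. It is not bnlearn's code and does not reproduce its search methods: it simply enumerates every DAG over three variables and keeps the best-scoring one. As an assumption of this sketch, the score is BIC (log-likelihood minus a penalty on the number of parameters) rather than the raw likelihood, because the raw likelihood never decreases when edges are added. All function names and the toy data are hypothetical.

```python
import itertools
import math
from collections import Counter


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indeg = {n: 0 for n in nodes}
    for _, child in edges:
        indeg[child] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for parent, child in edges:
            if parent == n:
                indeg[child] -= 1
                if indeg[child] == 0:
                    queue.append(child)
    return seen == len(nodes)


def all_dags(nodes):
    """Yield every DAG over `nodes` as a tuple of (parent, child) edges."""
    pairs = list(itertools.combinations(nodes, 2))
    # For each pair: no edge (None), a -> b (0), or b -> a (1).
    for choice in itertools.product((None, 0, 1), repeat=len(pairs)):
        edges = []
        for (a, b), c in zip(pairs, choice):
            if c == 0:
                edges.append((a, b))
            elif c == 1:
                edges.append((b, a))
        if is_acyclic(nodes, edges):
            yield tuple(edges)


def bic_score(rows, nodes, edges):
    """Log-likelihood of the data under the DAG minus the BIC penalty."""
    n = len(rows)
    levels = {v: {r[v] for r in rows} for v in nodes}
    score = 0.0
    for v in nodes:
        parents = [p for p, c in edges if c == v]
        joint = Counter((tuple(r[p] for p in parents), r[v]) for r in rows)
        marg = Counter(tuple(r[p] for p in parents) for r in rows)
        loglik = sum(cnt * math.log(cnt / marg[pa]) for (pa, _), cnt in joint.items())
        n_params = (len(levels[v]) - 1) * math.prod(len(levels[p]) for p in parents)
        score += loglik - 0.5 * math.log(n) * n_params
    return score


def exhaustive_search(rows, nodes):
    """Return the highest-scoring DAG among all candidates."""
    return max(all_dags(nodes), key=lambda edges: bic_score(rows, nodes, edges))


# Toy data, invented for this sketch: B always equals A, C is unrelated.
nodes = ["A", "B", "C"]
rows = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 25
best = exhaustive_search(rows, nodes)
print(best)  # a single edge linking A and B; its direction is a tie under BIC
```

Enumerating all structures is only feasible for a handful of variables, because the number of DAGs grows super-exponentially with the number of nodes. That is why the choice of search algorithm matters for a realistic number of features.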

@n400peanuts @Astrid-p, I guess it would be nice to discuss how to make what I have in the notebook work with the Lifelines dataset. Shall we schedule a meeting for that?

n400peanuts commented 1 year ago

@jiqicn I wonder if it isn't better to hear from @Astrid-p first? @Astrid-p do you think you can follow the same approach on the data? I think the approach you showed @jiqicn is nice, but before meeting I think it is more productive to have at least a first try on the data. That way we already know what the problems are, and if Astrid runs it we could look at the results.

Astrid-p commented 1 year ago

Good morning, Ji, Eva,

@jiqicn, thank you for your messages! This seems to be a very efficient solution compared to the TabularCPD that I am working on 🤯. Just to clarify: we don't need to use our pre-built Bayesian network for this approach, do we?

@n400peanuts It is a nice suggestion. I could try to run this on the Lifelines data with different structure learning algorithms to see what the DAGs look like. However, I have one big concern: as far as I know from Kristina, the Lifelines database system is offline, so I am not sure whether I can download and run packages in their workspace. (@kristinathompson am I correct about this?) Nevertheless, I will try to run this when Kristina returns to the office on Wednesday.

n400peanuts commented 1 year ago

@Astrid-p great! Let us know how it goes

jiqicn commented 1 year ago

Hi both @n400peanuts @Astrid-p, I'm sorry to get back to this thread so late; I've been quite busy with my other work recently. But I see we have reached some agreement here, which is nice!

And regarding your question Astrid: yes, you're right, the bnlearn package indeed doesn't require a pre-defined BN structure and everything is learned from data.

n400peanuts commented 11 months ago

Hi, I am closing this issue as the BN is not used any more.