InstituteforDiseaseModeling / malaria-model_validation


Modular structure #53

Closed YeChen-IDM closed 1 year ago

YeChen-IDM commented 1 year ago


1. Change in simulation_coordinator.csv

Added a subset column that assigns individual sites to subsets. [image]

2. Change in load_input() and related code

Updated load_input() and the related code to read the subset information and pass it to downstream steps.

3. Change in snakemake execution

Updated the snakemake rules so that sites can be run by subset. The default value for subset is "All", which runs all available sites. Users can override it at run time:

    snakemake --config s='core_relationship, infection_duration' -j

Note that subset names are case-insensitive.
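To illustrate the selection behavior described in this step, here is a minimal sketch of how a comma-separated subset string could be normalized and used to filter sites. It is not the code from this PR: the helper names (parse_subsets, select_sites), the "site" column name, and the example rows are assumptions made for illustration. Only the "subset" column, the "All" default, and the case-insensitive matching come from the description above.

```python
import csv
import io


def parse_subsets(raw: str = "All") -> set:
    """Split a comma-separated subset string into lowercase, trimmed names."""
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def select_sites(rows, raw_subsets: str = "All") -> list:
    """Return the sites whose subset matches the request (case-insensitive)."""
    wanted = parse_subsets(raw_subsets)
    # "All" (in any letter case) or an empty request selects every site.
    if not wanted or "all" in wanted:
        return [row["site"] for row in rows]
    return [row["site"] for row in rows if row["subset"].strip().lower() in wanted]


# Hypothetical coordinator contents; the real file has more columns.
EXAMPLE_CSV = """site,subset
site_a,core_relationship
site_b,infection_duration
site_c,Core_Relationship
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print(select_sites(rows, "CORE_RELATIONSHIP"))
print(select_sites(rows))
```

Lowercasing both the requested names and the values from the file is one simple way to get case-insensitive matching without restricting how subsets are spelled in the CSV.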

4. Change in plotting steps

The run_generate_validation_comparisons_site.py script now takes an argument --subset or -s for one or more subsets. For the core_relationship subset, the following plotting functions are called:

generate_age_incidence_outputs()
generate_age_prevalence_outputs()
generate_parasite_density_outputs()
generate_infectiousness_outputs()

For the infection_duration subset, the following plotting function is called: generate_age_infection_duration_outputs()
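The mapping from subsets to plotting functions could be expressed as a lookup table, sketched below. This is an illustration, not the script's actual code. The function names and the -s/--subset flag are taken from the list above. Accepting space-separated values via nargs="+", defaulting to "All", and the helper names build_parser and plots_for are assumptions. The sketch returns function names as strings so that it runs without the plotting module.

```python
import argparse

# Plotting function names per subset, as listed in this description.
PLOTS_BY_SUBSET = {
    "core_relationship": [
        "generate_age_incidence_outputs",
        "generate_age_prevalence_outputs",
        "generate_parasite_density_outputs",
        "generate_infectiousness_outputs",
    ],
    "infection_duration": ["generate_age_infection_duration_outputs"],
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate validation comparison plots.")
    # Assumption: one or more subset names, separated by spaces.
    parser.add_argument("-s", "--subset", nargs="+", default=["All"],
                        help="Subset(s) to plot; 'All' selects every subset.")
    return parser


def plots_for(subsets) -> list:
    """Return the plotting function names to call for the requested subsets."""
    names = [s.strip().lower() for s in subsets]
    if "all" in names:
        names = list(PLOTS_BY_SUBSET)
    selected = []
    for name in names:
        if name not in PLOTS_BY_SUBSET:
            raise ValueError(f"Unknown subset: {name}")
        selected.extend(PLOTS_BY_SUBSET[name])
    return selected


args = build_parser().parse_args(["-s", "infection_duration"])
print(plots_for(args.subset))
```

With a table like this, adding a subset means adding one dictionary entry rather than another branch in the script.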

5. Change in readme.md

6. Change in reporting:

In progress: the plan is to add one more level for the subset in the document. The old report structure is on the left and the new structure is on the right: [image]

I also updated how we define the document content. I created a class Section to replace the nested dictionary structure we used to have:

class Section:
    def __init__(self, pdf: PDF, section_title: str, section_number: int = 1, content: dict = None, level: int = 0,
                 subsection: list = None):
        """
        Define a section object for each section in the report
        Args:
            pdf (PDF):                      A PDF object
            section_title (str):            Title of the current section
            section_number (int):           Number of current section, which starts from 1 at the beginning of the
                                            document. If the current section is a sub-section of another section,
                                            the section number starts from 1 at the sub-section level.
            content (dict):                 A dictionary of key-value pairs. The keys are the subtitles and the values
                                            are lists that look like this:
                                            [section_text: str, image_list: list, table_name: str]
            level (int):                    Level of the current section in the document outline. Ranges from 0 to 2,
                                            where 0 is the top level.
            subsection (list[Section]):     A list of Section objects if this section contains lower-level section(s).
        """

I think the new structure is easier to read than the old one, since we now have one more level (the subset level).

I also updated the code to work with this new Section class.
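As a rough illustration of how nested sections replace a nested dictionary, the sketch below mirrors the documented fields in a simplified stand-in class and walks the tree to print an outline. It is not the Section class from this PR: the PDF argument is omitted so the example runs on its own, and the name SectionSketch, the outline() helper, and the example titles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SectionSketch:
    """Simplified stand-in for Section (no PDF object), for illustration only."""
    section_title: str
    section_number: int = 1
    content: Optional[dict] = None  # subtitle -> [section_text, image_list, table_name]
    level: int = 0                  # 0 is the top level
    subsection: List["SectionSketch"] = field(default_factory=list)


def outline(section: SectionSketch, prefix: str = "") -> list:
    """Return indented outline lines, numbering sub-sections from 1 at each level."""
    label = f"{prefix}{section.section_number}"
    lines = [f"{'  ' * section.level}{label} {section.section_title}"]
    for child in section.subsection:
        lines.extend(outline(child, prefix=label + "."))
    return lines


# Hypothetical report: one subset-level section with two site-level sub-sections.
report = SectionSketch(
    "Core relationship", section_number=1, level=0,
    subsection=[
        SectionSketch("Age-incidence", section_number=1, level=1,
                      content={"Summary": ["Some text", [], ""]}),
        SectionSketch("Age-prevalence", section_number=2, level=1),
    ],
)
print("\n".join(outline(report)))
```

Because each section carries its own level and children, adding the subset level only requires wrapping the existing sections in a parent section.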

7. Change in output folders for analyzer result files?:

Should we clean up the output folder when we rerun the download steps?

8. Other changes?: