epiverse-trace / epichains

Methods for simulating and analysing the sizes and lengths of infectious disease transmission chains from branching process models
https://epiverse-trace.github.io/epichains/

Cyclomatic complexity of `simulate_chains()` #179

Closed jamesmbaazam closed 9 months ago

jamesmbaazam commented 9 months ago

In #171, the new `simulate_chains()` function tripped the cyclocomp_linter with a cyclomatic complexity of 21 against the expected maximum of 15.

This issue is to discuss ways to reduce the function's complexity. More broadly, it raises the general question of whether that threshold is always achievable for simulation functions, as they often require several levels of control flow and complex logic.
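For context on what the linter is measuring: cyclocomp_linter delegates to the {cyclocomp} package, which counts independent paths through a function's control-flow graph. A minimal sketch, using a toy function rather than the real `simulate_chains()`:

```r
# Toy simulation function (NOT the real simulate_chains()); each
# if/else and loop adds linearly independent paths, raising the score.
toy_sim <- function(n, offspring, stat = c("size", "length")) {
  stat <- match.arg(stat)
  out <- numeric(n)
  for (i in seq_len(n)) {
    if (stat == "size") {
      out[i] <- sum(offspring(1))     # chain size: total offspring
    } else {
      out[i] <- length(offspring(1))  # chain length: count of draws
    }
  }
  out
}

# Measure complexity if {cyclocomp} is installed; a branch-free
# function scores 1, and each decision point adds to the total.
if (requireNamespace("cyclocomp", quietly = TRUE)) {
  print(cyclocomp::cyclocomp(toy_sim))
}
```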

sbfnk commented 9 months ago

> it raises the general question of whether that level of complexity is always achievable for simulation functions as they often require various levels of control flows and complex logic.

I think that's a good point. We struggled with this one before and ended up more or less where we started. That said, there are things we could consider, e.g.:
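(The specific suggestions aren't captured above.) One common pattern for this in R, sketched here with placeholder names rather than the actual `simulate_chains()` internals: extract a cluster of branches into a small single-purpose helper, so each function is linted separately and the main loop stays flat.

```r
# Hedged sketch: the branching over the tracked statistic is moved out
# of the main loop into a helper. switch() also collapses an if/else
# chain into a single decision point.
update_stat <- function(stat, current, n_offspring) {
  switch(stat,
    size   = current + n_offspring,                 # accumulate offspring
    length = current + as.integer(n_offspring > 0)  # extend if chain continues
  )
}

simulate_stat <- function(n_generations, stat = c("size", "length")) {
  stat <- match.arg(stat)
  value <- 1
  for (gen in seq_len(n_generations)) {
    n_offspring <- rpois(1, lambda = 2)  # placeholder offspring draw
    value <- update_stat(stat, value, n_offspring)
  }
  value
}
```

The total amount of branching is unchanged, but no single function carries all of it, which is what the per-function complexity score reacts to.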

jamesmbaazam commented 9 months ago

Thanks, Seb, I've implemented these ideas in #171 and reduced it by 1 unit. I'll keep this issue open to consolidate more ideas.

sbfnk commented 9 months ago

> Thanks, Seb, I've implemented these ideas in #171 and reduced it by 1 unit.


Well that was worth it then 🫠

Probably not a bad idea to collect more ideas here, but also: is there something magical about the number 15 for cyclomatic complexity? I worry about falling subject to Goodhart's law, or, more specifically, making the code worse at the expense of hitting an arbitrary target.

Tagging @Bisaloo as it's potentially a question of broader interest.

jamesmbaazam commented 9 months ago

Totally agree, and that's why I set up the issue. I probably should have set it up as a discussion on the Epiverse-TRACE org. I do see the point of reducing unnecessary branching in smaller functions, but I don't see how it's always possible in larger simulation functions.

Bisaloo commented 9 months ago

Yes, I agree we don't want to fall into the trap described by Goodhart's law.

The metrics used in the project should not be treated as targets but as a temperature check. I see them as analogous to "normal ranges" in health checkups: if a value falls outside the range, it's good for a physician to take a deeper look, but it doesn't necessarily mean something is wrong.

The same applies here. 15 is a value that others and I have empirically found to correlate well with how easily a function can be understood. If the cyclomatic complexity exceeds 15, it is worth taking the time to consider whether the function can be refactored to make it easier to understand. But that may just not be possible in some cases. In those cases, once a couple of pairs of eyes have looked at potential improvements, we can make an exception.

The same goes for code coverage and the other metrics we use in the project, BTW. As a first approach, we try to stick to what usually works well (full code coverage and cyclomatic complexity < 15). But we know there are cases where these metrics don't capture well what we are trying to achieve, and we should never artificially hit a perceived target.
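For recording an agreed exception without rewriting the code, lintr (which provides cyclocomp_linter) supports configuration-level overrides. A hedged sketch, assuming the project uses a standard `.lintr` file; the threshold shown is illustrative, not a recommendation:

```
# .lintr -- raise the limit package-wide (illustrative value)
linters: linters_with_defaults(
    cyclocomp_linter(complexity_limit = 20L)
  )
```

Alternatively, a single agreed exception can be marked in place with a `# nolint: cyclocomp_linter.` comment on the offending function definition, which keeps the default threshold for everything else and makes the exception visible at the point of use.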

If this wasn't clear until now, suggestions on how to make it clearer, either in documentation (blueprints) or in how the checks are presented, are welcome!