The grouping code is currently hard to follow. Similarly named dataframes, excessive copying, and hard-to-follow while-loops make the code difficult to reason about.
Some changes that could be implemented:
Rename variables to clearer, more descriptive names.
Keep the number of variables to a minimum. In particular, avoid copying variables (df = data.copy() and the like) unless it is absolutely necessary.
Use a for loop over the credible sets in credible set grouping, since the number of groups there is known in advance. This can be combined with sorting the credible sets into the desired order before the loop, so that the processing is easy to reason about. It would also make it possible to group the variants in almost any order we want, for example by locus with primary credible sets before secondary ones (SUSIE CS id 1 before 2, 2 before 3, etc.), or by ascending p-value.
Improve the tests, possibly using example data from an early release.
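The sort-then-loop idea for credible set grouping could look roughly like the sketch below. It is not the current implementation: the field names (locus, cs_id, variant, pval), the example rows, and the function group_credible_sets are all hypothetical, and plain dictionaries stand in for dataframe rows to keep the example self-contained.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, one per variant in a credible set.
# Field names and values are illustrative assumptions only.
variants = [
    {"locus": "chr1_100", "cs_id": 2, "variant": "v3", "pval": 1e-6},
    {"locus": "chr1_100", "cs_id": 1, "variant": "v1", "pval": 1e-12},
    {"locus": "chr1_100", "cs_id": 1, "variant": "v2", "pval": 1e-9},
    {"locus": "chr2_500", "cs_id": 1, "variant": "v4", "pval": 1e-8},
]


def group_credible_sets(rows):
    """Return (locus, cs_id, [variant, ...]) tuples in a deterministic order."""
    key = itemgetter("locus", "cs_id")
    # Sort once, up front: by locus, then primary CS before secondary.
    # sorted() returns a new list, so the input is not modified or copied twice.
    ordered = sorted(rows, key=key)
    groups = []
    # The number of groups is fixed by the data, so a plain for loop is enough.
    for (locus, cs_id), members in groupby(ordered, key=key):
        groups.append((locus, cs_id, [m["variant"] for m in members]))
    return groups


groups = group_credible_sets(variants)
```

Because the ordering lives entirely in the sort key, switching to another order (for example ascending p-value) would only mean changing that key, while the loop body stays the same.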