FINNGEN / autoreporting

MIT License

Refactor grouping functions in gws_fetch #116

Closed Lipastomies closed 1 year ago

Lipastomies commented 4 years ago

The grouping code is currently hard to follow. It uses similarly named dataframes, copies data excessively, and relies on while-loops that are hard to trace, all of which make the code difficult to reason about.

Some changes that could be implemented:

  1. Rename variables with clearer, more descriptive names.
  2. Keep the number of variables to a minimum. In particular, avoid unnecessary copying (df = data.copy() and the like when it is not strictly necessary).
  3. Use a for loop over credsets in credible set grouping, since the number of groups there is known in advance. This can be combined with sorting the credible sets into the desired order before the loop, so that the processing is easy to reason about. It would also make it possible to group the variants in almost any order we want, for example by locus with primary credible sets before secondary ones (SUSIE CS id 1 before 2, 2 before 3, etc.), or by ascending p-value.
  4. Improve the tests, possibly with example data from an early release.
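
A rough sketch of what point 3 could look like, to make the idea concrete. This is not the current gws_fetch code: it uses plain Python records instead of dataframes, and `CredibleSet`, its fields, `group_by_credible_set` and the two sort-key helpers are all hypothetical names. It also assumes that a variant claimed by an earlier credible set is not reassigned to a later one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class CredibleSet:
    """Hypothetical stand-in for one credible set row."""
    locus: str                  # locus identifier (assumed field)
    cs_number: int              # credible set number, 1 = primary
    lead_pval: float            # p-value of the lead variant (assumed field)
    variants: Tuple[str, ...]   # variant ids in the credible set


def by_locus_then_cs_number(cs: CredibleSet):
    # Within a locus, primary credible sets come before secondary ones.
    return (cs.locus, cs.cs_number)


def by_ascending_pval(cs: CredibleSet):
    return cs.lead_pval


def group_by_credible_set(
    credsets: List[CredibleSet],
    sort_key: Callable = by_locus_then_cs_number,
) -> Dict[Tuple[str, int], List[str]]:
    groups: Dict[Tuple[str, int], List[str]] = {}
    assigned: Set[str] = set()
    # The number of groups is known up front, so a plain for loop over the
    # sorted credible sets is enough; no while-loop bookkeeping is needed.
    for cs in sorted(credsets, key=sort_key):
        # Assumption: a variant stays in the first group that claims it.
        members = [v for v in cs.variants if v not in assigned]
        groups[(cs.locus, cs.cs_number)] = members
        assigned.update(members)
    return groups


credsets = [
    CredibleSet("locusA", 2, 1e-6, ("v2", "v3")),
    CredibleSet("locusA", 1, 1e-12, ("v1", "v2")),
    CredibleSet("locusB", 1, 1e-20, ("v4",)),
]

groups_by_locus = group_by_credible_set(credsets)
groups_by_pval = group_by_credible_set(credsets, sort_key=by_ascending_pval)
```

Changing the processing order only means passing a different `sort_key`; the loop body stays the same. With the real dataframes, the same shape would presumably be a sort followed by a loop over the credible set identifiers.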