FINNGEN / autoreporting

MIT License

DAO for reading credible set data #190

Closed Lipastomies closed 3 years ago

Lipastomies commented 3 years ago

Currently, CS data is read by a few functions that were written as and when the data types changed. This works, but it relies on a lot of sketchy pandas code that keeps breaking and is a real pain to maintain. Also, those functions return pandas dataframes, which have no definition to look up, so it's rather difficult to work on that code.

This PR introduces a datatype for credible sets (and for the variants in them), as well as file readers that read either summarized SuSiE output or full bgzipped outputs. The datatypes are easy to look up, which IMO makes the code much easier to use and test.
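A minimal sketch of what such a datatype and reader could look like, assuming Python dataclasses. All names here (`Variant`, `CSVariant`, `CredibleSet`, `CSReader`, `TsvCSReader`) and the tab-separated column layout are hypothetical illustrations, not the classes or the file format actually used in this PR.

```python
import abc
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, TextIO, Tuple


@dataclass(frozen=True)
class Variant:
    """A single variant (hypothetical field names)."""
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass(frozen=True)
class CSVariant:
    """A variant together with its probability within a credible set."""
    variant: Variant
    prob: float


@dataclass
class CredibleSet:
    """A credible set: a region, a set number, and its member variants."""
    region: str
    number: int
    variants: List[CSVariant] = field(default_factory=list)

    def lead(self) -> CSVariant:
        # The lead variant is taken to be the one with the highest probability.
        return max(self.variants, key=lambda v: v.prob)


class CSReader(abc.ABC):
    """Data access interface: callers only ever see CredibleSet objects."""

    @abc.abstractmethod
    def read(self) -> List[CredibleSet]:
        ...


class TsvCSReader(CSReader):
    """Reads an assumed tab-separated layout, one row per variant."""

    def __init__(self, handle: TextIO) -> None:
        self.handle = handle

    def read(self) -> List[CredibleSet]:
        sets: Dict[Tuple[str, int], CredibleSet] = {}
        for row in csv.DictReader(self.handle, delimiter="\t"):
            key = (row["region"], int(row["cs"]))
            cs = sets.setdefault(key, CredibleSet(region=key[0], number=key[1]))
            variant = Variant(row["chrom"], int(row["pos"]), row["ref"], row["alt"])
            cs.variants.append(CSVariant(variant, float(row["prob"])))
        return list(sets.values())


# Made-up example input, only to show the shape of the result.
EXAMPLE = (
    "region\tcs\tchrom\tpos\tref\talt\tprob\n"
    "chr1:1-1000\t1\t1\t100\tA\tG\t0.7\n"
    "chr1:1-1000\t1\t1\t250\tC\tT\t0.3\n"
)
credible_sets = TsvCSReader(io.StringIO(EXAMPLE)).read()
```

With types like these, a second reader for the full bgzipped output could implement the same `read()` interface (for example by wrapping `gzip.open`), and tests can assert on plain attributes instead of dataframe columns.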

This PR also fixes #185, which was caused by changing how credible set data is read without carrying those changes over to a different branch. Reading CS data is now decoupled from grouping, which should make keeping up with changes much easier. CS data is still represented as dataframes in the grouping code. That code is quite messy and I didn't want to break it this time, but at least there is now a boundary to the mess.
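One way to picture that boundary, as a rough sketch only: typed records are flattened into plain dicts at a single conversion point, and only that point knows about the tabular shape the grouping code expects. `CSRow` and `to_records` are hypothetical names, not functions from this PR.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CSRow:
    """Hypothetical flattened record: one variant in one credible set."""
    region: str
    cs_number: int
    variant: str
    prob: float


def to_records(rows: List[CSRow]) -> List[Dict[str, object]]:
    # Single conversion point between the typed reader output and the
    # dataframe-based grouping code. The resulting list of dicts can be
    # passed to pandas.DataFrame(...) by the caller.
    return [asdict(row) for row in rows]


records = to_records([CSRow("chr1:1-1000", 1, "1:100:A:G", 0.7)])
```

If the input format changes again, only the reader and this conversion need updating; the grouping code keeps receiving the same columns.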

Lipastomies commented 3 years ago

TODO: add negative tests, i.e. ones that check that things fail when they should fail.
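A sketch of what such negative tests might look like with `unittest`. The reader function `read_cs_rows`, its required columns, and the validation rules are assumptions made up for this example, not the behaviour of the readers in this PR.

```python
import csv
import io
import unittest

# Hypothetical set of columns the reader insists on.
REQUIRED_COLUMNS = ("region", "cs", "prob")


def read_cs_rows(handle):
    """Hypothetical reader that fails loudly on malformed input."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        prob = float(row["prob"])  # raises ValueError for non-numeric text
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
        rows.append((row["region"], int(row["cs"]), prob))
    return rows


class NegativeReaderTests(unittest.TestCase):
    def test_missing_column_fails(self):
        with self.assertRaises(ValueError):
            read_cs_rows(io.StringIO("region\tcs\nchr1:1-1000\t1\n"))

    def test_probability_out_of_range_fails(self):
        with self.assertRaises(ValueError):
            read_cs_rows(io.StringIO("region\tcs\tprob\nchr1:1-1000\t1\t1.5\n"))

    def test_non_numeric_probability_fails(self):
        with self.assertRaises(ValueError):
            read_cs_rows(io.StringIO("region\tcs\tprob\nchr1:1-1000\t1\tNA\n"))
```

Saved as a test module, this runs with `python -m unittest`. Each test passes only if the reader rejects the bad input, which is the property the TODO asks for.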