JonJala / mama

MIT License
Ldscore cleaning1 #3

Closed JonJala closed 3 years ago

JonJala commented 3 years ago

Made some changes to delete some unused / unneeded code. Was just a quick pass, but in case we don't get to rewrite or do a thorough pass on this stuff, some changes are low-hanging fruit that would make things easier.

Since we don't have tests for this stuff, though, I don't really have an easy way to verify that this didn't break anything. It shouldn't have, but maybe you could try running with this on some actual data and make sure it looks the same as before?
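In the absence of tests, one way to check "looks the same as before" is to save the output from the old code and the new code and diff them cell by cell with a small numeric tolerance. The following is only a sketch: it assumes the results are tab-delimited text files with a header row, and the function names and tolerances are illustrative rather than anything in this repo.

```python
import csv
import math


def load_table(path, delimiter="\t"):
    """Read a delimited file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def values_match(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two cells as floats when both parse, otherwise as raw values."""
    try:
        x, y = float(a), float(b)
    except (TypeError, ValueError):
        return a == b
    if math.isnan(x) and math.isnan(y):
        return True
    return math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)


def diff_tables(before, after, **tol):
    """Return (row_index, column, before_value, after_value) for each mismatch."""
    if len(before) != len(after):
        return [(None, "row_count", len(before), len(after))]
    mismatches = []
    for i, (row_b, row_a) in enumerate(zip(before, after)):
        # Keep the column order of the "before" row, then any extra columns.
        columns = list(row_b) + [c for c in row_a if c not in row_b]
        for column in columns:
            vb, va = row_b.get(column), row_a.get(column)
            if not values_match(vb, va, **tol):
                mismatches.append((i, column, vb, va))
    return mismatches
```

Running `diff_tables(load_table(old_path), load_table(new_path))` on outputs from the same input data should return an empty list if the cleanup changed nothing; `old_path` and `new_path` are placeholders for wherever the two runs were saved.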

ggoldman1 commented 3 years ago

Thanks Jon! I think one simple (ish) update we could make is using either Hail or pandas-plink to load in the binary data. Let me know if you want to discuss that.

ggoldman1 commented 3 years ago

Trying to run some 1kG data through; everything is hanging at the moment.

JonJala commented 3 years ago

Ah, ok, so maybe this did break something (or maybe the server is being slow?).

As for Hail or pandas-plink, I don't know how easy it would be to incorporate Hail into a Python script (unless you mean we'd call it as a separate process, in which case we could weigh that, though it would require folks to install it). I have been playing around with pandas-plink a little bit, and it seems to work well in a lot of ways, though I'm still trying to figure out how the lazy loading works in practice. As a little test, I loaded in one chromosome's worth of genetic data from 1000G and calculated the per-SNP mean, and then deleted the pandas_plink object (and ultimately the numpy array with the means), and the process was still holding onto a lot of memory.
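A load / reduce / delete experiment like that can be made repeatable with the standard library's `tracemalloc`. This is only a sketch: `fake_genotypes` and `per_snp_mean` are hypothetical stand-ins for the real loader and the mean calculation, not pandas-plink calls.

```python
import gc
import tracemalloc


def measure_release(build, reduce):
    """Build an object, reduce it, drop both, and report traced bytes."""
    gc.collect()
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    data = build()            # e.g. load one chromosome of genotypes
    result = reduce(data)     # e.g. compute per-SNP means
    held, peak = tracemalloc.get_traced_memory()

    del data, result          # drop the only references we hold
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "held": held - baseline,
        "peak": peak - baseline,
        "after_del": after - baseline,
    }


def fake_genotypes(n_snps=200, n_samples=500):
    """Hypothetical stand-in for real genotype data: one row per SNP."""
    return [[float((i + j) % 3) for j in range(n_samples)] for i in range(n_snps)]


def per_snp_mean(rows):
    """Mean genotype value for each SNP (row)."""
    return [sum(row) / len(row) for row in rows]


report = measure_release(fake_genotypes, per_snp_mean)
print(report)
```

One caveat: `tracemalloc` only sees allocations made through Python's memory allocators. If `after_del` drops back near zero while the process's resident memory stays high, the memory is more likely being held by native code or simply not yet returned to the OS than by a lingering Python reference.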

If you feel like you have a good handle on pandas_plink, though, feel free to try using it in the ldscore scripts to see how it goes.

I do like the idea of swapping out a bunch of the custom code for some library calls, but I hesitate to do it without knowing really well what's going on there. Doesn't preclude some pathfinding / testing it out if you want to give that a go, though!

ggoldman1 commented 3 years ago

No worries, it was just server traffic. Everything looks good.

JonJala commented 3 years ago

Everything looks good as in it ran and it was just the server being slow? Or looks good as in it ran and the results look fine? (If it's the latter, I'll merge, but if it's the former, I'll wait.)

ggoldman1 commented 3 years ago

Results match

JonJala commented 3 years ago

Great, thanks!