dms-vep / dms-vep-pipeline-3

Pipeline for analyzing deep mutational scanning (DMS) of viral entry proteins (VEPs)
Other
1 stars 0 forks source link

we need to either update `multidms` immediately to fix serious bug, or stop using it #144

Closed jbloom closed 3 months ago

jbloom commented 3 months ago

@jgallowa07 @Haddox @matsen, I wanted to check re your plans for multidms as at this point the serious bug that causes the version we are using to give essentially random-number results for pandas versions >= 2.2 has just become intolerable for us. As you may recall, this is the issue: when pandas >= 2.2 the multidms results are garbage. As you may recall, this is because the older multidms code makes some non-guaranteed assumptions about the ordering data frame rows that stop being true in newer pandas versions (see here).

As what was supposed to be a very short-term temporary fix in dms-vep-pipeline-3 we pinned to pandas 2.1 in the recommended conda environment. However, this is not an ideal solution because other people may build environments that use the latest pandas rather than applying that pin. This showed up most recently in @timcyu's H3 deep mutational scanning, where he got incorrect results because for whatever reason the environment he was using probably had a more recent version of pandas. Also, pinning to an older version of pandas is becoming difficult to maintain with other packages we use. I'm especially worried about this because if the versions aren't exactly correct the pipeline spits out wrong results rather than just throwing some error or something.

We've had an open issue for >6 months to update multidms in the pipeline, but it is being held up due to a bunch of subtle questions that aren't honestly very important for any analyses we are doing, especially in comparison to the current situation of the pipeline just giving wrong results on one of the core aspects if anything gets off with all the complicated version pinning in conda.

So basically, are you able to provide an estimate of when you might be able to merge a new version of multidms into dms-vep-pipeline-3 that has this issue fixed? I think if this isn't feasible very soon we might alternatively just want to switch to using the Kinney lab global epistasis model software (which I think I can implement) as the current situation is so dire.

matsen commented 3 months ago

Hello Jesse!

Sorry about this. I don't think we realized the extent to which this was pressing for you. Jared has just finished v1.0 of multidms and is in the process of finishing the PR for the vep pipeline. He's switched all of his effort towards this rather than working on the MS, and it should be done soon.

jbloom commented 3 months ago

Awesome, thanks so much.

jgallowa07 commented 3 months ago

@jbloom Very sorry I had not realized the urgency of this, I should have provided a simple patch like this earlier. https://github.com/dms-vep/dms-vep-pipeline-3/pull/145 should fix things for now while we iron out the manuscript and more carefully consider the multidms > 1.0.0 integration.