Closed jbloom closed 3 months ago
Hello Jesse!
Sorry about this. I don't think we realized the extent to which this was pressing for you. Jared has just finished v1.0 of multidms and is in the process of finishing the PR for the vep pipeline. He's switched all of his effort towards this rather than working on the MS, and it should be done soon.
Awesome, thanks so much.
@jbloom Very sorry I had not realized the urgency of this, I should have provided a simple patch like this earlier. https://github.com/dms-vep/dms-vep-pipeline-3/pull/145 should fix things for now while we iron out the manuscript and more carefully consider the multidms > 1.0.0
integration.
@jgallowa07 @Haddox @matsen, I wanted to check re your plans for
multidms
as at this point the serious bug that causes the version we are using to give essentially random-number results forpandas
versions >= 2.2 has just become intolerable for us. As you may recall, this is the issue: whenpandas
>= 2.2 themultidms
results are garbage. As you may recall, this is because the oldermultidms
code makes some non-guaranteed assumptions about the ordering data frame rows that stop being true in newerpandas
versions (see here).As what was supposed to be a very short-term temporary fix in
dms-vep-pipeline-3
we pinned topandas
2.1 in the recommendedconda
environment. However, this is not an ideal solution because other people may build environments that use the latestpandas
rather than applying that pin. This showed up most recently in @timcyu's H3 deep mutational scanning, where he got incorrect results because for whatever reason the environment he was using probably had a more recent version ofpandas
. Also, pinning to an older version ofpandas
is becoming difficult to maintain with other packages we use. I'm especially worried about this because if the versions aren't exactly correct the pipeline spits out wrong results rather than just throwing some error or something.We've had an open issue for >6 months to update
multidms
in the pipeline, but it is being held up due to a bunch of subtle questions that aren't honestly very important for any analyses we are doing, especially in comparison to the current situation of the pipeline just giving wrong results on one of the core aspects if anything gets off with all the complicated version pinning inconda
.So basically, are you able to provide an estimate of when you might be able to merge a new version of
multidms
intodms-vep-pipeline-3
that has this issue fixed? I think if this isn't feasible very soon we might alternatively just want to switch to using the Kinney lab global epistasis model software (which I think I can implement) as the current situation is so dire.