GMOD / jbrowse-plugin-msaview

multiple sequence alignment browser plugin for JBrowse 2
Apache License 2.0
3 stars 0 forks source link

get first version of ABrowse plugin running in JB2 #1

Closed rbuels closed 3 years ago

rbuels commented 3 years ago

laying the keel for the new project.

plugin skeleton, get ABrowse demo data running inside JB2

cmdcolin commented 3 years ago

To speed up some things I developed this inside the jbrowse-components repo for now

https://github.com/GMOD/jbrowse-components/tree/msaview

localhost_3000__config=test_data%2Fvolvox%2Fconfig json session=local-xrhDPxFri (1)

This initial demo renders to svg and is very basic

We will need to brainstorm ideas about where virtualization will help or where it will hurt

rbuels commented 3 years ago

I'm not sure a ground-up rewrite of ABrowse is in scope for this. @ihh and I both were hoping to see the existing ABrowse code taken and developed into a JB2 plugin. Is there something preventing going in that direction?

cmdcolin commented 3 years ago

I gained experience doing MSA and tree drawing during gsoc and believe using those lessons will be beneficial.

Also I made several of software engineering recommendations for the abrowse project that were not incorporated AFAIK and so I believe it will be a difficult platform to build off of.

I think we can deliver a solid new product without much pain, and would be happy to go over plans if interested. I believe my short demo above already shows the value of a very compact and easily done system.

I don't believe the goals will stray from abrowse features also, we will incorporate all important abrowse features with this rewrite

ihh commented 3 years ago

By software engineering recommendations you mean the use of React hooks, and tweening? Would it be hard to convert the ABrowse codebase to use those patterns? Why do you think it will be difficult to build off of?

Does this code have collapsible tree nodes, or animations? What size/formats of alignments can it handle? Does it display sequence logos for ancestral nodes? Can it build a tree if none is supplied?

cmdcolin commented 3 years ago

I don't want this to be a conflict but I believe my implementation would deliver on many of those features in due time. If needed we can go with abrowse, but it'd require a lot of refactoring in my view.

My demo code only took me one day to make, so there is not a lot invested in it, but I do think it's an interesting platform that I'd be interested in continuing.

As far as concerns about abrowse, there are a number of things in the abrowse code that i just find kind of stringy and complicated. For example the tree datatstructure https://github.com/ihh/abrowse/blob/master/src/App.js#L499

My code now uses a tree datastructure from d3 to drive both the MSA and Tree drawing code and feels pretty elegant IMO whereas my experience working on abrowse was that the datastructures felt a bit more scattered and disorganized. I spent a several days in early abrowse development to try to improve some of these things but had trouble with the spaghetti-ness of the code. The overall distinction between functional component/hooks is relatively minor, it is more organizing a lot of the complexity of the data structures, the spaghetti, etc

Another data point is the app is currently at 15fps during scroll for me https://ihh.github.io/abrowse/build/index.html yet we are looking to strive for scalability, and it might be that things need a more drastic rethinking to obtain better performance with more sequence data!

As far as my code, I tested on that tiny example, I also tested a newick tree that UCSC directly provides for a COVID clade https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/nextstrain/

The demo has newick parsing built in, it has clustal-js (https://github.com/cmdcolin/clustal-js), and I can add the stockholm-js too (which has embedded newick) and all the rest of the types

I have had lessons during the GSoC project for example with the nice d3 hierarchy driver, and I thought using that for my demo was a good proof of concept.

I think it is also extensible and can deliver on our desired features

By software engineering recommendations you mean the use of React hooks, and tweening?

Yes hooks, but also there is a feeling of spaghetti code and other data structure concerns that need refactoring.

Would it be hard to convert the ABrowse codebase to use those patterns?

I believe it would be challenging to convert it to use newer patterns potentially, as I have stated above I tried before but there were challenges. I have learned some lessons sense then and can retry though

Why do you think it will be difficult to build off of?

Spaghetti code Worry about scalability Would need a pretty good amount of refactoring to properly integrate into jbrowse ecosystem (do have to extensively refactor it into mobx-state-tree state so we can properly snapshot it, etc) No tests, so just have to eyeball the refactoring process

Does this code have collapsible tree nodes, or animations?

My code is an early demo. I believe it could have these. D3 is actually famous for great animations. However, I don't really want to prioritize animations because I think they can easily overcomplicate the code if the wrong abstraction is used. D3 is generally a great abstraction in general though, lots of famous examples here.

What size/formats of alignments can it handle?

Untested but could make some early results. Loaded some newick trees here https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/nextstrain/nextstrain19A.nh and it was fine, and will be using canvas to make it better

Does it display sequence logos for ancestral nodes? Can it build a tree if none is supplied?

Probably would implement those aspects using code copied/adapted from abrowse

Again, I don't mean this as an attack on abrowse and if it's needed we can port abrowse in.

ihh commented 3 years ago

It's not whether it's an attack on abrowse, it's more that you are proposing some pretty significant changes (both to implementation and to the overall aims) and instead of a systematic and transparent decision process (beginning with code review and feature audit; then laying out the pros, cons, and rationale; and engaging the design team and/or leadership [me in this case] in order to agree major changes) this feels more like a kind of mix of gut feeling, instinct, and somewhat hubristic unilateral decision-making. The problem is that the process of prototyping was designed to figure out whether certain features and approaches are viable. It's not clear whether you are making full use of that.

spaghetti code - Yes, there is spaghetti code in ABrowse in order to knit together a bunch of things. Refactoring that spaghetti code may be necessary, and adding tests too. I think you see those as unpleasant tasks, but the largest (and most important) task would be to understand what's been done already, and what this taught us about the path towards the design goals. I doubt that reinventing the wheel is going to bring you to that understanding more quickly than a code review.

the tree data structure - I think you're referring to representing the tree as a list of branches, when your preference was to represent it as a hierarchically nested data structure. This strikes me as a relatively minor implementation detail about algorithms and data structures, but for what it's worth, going back and forth between recursive and iterative representations of a tree is pretty standard. If you think of a tree as a graph, then an edge-list representation is natural. For many purposes you want to straightforwardly enumerate or iterate over branches as a simple set, and it's unnecessarily cumbersome to use recursive code for this, even with JavaScript's closures etc. In the phylogenetic algorithms I've coded (over 2 decades, not to boast about the grayness of my beard) I've generally come around to the practice of representing a tree (internally) as a flat data structure, having found that the appeal of a "more elegant" hierarchical data structure is somewhat superficial. This is why I made this choice but, regardless, you have leeway to refactor this if you consider (and can make the case) that it is important. Perhaps a tad more justification than just "it feels stringy and complicated".

scalability - Absolutely, it is going to need scalability refactoring for bigger datasets, we have started talking about compression and chunked file formats. The 15fps scroll rate is a relevant measurement but would make more sense if put in context (hardware? browser? how measured?) and compared to other solutions.

I think the biggest concern for me is "I don't really want to prioritize animations because I think they can easily overcomplicate the code". Animation was a major design goal from the beginning of this! The combination of not wanting to compromise your technical vision as an engineer, and being willing to ignore design goals that are communicated to you if they conflict with that vision, is dangerous. In particular it seems like it's going to lead to the loss of most of what I figured out during prototyping. I agree it complicates the code, and I agree that animations can be hard to make performant. But that's part of what makes this app unique (and JBrowse actually).

Honestly you are not alone in wanting to drop tricky things like animations because they complicate the code. This has come up a few times in JBrowse over the years. And maybe sometimes I will let myself be talked out of them, e.g. I keep reluctantly shelving the idea of animated continuously-morphing transitions between circular and linear views (even though I did spend effort prototyping that in Rotunda too). Still, for basic philosophical reasons I think animations are important in maintaining continuity (and therefore attention) throughout transitions between fundamentally artificial visualizations that lack natural landmarks and orientations, so I am going to keep pushing them.

And once you accept animations into your heart, I promise you will find some meatballs in the spaghetti that is abrowse. For example, a lot of that complicated messy code is me trying to figure out the edge cases for how to make an animation look even halfway decent when frame rate drops. Or which things can be precalculated (this would naturally lead to an approach like tweening). The code is also peppered with comments about which are the current bottlenecks and how we might improve them.

I am not attached to the code, and am happy if you rewrite it, but it concerns me to see design features that I considered significant (and wrote into grant proposals) dropped in the name of technical convenience, discarding the work that was done to assess the feasibility of those features. I would recommend you try to spend a bit longer understanding what I was trying to do with abrowse before you push ahead with reinventing it from scratch. I am happy to talk about it.

cmdcolin commented 3 years ago

For what it's worth, I'm sorry about being brash in my approach here, I was kind of just doing my usual hacker thing and diving at the problem and not really considering others. I'm really trying to work on some of my teamwork skills consciously but definitely wasn't really thinking about this. I think I learned some good lessons from my experiment but sometimes I don't really consider the implications and insist on things too hard before they are really fleshed out, etc.

I think the plan going forward will probably try to incorporate abrowse more or less as it is and refactor from there. I'd still be happy to talk offline or something too. If interested I can try to write up a document with plans also

ihh commented 3 years ago

Great, no worries, water under bridge etc. It's partly our design process which could use improvement: we need formal process for experiments, review, discussions, etc. I will leave it to you and Rob to go forward from here, distracted as I am. My main interest here is in establishing continuity of the lessons I learned during prototyping - not necessarily the code itself (though the lessons are embedded in the code).

cmdcolin commented 3 years ago

base abrowse code running here

localhost_3000__config=test_data%2Fvolvox%2Fconfig json session=local-uVV4q1orE

raw instance here http://s3.amazonaws.com/jbrowse.org/code/jb2/msaview/index.html?config=test_data%2Fvolvox%2Fconfig.json&session=share-zLXtEhWp09&password=3PThC

still got a lot to do but it's a start

cmdcolin commented 3 years ago

ancestral reconstruction, query params, etc not there in demo. just a short port (required converting css to jss also because it produced error "Callback already was called" importing css which was confusing but might go back to https://github.com/webpack-contrib/mini-css-extract-plugin/issues/577)