Projection Demonstration pieces

nathandunn commented 8 years ago

TODOS (again):

[x] re-branch (client-side port?), jbrowse and Apollo
[x] finish porting MultisequenceProjection to shared
[x] remove back-end projection sequences
[x] pull scaffold sequence (versus reference track) from the UI and deprecate (but do NOT remove)
- [ ] in the Browser.js constructor, have Util.parseLocString() instantiate a "projection" object rather than a 'ref'?

loc = {"ref":"Group2.19","start":225566,"end":391970} 
refSeq = {"seqChunkSize":20000,"start":0,"name":"Group2.19","length":3883383,"end":3883383}

needs to be:

{"sequenceList":[{"name":"Group11.4"}],"label":"Group11.4"}:-1..-1 = PROJECTION:minCoord..maxCoord
loc = {"projection":{"sequenceList":[{"name":"Group11.4"}],"label":"Group11.4"},"start":225566,"end":391970}
loc = {"ref":PROJECTION,"projection":"PROJECTION",start, end}
refSeq = {"seqChunkSize":20000,"start":0,"name":"PROJECTION","projection":PROJECTION,"length":3883383,"end":3883383}
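A minimal sketch of how a `PROJECTION:minCoord..maxCoord` string could be turned into the `loc` shape above. The helper name `parseProjectionLocString` is hypothetical (it is not part of JBrowse's `Util`), and the returned field names simply follow the notes above.

```javascript
// Hypothetical helper: parse "<projection JSON>:<start>..<end>" into a loc object.
// Not an existing JBrowse function; name and return shape are assumptions.
function parseProjectionLocString(locString) {
  // The projection JSON itself contains ':' characters, so split on the last one.
  var sep = locString.lastIndexOf(':');
  if (sep < 0) {
    throw new Error('expected "<projection>:<start>..<end>"');
  }
  var projectionString = locString.slice(0, sep);
  // Coordinates may be negative (e.g. -1..-1 meaning "unset").
  var range = /^(-?\d+)\.\.(-?\d+)$/.exec(locString.slice(sep + 1));
  if (!range) {
    throw new Error('could not parse range in ' + locString);
  }
  return {
    ref: projectionString,                    // the raw projection string is the key
    projection: JSON.parse(projectionString), // the parsed projection object
    start: parseInt(range[1], 10),
    end: parseInt(range[2], 10)
  };
}

var loc = parseProjectionLocString(
  '{"sequenceList":[{"name":"Group11.4"}],"label":"Group11.4"}:225566..391970'
);
console.log(loc.projection.label, loc.start, loc.end);
```

Splitting on the last colon is a design choice that only holds while the range suffix never contains a colon itself.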

[ ] stub in a "working-ish" solution as hopefully above
[ ] create tests in spec to match
[ ] in Browser.js, loadRefSeqs and addRefSeqs
- [ ] bookmark notation selection work (and store cookie) (e.g., refSeq.projection)
- [ ] pull JBrowse sequence drop-down in favor of proper decoration
- [ ] add feature bookmark #670
- [ ] project that in the client
- [ ] project multi-scaffolds on the front-end:
  - [ ] single scaffold
    - [ ] sequence
    - [ ] tracks
  - [ ] multiple scaffolds
    - [ ] sequence
    - [ ] tracks
  - [ ] multiple for individual feature
    - [ ] sequence
    - [ ] tracks
[ ] folding
- [ ] integrate with this (or like) the reverse complement plugin based on some of the other plugins
  Exploratory Tests
[ ] update Store / SeqFeature / NClist to project contiguous features
- [x] we can load appropriate contiguous sequence (??)
- [x] we can load appropriate contiguous features (query / start / end?) - (Store/NClist) . . . looks to be duplicative of current work
- [x] works on appropriate range (e.g., a very narrow feature range?)
- [x] we can load appropriate BAMs (query / start / end?): under _getFeatures
- [x] works within a gene / limited range
[x] update Store / SeqFeature / NClist to project minus interior components:
- [x] test doing an annotation (is it using the same coordinate space?)
- [x] what does track look like (can it be further decorated?)
- [x] can we still use the highlighting code in blockbased?
[x] update Store / SeqFeature / NClist to do folding
Use-cases
scrunch 1 intron between exons
- [ ] change "fold" link to "Annotation Genometry" . . shows exons + details DiscontinuousProjection:List, Coordinate:min,max,sequence/org
- [ ] provide region annotation in that area in the "folded" region
scrunch all introns in one gene
- [ ] same as above
view contiguous scaffolds (same as above, using MultisequenceProjection):
- [ ] add multisequence projection
- [ ] add very specific tracks
- [ ] update the trackScale to reflect changes
- [ ] update multiplehighlight to handle difference
view contiguous scaffolds of single genes
- [ ] same as above, but have very specific coordinates region

Data structure: putative data structure used for display. Note that the "folding" is different, as it indicates the areas we are hiding (and need to indicate). Features indicate the areas that we are showing.

    {
        "sequenceList": [
            {
                "name": "chr1",
                "features": [
                    {"name": "SOX9", "fmin": 123, "fmax": 789}
                ],
                "folding": [
                    {"fmin": 123, "fmax": 789}
                ]
            }
        ]
    }
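
As a rough illustration of how the "folding" entries could drive the display coordinates, here is a sketch that maps an original coordinate to its folded position. It assumes folds are half-open `[fmin, fmax)` intervals that are sorted and non-overlapping; the function name `projectFolded` is made up for this example.

```javascript
// Hypothetical sketch: map an original coordinate to its "folded" display coordinate.
// Assumes `folds` are hidden half-open intervals [fmin, fmax), sorted, non-overlapping.
function projectFolded(coord, folds) {
  var hidden = 0; // number of bases hidden to the left of coord so far
  for (var i = 0; i < folds.length; i++) {
    var fold = folds[i];
    if (coord >= fold.fmax) {
      hidden += fold.fmax - fold.fmin; // the entire fold lies to the left
    } else if (coord >= fold.fmin) {
      return fold.fmin - hidden;       // inside a fold: collapse onto its left edge
    } else {
      break;                           // remaining folds are to the right
    }
  }
  return coord - hidden;
}

var sequence = {
  name: 'chr1',
  folding: [{ fmin: 123, fmax: 789 }]
};
console.log(projectFolded(100, sequence.folding));  // left of the fold, unchanged
console.log(projectFolded(1000, sequence.folding)); // shifted left by the fold width
```

Anything inside a fold collapses to a single display position, which is where the fold marker would be drawn.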

TODO

[x] refork master
- [x] add bookmark stuff back (https://github.com/nathandunn/Apollo/pull/13 preferred) or pick out the merge (https://github.com/nathandunn/Apollo/pull/12)
- [ ] disable server-side projection (have to grab contiguous sets from model, though)
[ ] move server library into the client? (test_contig_labels), with node (related to #755) using TypeScript:
- [x] extend and add floorKey, ceilingKey, etc, to the sortedMap via TreeMap
- [x] wire in TypeScript projection code
- [ ] code discontinuous-projection into client
- [ ] test folding, features, etc.
- [ ] code multisequence-projection into client
- [ ] use server-side factory to generate map + display and hand back into RefSeq
- [ ] move to its own GitHub repo
- [ ] add build environment (node)
- [ ] add test environment
[ ] track display:
- [ ] display tracks properly
- [ ] on annotation, calculate the reverse or cache the original
- [ ] top-level track renumbering / detail
- [ ] bottom-level track renumbering / detail
- [ ] proper border display (almost there, replace with SVG and a much thinner / transparent border)
- [ ] proper "this" display (almost there, fix the refseq.name in the browser, should fix 100% display as well)
[ ] annotate across scaffolds ( #51 )
[ ] test top/bottom-level track renumbering: this is just to test that the methods can "project" properly; in practice this is 100 Ajax / WS calls, but it should fix any logic???
- [ ] add server-side lookup for these methods (project, this, that, etc.)
- [ ] top-level
- [ ] bottom-level
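
For the floorKey / ceilingKey item in the list above, a sketch of what the client-side equivalent might look like. It is loosely modelled on the semantics of `java.util.TreeMap`; the `SortedKeys` class is an assumption for illustration and is not the TypeScript code mentioned in the list.

```javascript
// Minimal sorted-key helper, loosely modelled on java.util.TreeMap's
// floorKey / ceilingKey. The class name and API are assumptions for illustration.
function SortedKeys(keys) {
  this.keys = keys.slice().sort(function (a, b) { return a - b; });
}

// Greatest key <= value, or null if there is none.
SortedKeys.prototype.floorKey = function (value) {
  var lo = 0, hi = this.keys.length - 1, found = null;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (this.keys[mid] <= value) {
      found = this.keys[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
};

// Least key >= value, or null if there is none.
SortedKeys.prototype.ceilingKey = function (value) {
  var lo = 0, hi = this.keys.length - 1, found = null;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (this.keys[mid] >= value) {
      found = this.keys[mid];
      hi = mid - 1;
    } else {
      lo = mid + 1;
    }
  }
  return found;
};

// Keys could be, for example, the start coordinates of projected intervals.
var starts = new SortedKeys([1200, 0, 500]);
console.log(starts.floorKey(700), starts.ceilingKey(700));
```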

nathandunn commented 8 years ago

A number of problems. Contig is drawn at the correct spot, but the projection is incorrect (offset is wrong). It is likely ignoring the buffer between the two.

[Image: screen shot 2015-12-23 at 1 43 38 pm]

nathandunn commented 8 years ago

Note: coordinates in JBrowse appear to be stored 0-based (end-exclusive only?) but rendered 1-based.

Externally, they are delivered as left to right, excluding the "fmin" and including the fmax.
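
If the convention really is 0-based, half-open storage with 1-based display (the note above is not certain about this), the conversion would look roughly like the sketch below. The helper names are hypothetical.

```javascript
// Assumption: features are stored 0-based, half-open: [start, end).
// The displayed range is then 1-based and inclusive on both sides.
function toDisplayRange(feature) {
  return { first: feature.start + 1, last: feature.end };
}

// With half-open coordinates the length needs no +1 correction.
function lengthOf(feature) {
  return feature.end - feature.start;
}

var stored = { start: 225566, end: 391970 };
var shown = toDisplayRange(stored);
console.log(shown.first + '..' + shown.last, 'length', lengthOf(stored));
```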

nathandunn commented 8 years ago

Some other browsers that use GitHub (d3 based):

- https://github.com/googlegenomics/api-client-python
- https://github.com/damiankao/seeker
- https://gabrowse.appspot.com/#=&readsetId=CPHG3MzoCRCslZr864ik70U&backend=GOOGLE&location=4%3A95577138
- http://chmille4.github.io/Scribl/ (dead, but has D3 / SVG components)
- https://github.com/TGAC/TGACBrowser

SVG/D3 (can include): https://github.com/mbostock/d3/wiki/SVG-Shapes

nathandunn commented 8 years ago

Look at using TypeScript for everything . . . mirroring / moving to the top.

nathandunn commented 8 years ago

Issue to link and track together the issues necessary for the demo.

[x] folding
[x] contiguous projection (need to fix lengths) - #51
[x] contiguous projection and folding (including padding - recheck) - #51
[ ] annotation of folded things (recheck) - #671
[ ] annotation across contigs (recheck) - #671
[ ] annotation across folded contigs (recheck) - #671
[x] show sequence when projected #680
[x] show sequence when contiguous #680
[x] show sequence when contiguous projected #680
[x] show things other than honeybee - #676
[ ] prepare screen cast:
- [x] indicate borders between contigs #673
- [x] !!select and combine scaffolds
- [x] fix unprojected scaffold placing (contig border is correct); the second contig is not in the right place (works fine when projected with 0 border)
- [ ] show annotations crossing scaffold before and after (#671)

nathandunn commented 8 years ago

1- Transform Coordinates:
1a- connect contigs/scaffolds
1b- intron accordions/folds
1c- reverse complement
1d- order and orientation (empowered by reverse complement)

nathandunn commented 8 years ago

[x] evaluate modelA -> viewA -> viewB
[x] evaluate viewB -> modelA (reverse-lookup, do we have to back-calculate this?)
The problem here is that the view, which can easily be changed (in DraggableHTMLFeatures renderSubFeatures), has to be reverted before it goes back to the server. This is not hard to do, but we are already doing it on the server side, so there is not a ton of optimization to be had there.

            var track = this;
            // . . . .
            var uid = this.getId(subfeat);
            var seqLookup = track.refSeq.name;
            var converter = getConverter(seqLookup);
            subtype = subfeat.get('type');
            // for an annotated feature, subfeat is an object
            if (subfeat.data && subfeat.data.start) {
                var thisSequence = subfeat.aFeature.location.sequence; // this sequence data
                // subfeat.data.start = subfeat.data.start - calculateConversion();
                converter.calculateConversion(subfeat.data);
            }
            // otherwise subfeat is an array
            else if (subfeat) {
                var thisSequence = subfeat[5];
                // subfeat[1] = subfeat[1] - calculateConversion();
                converter.calculateConversion(subfeat);
            }
            // . . . .

nathandunn commented 8 years ago

When evaluating track locations, the 2 classes to consider are "overview_loc_track" and "static_track" (id and class), which are both LocationScaleTrack, which is here:

http://jbrowse.org/api/View_Track_LocationScale.js.html

This is very similar to GridLines; both are absolutely positioned and based on BlockBased.

nathandunn commented 8 years ago

Evaluate track visualizations

[x] overview
[x] low-level

Using LocationScale, the labelNumber can be passed in with the refseq.name and we should be able to get the correct data in the correct place:

        var labelNumber = this.chooseLabel( args );

That is, provided we have the type of projection.

nathandunn commented 8 years ago

grails 3 + geb, but too much of a pain for the moment.

https://docs.gradle.org/current/userguide/ant.html

nathandunn commented 8 years ago

One of the problems I think we have is that when we pull stuff into the view, concatenated or otherwise, it may affect what gets pulled, as well.

cmdcolin commented 8 years ago

It might be good if we could have a meeting about this. I have at least a couple ideas that might be of interest for the client side

nathandunn commented 8 years ago

Can you put them in here and then get started on the variant visualization piece?

We might have some time to talk on Thursday. I have solutions to each of the problems so far, but further thoughts are always welcome.

Nathan


cmdcolin commented 8 years ago

Basically I just have a nice programming strategy that I've used successfully on some jbrowse plugins which involves making custom "store classes". I think the strategy is pretty flexible and I wanted to see if it would apply here, but I didn't want to just jump into the issue thread here. I thought a meeting would help debrief it

nathandunn commented 8 years ago

Interesting .. Can you outline an example or go into more detail? I won't have time to discuss until Thursday. But I think that this is a good place to discuss those ideas.

Nathan


cmdcolin commented 8 years ago

I have several examples

1) the sashimiplot plugin that I developed at the hackathon is an example of this "Strategy" that I mentioned. It has its own custom "Store class" that calculates the number of reads that span a spliced alignment. However, it simply lives "on top of" the normal BAM store class that alignment tracks use. My plugin uses the default BAM store, but then it processes the output to calculate the splicing coverage.

2) Another example of the strategy is the gccontent plugin (I wanted to mention this one at the last lab meeting), which uses a similar strategy. The plugin itself implements a custom "storeclass", but it simply fetches data from the default sequence store and preprocesses the sequence data before returning it.

3) There is even a third one that I call multibigwig, which can combine multiple bigwig files into one track. Similar strategy with a custom storeclass.

So in essence, those examples brought home to me that having a custom store class that does some preprocessing can be very flexible.

Now why is it relevant? Well, perhaps we can more easily process data from multiple scaffolds. We can just have a multiple-scaffold-store-class that fetches that data from different sequences. Then the server-side doesn't have to preprocess the JSON data, it can be done on client side

That is just my basic idea. I think the variation viewer was talking about even changing the view to adjust for haplotypes, so maybe there is something from this concept that can be related to the variant viewer too (just speculating)
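
To make the "store on top of a store" idea concrete, here is a self-contained sketch. It mirrors the callback style of a JBrowse feature store (`getFeatures(query, featureCallback, finishCallback, errorCallback)`), but `MemoryStore` and `MultiScaffoldStore` are stand-ins invented for this example, and features are plain objects rather than real JBrowse feature objects.

```javascript
// In-memory stand-in for a callback-based feature store (an assumption, not JBrowse code).
function MemoryStore(featuresByRef) {
  this.featuresByRef = featuresByRef;
}
MemoryStore.prototype.getFeatures = function (query, featureCallback, finishCallback, errorCallback) {
  var features = this.featuresByRef[query.ref] || [];
  features.forEach(function (f) {
    // half-open overlap test against the query window
    if (f.end > query.start && f.start < query.end) {
      featureCallback(f);
    }
  });
  finishCallback();
};

// Hypothetical wrapper: presents several scaffolds as one coordinate space by
// delegating to the base store once per scaffold and shifting the results.
function MultiScaffoldStore(baseStore, scaffolds) {
  this.baseStore = baseStore;
  this.scaffolds = scaffolds; // [{ name, length }] in display order
}
MultiScaffoldStore.prototype.getFeatures = function (query, featureCallback, finishCallback, errorCallback) {
  var self = this;
  var offset = 0;
  var pending = this.scaffolds.length;
  function done() {
    if (--pending === 0) { finishCallback(); }
  }
  if (pending === 0) { finishCallback(); return; }
  this.scaffolds.forEach(function (scaffold) {
    var shift = offset;
    offset += scaffold.length;
    // translate the combined query window into this scaffold's own coordinates
    var start = Math.max(0, query.start - shift);
    var end = Math.min(scaffold.length, query.end - shift);
    if (start >= end) { done(); return; } // window does not touch this scaffold
    self.baseStore.getFeatures(
      { ref: scaffold.name, start: start, end: end },
      function (f) {
        featureCallback({ name: f.name, start: f.start + shift, end: f.end + shift });
      },
      done,
      errorCallback
    );
  });
};

var base = new MemoryStore({
  B: [{ name: 'g1', start: 10, end: 20 }],
  C: [{ name: 'g2', start: 5, end: 15 }]
});
var combined = new MultiScaffoldStore(base, [{ name: 'B', length: 100 }, { name: 'C', length: 50 }]);
var seen = [];
var finished = 0;
combined.getFeatures({ ref: 'B:C', start: 0, end: 150 },
  function (f) { seen.push(f); },
  function () { finished++; });
console.log(seen, finished);
```

The track code never learns that more than one scaffold was involved, which is the appeal of doing this at the store level.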

nathandunn commented 8 years ago

If you have links to the plugin source, that would be great.

What I was thinking . . . because we have to process the track data, etc., is to copy the engine into the client and then operate on the view (doing shortly) from the existing data sources.

So . . I was going to keep it the same on the store side and modify the views using the projection client I am porting, including the track list (top and bottom) and the source store.

The hard part is that when modifying the view, it becomes the “model” and you have to reverse it . . though I’m already doing this on the server side, so I’ve written this anyway.

So . . . if you are requesting locations N-M on sequence B, this gets projected into N’-M’ on B (e.g., exon folding). This is all keyed off the refSeq.name

If you are requesting location N-(M)-P on sequences B:C, this gets projected as N’-P’ on sequences B:C. So long as the datastore is bringing back the N-P version (and we transform the N’-P’), we should be fine.
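
A sketch of the B:C case, including a padding value between sequences (the "buffer" mentioned earlier in the thread). This is not the MultisequenceProjection library itself; the function names and the layout shape are assumptions for illustration.

```javascript
// Hypothetical layout: sequences placed end to end with `padding` bases between them.
function buildLayout(sequences, padding) {
  var offset = 0;
  return sequences.map(function (seq) {
    var entry = { name: seq.name, length: seq.length, offset: offset };
    offset += seq.length + padding;
    return entry;
  });
}

// Forward: (sequence name, coordinate on that sequence) -> projected coordinate.
function project(layout, name, coord) {
  for (var i = 0; i < layout.length; i++) {
    if (layout[i].name === name) {
      return layout[i].offset + coord;
    }
  }
  return null; // sequence is not part of this projection
}

// Reverse: projected coordinate -> { name, coord }, or null when it falls in padding.
function unproject(layout, projected) {
  for (var i = 0; i < layout.length; i++) {
    var entry = layout[i];
    if (projected >= entry.offset && projected < entry.offset + entry.length) {
      return { name: entry.name, coord: projected - entry.offset };
    }
  }
  return null;
}

var layout = buildLayout([{ name: 'B', length: 100 }, { name: 'C', length: 50 }], 10);
console.log(project(layout, 'C', 5), unproject(layout, 115), unproject(layout, 105));
```

Forgetting the padding term in `buildLayout` shifts every sequence after the first, which matches the kind of offset error described earlier.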

Nathan


cmdcolin commented 8 years ago

During the transformation, perhaps the features could be augmented with their original coordinates, and then perhaps no "reverse projection" would be required.

In any case, these are the plugins I mentioned:

- https://github.com/cmdcolin/sashimiplot
- https://github.com/cmdcolin/multibigwig
- https://github.com/cmdcolin/gccontent
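
The suggestion of carrying the original coordinates along with each projected feature could look something like the sketch below. Features are plain objects here, and `projectFeature`, `toModelCoordinates`, and the `original` field are names made up for the example.

```javascript
// Hypothetical: shift a feature into view coordinates but remember where it came from.
function projectFeature(feature, shift) {
  return Object.assign({}, feature, {
    start: feature.start + shift,
    end: feature.end + shift,
    original: { ref: feature.ref, start: feature.start, end: feature.end }
  });
}

// On annotation, read the stored model coordinates instead of inverting the projection.
function toModelCoordinates(projected) {
  return projected.original;
}

var modelFeature = { ref: 'Group2.19', name: 'exon1', start: 225566, end: 225700 };
var viewFeature = projectFeature(modelFeature, -225000);
console.log(viewFeature.start, viewFeature.end, toModelCoordinates(viewFeature));
```

This only covers features that pass through unchanged; a feature edited in the view (dragged or resized) would still need its new edges mapped back through the projection.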

nathandunn commented 8 years ago

I was thinking about the storing / caching as well, but I forgot to write it down. It's there (as you would expect for a linear transform), but I agree that it is unnecessary.

nathandunn commented 8 years ago

Talked to @monicacecilia and it sounds like driving this from the annotator panel will be good as a first-step. If we assign folding from the genome browser it adds "other" problems in terms of losing context, etc.

monicacecilia commented 8 years ago

@nathandunn Yes, I think it is a good idea. Let's make sure @selewis is on board with driving from annotator panel.

nathandunn commented 8 years ago

Notes to self about probable methodology. In order to fulfill the specified use-cases, we have 3 basic pieces to fulfill.

- intron-folding (cases 1 and 2)
- projection of a small genomic area (case 4)
- projection of contiguous scaffolds / sequences (cases 3 and 4)

To support these use-cases we need:

- Indicate folding (cases 1 and 2 on the front-end)
- Update track coordinates (all cases)
- Label regions (cases 3 and 4)
- Annotate by calculating reverse-projections (all cases)

So, in terms of methodology to support all of this, the current MultisequenceProjection library handles ALL cases correctly, though I need to handle "folding" correctly in terms of adding additional "Intervals". This will allow us to annotate it on the front-end and add intervals to handle projection on the back-end. A very tractable problem.

When we consider the case of the reverse complement, from the UI perspective it is just a "projection" on a scaffold added to every one of the use-cases, and it can be added as a label / filter to reverse. Since the conversions of coordinates and codons are both linear and reversible, it is very tractable.

Proposed methodologies:

1. backend mangling / front-end decoration: Largely what we have now, except that we are folding introns over a reference scaffold instead of the annotations. I initially thought that this might be bad because of all of the work re-projecting on the backend, but I'm not sure this is too bad. What's left to do on this:
  - fix "display" of things:
    - these could pick up the projection from the initial RefSeq (including name) in order to decorate the view
  - fix reverse-projection during annotation display
  - fix multiple bugs (as always)
2. backend stores bookmarks and drives the UI / front-end decorates and mangles using the same projection code: This works VERY well for folding and projecting a single transcript. Viewing multiple scaffolds is a bit more problematic as we need to have both track and sequence information for all scaffolds in the correct position before it can be mangled.
  What I would like to do is mangle together multiple sequence/loc combinations into one, e.g. Gene1 = Scaffold1::fmin1..fmax1, Gene2 = Scaffold2::fmin2..fmax2 . . and then combine the scaffolds. There are two ways that I know of doing this:
  1. do this on the backend by effectively doing the same thing, stitching the results together, and then reprojecting once we have them. However, we are setting the frame and letting the JavaScript do this, so this would be VERY hard.
  2. add a specific store for EACH track.
    1. DraggableHTMLFeatures would over-ride fillFeatures()
    2. over-ride BAM features retrieval
    3. over-ride getFeatures in SequenceChunked in SequenceStore
    4. probably the same for SequenceStore
      1. retrieve additional scaffolds using jbrowse, but mangle them on the front-end. The down-side of this method is that we retrieve extra sequence, but I don't think it will be too great. If this is ALWAYS done around features, then we can very safely use feature names, though it would probably be more flexible to use coordinates (or both initially). e.g., if we are at A:10::20,B:50::60 then it would retrieve both of these at the proper location (i.e., still using the chunk code, etc.) and coordinate transforms, etc. would be done at the front-end, correcting for the chunk requested versus received. The down-side to this is that we now have two projection schemas. However, so do most of these other methods, so I'm not sure if there is a way around this.
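
The "chunk requested versus received" correction in the last item can be sketched as follows, assuming fixed-size sequence chunks (the refSeq examples earlier use a `seqChunkSize` of 20000) and half-open ranges. `chunksFor` and `trimToRequest` are hypothetical helpers, not JBrowse functions.

```javascript
// Which fixed-size chunks cover the half-open range [start, end)?
function chunksFor(start, end, chunkSize) {
  var first = Math.floor(start / chunkSize);
  var last = Math.floor((end - 1) / chunkSize);
  var chunks = [];
  for (var i = first; i <= last; i++) {
    chunks.push({ index: i, start: i * chunkSize, end: (i + 1) * chunkSize });
  }
  return chunks;
}

// A received chunk is usually wider than the request, so trim it back
// to the requested window before applying any projection.
function trimToRequest(chunk, chunkSequence, start, end) {
  var from = Math.max(0, start - chunk.start);
  var to = Math.min(chunkSequence.length, end - chunk.start);
  return chunkSequence.substring(from, to);
}

var needed = chunksFor(225566, 391970, 20000);
console.log(needed.length, needed[0].index, needed[needed.length - 1].index);
console.log(trimToRequest({ index: 1, start: 20000, end: 40000 }, 'ACGTACGT', 20002, 20005));
```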

Probably the most likely schema is to perpetuate the backend, decorating the front-end using both the highlighter and the TrackList code (maybe returning in RefSeq?) as we go and merging in the folding code to complete the cycle. Not ideal, but without cleanly rewriting the JBrowse layer, it just doesn't make much sense to do otherwise.

cmdcolin commented 8 years ago

Hi Nathan

Sorry to butt in here, but I think this is probably the right idea, and I wanted to comment on this specifically.

> However, we are setting the frame and letting the JavaScript do this, so this would be VERY hard.

I think this isn't as dramatic as you think, and is in fact the ideal way to do it. I'm ironing over details because you mention a couple things incorrectly (as mentioned previously, it's much more efficient to override the store class, not the DraggableHTMLFeatures part of the code. This was a realization I had at the hackathon: instead of modifying the part that calls getFeatures, you actually modify the getFeatures functions inside a subclassed store directly). That is essentially what my previous comments in this thread explain. I'd be happy to elaborate on this whenever, I have some demo code too :)

nathandunn commented 8 years ago

Please elaborate / provide details + code if you have it. This was more of a thought dump used for planning and discussion.

My intention in these comments was to expand the getFeatures command in the store class either in HTMLFeatures or DraggableHTMLFeatures to pull multiple sequences from the correct range. I'm less sure how that works when we get into sequences however.

Nathan


cmdcolin commented 8 years ago

The HTMLFeatures and DraggableHTMLFeatures are the track types, so my point is you can do it at the store class level i.e. NCList.js, or BAM.js or Sequence.js

You can even just subclass those things, create say, ProjectionSequence.js, or ProjectionNCList.js, and modify the getFeatures there.

In any case, I went ahead and did direct modifications of those files and implemented reverse complementing on NCList and Sequence here https://github.com/GMOD/jbrowse/compare/master...cmdcolin:rev_comp?expand=1

This shows that reverse complement can be toggled

I believe that this technique is pretty extensible and can even be "spun off" into its own modules. I also think that by adding the original feature location to feature metadata, "inverting" the projection is not necessary, and the annotator functions could access the alternative "origin feature location" metadata.

It's still not a solved problem, but in any case, I think we are kind of on the same wavelength. My "ideas" here are not fundamentally different from those you outlined.
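
For reference, the coordinate and sequence math behind a reverse-complement projection is small. The sketch below is not taken from the linked branch; it assumes 0-based half-open coordinates, and `reverseComplement` / `flipFeature` are names chosen for this example.

```javascript
var COMPLEMENT = { A: 'T', T: 'A', C: 'G', G: 'C', N: 'N',
                   a: 't', t: 'a', c: 'g', g: 'c', n: 'n' };

// Reverse the sequence and complement each base; unknown symbols become 'N'.
function reverseComplement(sequence) {
  return sequence.split('').reverse().map(function (base) {
    return COMPLEMENT[base] || 'N';
  }).join('');
}

// Mirror a half-open [start, end) feature around a reference of length refLength
// and swap its strand. Applying it twice returns the original feature.
function flipFeature(feature, refLength) {
  return {
    start: refLength - feature.end,
    end: refLength - feature.start,
    strand: -feature.strand
  };
}

console.log(reverseComplement('ATGC'));
console.log(flipFeature({ start: 2, end: 5, strand: 1 }, 10));
```

Because the flip is its own inverse, the same function can map annotation edits back to the original orientation.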

cmdcolin commented 8 years ago

And, you will note that I chose reverse complement and not "folding" operations. The "folding" operations might just add extra operations to the math, but I still think that this store class is the ideal "place" to do those operations as it is a good abstraction. Plus, reverse complement is cool too :)

nathandunn commented 8 years ago

Thanks. This is helpful. It's a bit more obvious what to do now for fetching contiguous sets of things in sequences. My plan would be then to use the mangled refSeq.name (unless I can put the data in the refSeq itself) to pull out the necessary chunks. For BAM, I think that is pretty obvious. NCList is going to be interesting as it largely mirrors what I've already done on the server side, but otherwise fairly tractable.

WRT folding vs complement vs contiguous . . . . there are ways to do this WRT layers

I was thinking that folding can still be done in the view fairly easily (for all track types) if need be. If I have the context for the current view and store (in the RefSeq.name, itself) I can reverse the coordinates during annotation. However, for the contiguous sequences, I think that this will make the most sense.

I'll try exploring this a bit more. If I can accurately pull contiguous regions next to each other for a given projection, then knock on wood, everything else should be down-hill.

monicacecilia commented 8 years ago

@cmdcolin I am assuming that you intended to show the same gene represented on the forward strand and then later on the reverse strand, after the 'Reverse Complement' option is turned on.

This is a heads up that, given these examples, it seems something in your code must be erroneously translating the reading frames, incorrectly creating one- and four-nucleotide amino acid residues, and introducing frameshifts. I see tetrads, and single nucleotides for codons, instead of the only valid option - triplets.

This means that the amino acid sequence for the gene depicted in the evidence track in the first figure would be read from frame -1 and it would be translated as MRCGLDGTAHH.... Note the tetrads and the single-nucleotide "amino acids" in the upper left corner of your screen capture. This is a detail of your image 1:

And in the second figure, the coding sequence would be read from +1, and the string of amino acids at the beginning would be instead MRCGLDGTAHE.... Other frames are also affected.

monicacecilia commented 8 years ago

@nathandunn -- Also, let's make sure we review details in Aim 2 (Visual Exploration) on Apollo Grant Research Strategy, as they appear under 'Visual Genome Folding'.

nathandunn commented 8 years ago

@monicacecilia sounds good.

I think, from reading @cmdcolin's code, that this is a quick hack to demonstrate use of the store. I see some notes about calculating proper shifts on the chromosome. Either way, these are essential details to get right.

cmdcolin commented 8 years ago

@monicacecilia thanks for catching that. I think that this was due to using a setting called view.maxPxPerBp=50. I think it created a weird "zoom level" and got confused perhaps. When I turn it off then this particular bug goes away. I could test it with the apollo sequence track too

nathandunn commented 8 years ago

It would be worth trying it with the apollo sequence track to make sure no other issues come up. Thanks.

nathandunn commented 8 years ago

If it is quick to test it with Apollo that is.

I'll have to test the store either way for contigs and folding in nclist.

Nathan


cmdcolin commented 8 years ago

@nathandunn It seems to work ok in the Apollo sequence track too! There is one caveat that relates to it sometimes using cached data, but it might be fixable.

[Image: screenshot-localhost 8080 2016-02-18 10-03-35]

MAASLSNNNDGTPVNKEAALSNTDLS 
||||||||||||||||||||||||||
MAASLSNNNDGTPVNKEAALSNTDLS

nathandunn commented 8 years ago

Excellent, thanks.

Nathan


cmdcolin commented 8 years ago

I sort of extracted the core functionality for that reverse projection and put it in a plugin just to show that it doesn't necessarily need to involve modifying the jbrowse codebase directly (can be a plugin)

https://github.com/cmdcolin/projectionplugin

nathandunn commented 8 years ago

Awesome. That will be preferable.

I’m going to run a few more tests to see that I can get the things I need working, working. Once I feel I have a path and things are working, I will definitely move to a plugin model.

Thanks.


nathandunn commented 8 years ago

Testing the annotations with a projected offset yields the same results as before. It uses the displayed coordinates . . . not the model coordinates, so we still have to calculate the reverse projection on annotation (not hard) and, of course, similarly project all tracks. For folding (as we did in the view), this is fine.

So, the next 2 problems are:

can I constrain a view to a single gene region (alter the refSeq)?
can I view multiple contiguous scaffolds?

nathandunn commented 8 years ago

WRT refSeqs . . we typically pull back ALL of the refSeqs with the given lengths. Obviously that has to change if we are doing contiguous and/or constrained views. Even with a "folded" sequence, we would need to adjust the refSeq length as well. These could be projected on the fly (like everything else) and intercepted upstream. While we could go into Browser.js and over-ride (grabbing all), I don't think that there is much point since we already have those loaded in the database and I think there is little value in having those on the client for the drop-down.

However, what is returned would have to be fixed:

{"{\"padding\":0, \"projection\":\"None\", \"referenceTrack\":[], \"sequenceList\":[{\"name\":\"Group15.19\"}], \"label\":\"Group15.19\"}:-1..-1":{"seqChunkSize":20000,"start":0,"name":"{\"padding\":0, \"projection\":\"None\", \"referenceTrack\":[], \"sequenceList\":[{\"name\":\"Group15.19\"}], \"label\":\"Group15.19\"}:-1..-1","length":3997324,"end":3997324}}

to

{"Group1.1":{"seqChunkSize":20000,"start":0,"name":"Group1.1","length":1382403,"end":1382403}}

So, remove the escaping. This would then encapsulate all of the folding etc. if we had already stored it there. I think it would be best to store these in the client, since I don't think that there is much value in storing those types of preferences beyond the session. The down-side of this is that we are now re-projecting stuff on the server. There might be a reason to do this in both places, but I am not sure. For now, it's probably sufficient to filter and then recalculate the length in Browser.js.
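
A sketch of the "filter" step: re-keying a refSeqs map whose keys are mangled projection strings so that each entry is keyed by the projection's label. `normalizeRefSeqs` is a hypothetical helper rather than existing Browser.js code, and it deliberately leaves the length untouched, since recalculating it depends on the projection logic.

```javascript
// Hypothetical: replace mangled "<projection JSON>:start..end" keys with plain labels.
function normalizeRefSeqs(refSeqs) {
  var normalized = {};
  Object.keys(refSeqs).forEach(function (key) {
    var name = key;
    var sep = key.lastIndexOf(':');
    // Only keys that look like an embedded projection object are rewritten.
    if (key.charAt(0) === '{' && sep > 0) {
      name = JSON.parse(key.slice(0, sep)).label;
    }
    normalized[name] = Object.assign({}, refSeqs[key], { name: name });
  });
  return normalized;
}

var mangledKey = '{"padding":0, "projection":"None", "referenceTrack":[], ' +
  '"sequenceList":[{"name":"Group15.19"}], "label":"Group15.19"}:-1..-1';
var input = {};
input[mangledKey] = { seqChunkSize: 20000, start: 0, name: mangledKey, length: 3997324, end: 3997324 };
input['Group1.1'] = { seqChunkSize: 20000, start: 0, name: 'Group1.1', length: 1382403, end: 1382403 };

console.log(Object.keys(normalizeRefSeqs(input)));
```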

The real goal in all of this is to not only visualize the projection but annotate it on the track list (LocationScale.js). Since we'll have the proper refSeq either way, we'll know where those coordinates fall within that context of a multisequence projection. This is probably the biggest piece that needs to be done in the client, as it doesn't really work outside that context. The second goal is that within the "folded" regions we can see what is what. To that end, I think using the BlockBased highlight strategy works pretty well. We want a vertical line denoting the collapsed regions on either side.

So, the big piece now is if we can call contiguous sequences for BAM, sequence, track.

cmdcolin commented 8 years ago

I guess I follow most of that. Also I fixed the 'caching' issue that I mentioned above so it should work now for general cases

bug type: file under "use immutable data structures"...i'll just leave it at that

nathandunn commented 8 years ago

Thanks. Did you add it to the same branch (revcomp) or to master?

cmdcolin commented 8 years ago

@nathandunn this is the plugin https://github.com/cmdcolin/projectionplugin

the revcomp branch is unnecessary after using the plugin

nathandunn commented 8 years ago

Got it . . . . I'll try to move all of it into plugins branch once I've verified that I can get it to work.

nathandunn commented 8 years ago

I mean .. move it into a plugin OFF of the branch. I may have to extend the branch quite a bit to make sure it works first, but we definitely want to have it in a plugin. However, not sure if some of this will stay in Apollo or not. Hopefully we can separate them.

nathandunn commented 8 years ago

Looking into how to gather the contiguous elements .. . NCList::getDataRoot(refName)

refName is the sequence name, but really what we want here is the full value for each sequenceList:

name = {
  sequenceList: [
    { name: 'ChrI', fmin: 400, fmax: 450 },
    { name: 'Chr2', fmin: 153, fmax: 185 }
  ]
}

getDataRoot should be able to handle this properly by pulling in the proper track info and then mangling it.
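
To make the sequenceList idea concrete, here is a minimal sketch of laying the regions end to end and mapping a projected coordinate back to its original scaffold. `layoutSequenceList` and `toOriginal` are hypothetical names, and the sketch assumes no padding between regions and half-open `[fmin, fmax)` ranges.

```javascript
// Hypothetical sketch; these functions do not exist in JBrowse or Apollo.

// Place each region one after another in projected coordinates.
function layoutSequenceList(sequenceList) {
  var cursor = 0;
  return sequenceList.map(function (seq) {
    var placed = {
      name: seq.name,
      fmin: seq.fmin,
      fmax: seq.fmax,
      offset: cursor // where this region starts in projected coordinates
    };
    cursor += seq.fmax - seq.fmin;
    return placed;
  });
}

// Find the region a projected coordinate falls in, plus its original coordinate.
function toOriginal(layout, projected) {
  for (var i = 0; i < layout.length; i++) {
    var region = layout[i];
    var length = region.fmax - region.fmin;
    if (projected >= region.offset && projected < region.offset + length) {
      return { name: region.name, coord: region.fmin + (projected - region.offset) };
    }
  }
  return null; // outside the projection
}

var layout = layoutSequenceList([
  { name: 'ChrI', fmin: 400, fmax: 450 },
  { name: 'Chr2', fmin: 153, fmax: 185 }
]);
var hit = toOriginal(layout, 60);
console.log(hit.name, hit.coord); // Chr2 163
```

ChrI covers projected 0..50, so projected 60 lands 10 bases into Chr2, i.e. 153 + 10 = 163.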

cmdcolin commented 8 years ago

If you want to have a meeting sometime to strategize let me know. I can try and explain how the projectionplugin does the transformations too if you're interested. Also I am just getting my mind blown because there are so many words for this concept...

I have heard us use the terms elision, folding, projection, scrunching, collapsing

and now, when I checked out the grant and some code from IGB

...slicing! https://wiki.transvar.org/display/igbman/Sliced+View

nathandunn commented 8 years ago

I think your projection plugin makes perfect sense and is very readable. Folding between introns falls right in with that and should be easy to do.

The difficulty I'm running into right now is grabbing genes from multiple scaffolds to display in the same scaffold. JBrowse has a very single-scaffold view of the world. What I have done is co-opted the refSeq.name. I can easily mangle the refSeq's based on this. Grabbing the track data can be done via getDataRoot, which can be manipulated to bring back the proper data chunks and then merged, and then projected again in getFeatures (offset + any folding + any reverse complement + ? ). This can all be done based on the projection created from the refSeq.name.
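
A sketch of how "offset + any folding + any reverse complement" could be chained as small coordinate functions. All names are hypothetical, the order of the steps (fold, then reverse, then offset) is an assumption, and coordinates are treated as interbase positions local to one region.

```javascript
// Hypothetical sketch; the step order and all names are assumptions.

// Collapse each fold [start, end) to zero width. Folds are sorted, non-overlapping.
function foldStep(folds) {
  return function (coord) {
    var removed = 0;
    folds.forEach(function (fold) {
      if (coord >= fold.end) {
        removed += fold.end - fold.start; // fold lies entirely before coord
      } else if (coord > fold.start) {
        removed += coord - fold.start;    // coord is inside the fold: snap to its start
      }
    });
    return coord - removed;
  };
}

// Flip an interbase coordinate within a region of the given length.
function reverseStep(length) {
  return function (coord) { return length - coord; };
}

// Shift by the region's position in the multi-scaffold view.
function offsetStep(offset) {
  return function (coord) { return coord + offset; };
}

// Apply the steps left to right.
function compose(steps) {
  return function (coord) {
    return steps.reduce(function (c, step) { return step(c); }, coord);
  };
}

// Project both ends; a reverse step swaps them, so re-order afterwards.
function projectFeature(project, feature) {
  var a = project(feature.start);
  var b = project(feature.end);
  return { start: Math.min(a, b), end: Math.max(a, b) };
}

// Region of length 1000 with one folded intron 200..700 (folded length 500),
// shown reverse complemented and placed at offset 50.
var project = compose([
  foldStep([{ start: 200, end: 700 }]),
  reverseStep(500),
  offsetStep(50)
]);
var projected = projectFeature(project, { start: 720, end: 800 });
console.log(projected.start, projected.end); // 250 330
```

Worked through: 720 and 800 fold to 220 and 300, reverse to 280 and 200, then shift to 330 and 250.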

Let's plan to meet on Monday and discuss. I'll add a putative time and we can change it if need be.

nathandunn commented 8 years ago

getDataRoot

SubList iteration happens in Store/NCList iterHelper / iterate.

In getDataRoot we would need to:

read off a list of URLs generated from the refSeq list data
instead of a single .then, use a set of Deferreds in a DeferredList, such that we have `deferredList.then()`
in the .then function we need to merge results and then project. Merging means running through and calculating an offset based on the scaffold and projecting on the fly . . which we have to do anyway. This can be done in the featureCallBack in _getFeatures. Merge / projection should take into account (based on previously ported code):
1. original coordinates
2. original sequence
3. projection
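
The steps above can be sketched roughly as follows. This is not the NCList code: native `Promise.all` stands in for dojo's DeferredList, `loadChunk` is a fake loader returning canned data, and the layout (with precomputed offsets) is assumed to come from the refSeq's sequenceList.

```javascript
// Hypothetical sketch. Promise.all stands in for dojo/DeferredList, and
// loadChunk is a fake loader, not a JBrowse function.

// Shift each scaffold's features by that scaffold's offset and merge them.
function mergeAndProject(layout, chunks) {
  var merged = [];
  layout.forEach(function (region, i) {
    chunks[i].forEach(function (feature) {
      merged.push({
        name: feature.name,
        originalStart: feature.start,  // 1. original coordinates
        originalEnd: feature.end,
        originalSequence: region.name, // 2. original sequence
        start: feature.start - region.fmin + region.offset, // 3. projection
        end: feature.end - region.fmin + region.offset
      });
    });
  });
  return merged.sort(function (a, b) { return a.start - b.start; });
}

// Fake per-scaffold loader; a real one would fetch the scaffold's track data.
function loadChunk(region) {
  var fake = {
    ChrI: [{ name: 'geneA', start: 410, end: 440 }],
    Chr2: [{ name: 'geneB', start: 160, end: 180 }]
  };
  return Promise.resolve(fake[region.name]);
}

var layout = [
  { name: 'ChrI', fmin: 400, fmax: 450, offset: 0 },
  { name: 'Chr2', fmin: 153, fmax: 185, offset: 50 }
];

// Wait for every scaffold, then merge and project in one callback.
Promise.all(layout.map(loadChunk)).then(function (chunks) {
  mergeAndProject(layout, chunks).forEach(function (f) {
    console.log(f.name, f.start, f.end); // geneA 10 40, then geneB 57 77
  });
});
```

Keeping the merge as a plain function of `(layout, chunks)` means it can be tested without any network or Deferred machinery.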

nathandunn commented 8 years ago

Looking through again . . converting refSeq to work with projection of contiguous sequences touches every part of the code. Whether it's a plugin or not, the main key value is refSeq.name . . . and every piece of code in JBrowse assumes a single scaffold. Would estimate 2-3 months of work to get everything working plus another month of cleanup. Individual stores and tracks will be relatively easy, but a large number of changes are necessary in GenomeView, BlockBased, Browser, Location, Util, etc. Additionally the code base needs to account for multiple projections similar to the IGV (B?) code-base (thinking intron folding + contiguous view of scaffolds + reverse complement) and account for something more similar to refSeq.projection where projection is a descriptive (and potentially expansive) JavaScript object.

In the interim, moving back to server list.

GMOD / Apollo

Projection Demonstration pieces #715
