GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0

update root model to support multi-assembly tracks #765

Closed cmdcolin closed 4 years ago

cmdcolin commented 4 years ago

The track model can be useful for synteny, even though it doesn't necessarily fit the conventional notion of a track, so that users can edit colors, turn on and off an overlay, etc.

Some possible todos

Make the hierarchical track selector use dataset as a source of tracks instead of an assembly name?

We discussed how we could allow multiple datasets per assembly name, so I am trying to fit that into the picture we have now, where an assembly name is selected in various places.

cmdcolin commented 4 years ago

Example

"datasets": [
    {
      "name": "grape_vs_peach_dataset",
      "tracks": [
        {
          "trackId": "mcscan_anchors",
          "type": "LinearSyntenyTrack",
          "name": "mcscan_anchors",
          "configRelationships": [
            {
              "type": "mcscanGeneTrack",
              "target": "grape_genes"
            },
            {
              "type": "mcscanGeneTrack",
              "target": "peach_genes"
            }
          ],
          "category": [
            "Annotation"
          ],
          "adapter": {
            "type": "MCScanAnchorsAdapter",
            "mcscanAnchorsLocation": { "uri": "test_data/grape.peach.anchors.simple" }
          }
        }
      ]
    },
cmdcolin commented 4 years ago

Example of the view config

      "views": [
        {
          "type": "LinearSyntenyView",
          "headerHeight": 44,
          "datasetName": "grape_vs_peach_dataset",
          "width": 1850,
          "height": 400,
          "displayName": "Grape vs Peach",
          "configuration": {
            "type": "LinearSyntenyView"
          },
          "trackSelectorType": "hierarchical",
          "views": [
            {
              "type": "LinearGenomeView",
              "offsetPx": 0,
              "bpPerPx": 1000,
              "displayedRegions": [
                {
                  "refName": "Pp01",
                  "start": 0,
                  "end": 50022430,
                  "assemblyName": "peach"
                }
              ],
              "reversed": false,
              "tracks": [
                {
                  "type": "BasicTrack",
                  "height": 100,
                  "configuration": "peach_genes",
                  "selectedRendering": ""
                }
              ],
              "controlsWidth": 120,
              "width": 800,
              "hideControls": false,
              "hideHeader": true,
              "hideCloseButton": true,
              "trackSelectorType": "hierarchical",
              "minimumBlockWidth": 20
            },
            {
              "type": "LinearGenomeView",
              "offsetPx": 0,
              "bpPerPx": 1000,
              "displayedRegions": [
                {
                  "refName": "chr5",
                  "start": 0,
                  "end": 25000000,
                  "assemblyName": "grape"
                }
              ],
              "reversed": false,
              "tracks": [
                {
                  "type": "BasicTrack",
                  "height": 100,
                  "configuration": "grape_genes",
                  "selectedRendering": ""
                }
              ],
              "controlsWidth": 120,
              "width": 800,
              "hideControls": false,
              "hideHeader": true,
              "hideCloseButton": true,
              "trackSelectorType": "hierarchical",
              "minimumBlockWidth": 20
            }
          ]
        }
      ],
cmdcolin commented 4 years ago

Actions

1) Make assemblies their own entity, outside of dataset
2) Then make datasets just tracks and connections
3) Make dataset have an assemblyId or something of this sort, so that assemblyId does not have to be specified for all tracks
4) Some track types can explicitly say which assemblyIds they are associated with, specifically synteny
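
A minimal sketch of what the four actions above could look like as config shapes. All type and field names here (`AssemblyConfig`, `assemblyIds`, `assembliesForTrack`, and so on) are assumptions made for illustration, not names from the JBrowse 2 codebase.

```typescript
// Hypothetical config shapes; none of these names come from JBrowse 2 itself.
interface AssemblyConfig {
  assemblyId: string
}

interface TrackConfig {
  trackId: string
  type: string
  // Optional override: synteny-style tracks list every assembly they span.
  assemblyIds?: string[]
}

interface DatasetConfig {
  name: string
  // Default assembly for tracks that do not declare their own.
  assemblyId?: string
  tracks: TrackConfig[]
}

// Resolve a track's assemblies: explicit list first, then the dataset default.
function assembliesForTrack(dataset: DatasetConfig, track: TrackConfig): string[] {
  if (track.assemblyIds && track.assemblyIds.length > 0) {
    return track.assemblyIds
  }
  return dataset.assemblyId ? [dataset.assemblyId] : []
}

// True when every assembly the track refers to exists in the top-level list.
function trackIsResolvable(
  known: AssemblyConfig[],
  dataset: DatasetConfig,
  track: TrackConfig,
): boolean {
  return assembliesForTrack(dataset, track).every(id =>
    known.some(a => a.assemblyId === id),
  )
}

// Assemblies live at the top level, outside any dataset.
const assemblies: AssemblyConfig[] = [{ assemblyId: 'grape' }, { assemblyId: 'peach' }]

const grapeGenes: TrackConfig = { trackId: 'grape_genes', type: 'BasicTrack' }
const anchors: TrackConfig = {
  trackId: 'mcscan_anchors',
  type: 'LinearSyntenyTrack',
  assemblyIds: ['grape', 'peach'],
}

const grapeDataset: DatasetConfig = {
  name: 'grape_dataset',
  assemblyId: 'grape',
  tracks: [grapeGenes, anchors],
}
```

Here `grape_genes` inherits `grape` from its dataset, while the synteny track names both assemblies explicitly.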

hg19 hg38

One proposed idea of restructuring the model:

Have top level assemblies
Have top level tracks
Tracks are associated with assemblies
Tracks have metadata that can be used for filtering, metadata can be used for dataset organization

The reason for this restructuring is that the concept of "assembly" and "tracks" under a dataset was challenged by synteny, and also that the term "dataset" is unclear

The idea of a connection is also under dataset, and it may result in connection being added directly to global tracks, or into the "notional global tracks" that combines connection and local configuration as a view

The set of tracks for an assembly is just filtered from the global set of tracks, with queries in the style of "SELECT * FROM tracks WHERE track.assemblyName='hg19' AND track.patientName='MySpecialPatient'"
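
The SQL-style query above could be sketched as a plain filter over a global track array. The `Track` shape, the `metadata` field, and `selectTracks` are hypothetical names used only for this illustration.

```typescript
// Hypothetical track shape: a flat global list with free-form metadata.
interface Track {
  trackId: string
  assemblyName: string
  metadata: Record<string, string>
}

// Return tracks for one assembly whose metadata matches every key in `where`.
function selectTracks(
  tracks: Track[],
  assemblyName: string,
  where: Record<string, string> = {},
): Track[] {
  return tracks.filter(
    t =>
      t.assemblyName === assemblyName &&
      Object.keys(where).every(key => t.metadata[key] === where[key]),
  )
}

const globalTracks: Track[] = [
  { trackId: 'a', assemblyName: 'hg19', metadata: { patientName: 'MySpecialPatient' } },
  { trackId: 'b', assemblyName: 'hg19', metadata: { patientName: 'OtherPatient' } },
  { trackId: 'c', assemblyName: 'hg38', metadata: { patientName: 'MySpecialPatient' } },
]

// Equivalent of: WHERE assemblyName='hg19' AND patientName='MySpecialPatient'
const selected = selectTracks(globalTracks, 'hg19', { patientName: 'MySpecialPatient' })
```

Calling `selectTracks(globalTracks, 'hg19')` with no metadata filter returns every hg19 track, which is the "set of tracks for an assembly" case.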

Connection tracks will either (a) not be persisted in config or (b) manually refreshed by user from remote

User workflow

"I want to open a linear genome view"
Ok, here is a list of assemblies to choose from
"I want to open my BAM file now"
Ok, good, we know the assembly that you chose, proceed with opening a BAM file

Alternate workflow:

Open BAM file first, or have a linear genome view that has no assembly configured, and then selecting a track selects the assembly also. May not be necessary to prioritize this workflow
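
A rough sketch of the alternate workflow, where a view starts with no assembly and takes one from the first track shown. `ViewSketch` and `showTrack` are hypothetical names, and rejecting tracks from a different assembly is an assumption that the discussion does not settle.

```typescript
// Hypothetical minimal track reference.
interface TrackRef {
  trackId: string
  assemblyName: string
}

class ViewSketch {
  assemblyName: string | undefined
  tracks: TrackRef[] = []

  constructor(assemblyName?: string) {
    this.assemblyName = assemblyName
  }

  showTrack(track: TrackRef): void {
    if (this.assemblyName === undefined) {
      // No assembly configured yet: selecting a track selects the assembly too.
      this.assemblyName = track.assemblyName
    } else if (this.assemblyName !== track.assemblyName) {
      // Assumption: a single-assembly view rejects tracks from another assembly.
      throw new Error(
        `track ${track.trackId} belongs to ${track.assemblyName}, view shows ${this.assemblyName}`,
      )
    }
    this.tracks.push(track)
  }
}

// Open the BAM first, with no assembly chosen up front.
const view = new ViewSketch()
view.showTrack({ trackId: 'my_bam', assemblyName: 'hg19' })
```

After the call, `view.assemblyName` is `'hg19'`, so later tracks are checked against it.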

Random offtopic stuff for tracks:

We may need to allow track types that can render on different types of views, so a VCF can display in linear and circular (propose after PAG)

cmdcolin commented 4 years ago

^^ above is a brainstorm from meeting

main conclusions might be

1) tracks are a global array
2) assemblies are a global array

cmdcolin commented 4 years ago

I made an initial attempt at switching up the base config here, https://github.com/GMOD/jbrowse-components/tree/create_new_config_model

It's not all there, but you can see the general approach: move jbrowseModel, rootModel, etc., and then make them factory functions that can receive params from jbrowse-web
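
As a loose illustration of the factory-function idea (not the code in that branch, which is not reproduced here), shared logic could sit in a function that the product calls with its own parameters. `RootModelParams`, `createRootModel`, and every field below are hypothetical, and plain objects stand in for whatever state library the real models use.

```typescript
// Hypothetical parameters a product such as jbrowse-web would supply.
interface RootModelParams {
  productName: string
  menuItems: string[]
}

interface RootModelSketch {
  productName: string
  menuItems: string[]
  sessionNames: string[]
  addSession(name: string): void
}

// Factory: shared behaviour lives here, product-specific values arrive as params.
function createRootModel(params: RootModelParams): RootModelSketch {
  const sessionNames: string[] = []
  return {
    productName: params.productName,
    // Copy so later changes to the caller's array do not leak in.
    menuItems: [...params.menuItems],
    sessionNames,
    addSession(name: string): void {
      sessionNames.push(name)
    },
  }
}

// The product decides what goes in; the shared factory stays generic.
const webRoot = createRootModel({ productName: 'jbrowse-web', menuItems: ['File', 'Help'] })
webRoot.addSession('New session')
```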

cmdcolin commented 4 years ago

Some aspects of the above attempt are maybe interesting but do present challenges. Trying to make everything core results in things like the "main menu" being in core for the root model, for example, the consequences of which don't really pop out until you do a couple of other things. It would be good to have a core model that can be extended, but for now, probably gotta abort the above attempt unless that can be resolved. The other attempt was making createTestSession not invoke jbrowse-web and instead be more modular, which kind of revealed the main menu issues: https://github.com/GMOD/jbrowse-components/tree/refactor_create_test_session

garrettjstevens commented 4 years ago

Open BAM file first, or have a linear genome view that has no assembly configured, and then selecting a track selects the assembly also

I did a very rough draft that implements this behavior, in the track_restructuring branch

garrettjstevens commented 4 years ago

Done in #801