cmdcolin / mafviewer

A JBrowse plugin to view multiple alignment format (MAF) files

Usage questions #18

Closed hexylena closed 5 years ago

hexylena commented 5 years ago

I'm working with small organisms, so I'm not quite confident about how things should be working. Maybe you can provide some advice, @cmdcolin?

I've got an E. coli genome (plus some relatives) at https://usegalaxy.eu/u/helena-rasche/h/maf-test, specifically dataset 24, and I'm trying to get this set up in JBrowse locally.

I've run:

./plugins/MAFViewer/bin/maf2bed.pl < ~/Downloads/Galaxy24-\[LASTZ_on_data_5_and_data_23__mapped_reads\].maf 2>/dev/null | sort -k1,1 -k2,2n | bgzip > out.txt.gz
tabix -p bed out.txt.gz

mv out* data/

My tracklist looks like:

$ cat data/trackList.json | jq
{
  "formatVersion": 1,
  "tracks": [
    {
      "category": "Reference sequence",
      "chunkSize": 20000,
      "key": "Reference sequence",
      "label": "DNA",
      "seqType": "dna",
      "storeClass": "JBrowse/Store/Sequence/StaticChunked",
      "type": "SequenceTrack",
      "urlTemplate": "seq/{refseq_dirpath}/{refseq}-"
    },
    {
      "label": "MAF",
      "urlTemplate": "out.txt.gz",
      "storeClass": "MAFViewer/Store/SeqFeature/MAFTabix",
      "type": "MAFViewer/View/Track/MAF",
      "samples": [
        "CP001855.1",
        "CP001856.1",
        "CP003289.1",
        "CP003290.1",
        "CP003291.1",
        "CU928164.2",
        "U00096.3"
      ]
    }
  ]
}

(Samples were determined by grepping through my MAF file. Is it not possible to determine those at runtime?)
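For reference, the grep step can be sketched in Python. This is only an illustration: `maf_sample_names` is a hypothetical helper, not part of MAFViewer, and the two toy records below use made-up coordinates. It collects the source name (second field) of every MAF `s` line; whether the plugin expects the full name or only the part before a dot is not confirmed here.

```python
import io


def maf_sample_names(handle):
    """Return the sorted, de-duplicated source names found on MAF 's' lines."""
    names = set()
    for line in handle:
        fields = line.split()
        # An 's' line has 7 fields: s <src> <start> <size> <strand> <srcSize> <text>
        if len(fields) == 7 and fields[0] == "s":
            names.add(fields[1])
    return sorted(names)


# Toy input (illustrative coordinates, not taken from the real dataset).
example = io.StringIO(
    "a score=1\n"
    "s U00096.3 0 5 + 100 ACGTA\n"
    "s Ecoli_C 0 5 + 100 ACGTA\n"
)
print(maf_sample_names(example))  # ['Ecoli_C', 'U00096.3']
```

The resulting list could be pasted into the `samples` array of the track configuration.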

And I'm not seeing anything. When I include "Ecoli_C" in the samples list, I see

image

but only for that one genome. Is there something I'm missing here?

cmdcolin commented 5 years ago

The dot in the sample name might be messing it up :)

hexylena commented 5 years ago

image

Looks like you were right. Ugh, OK, I'm going to have to do some aggressive post-processing to ensure compliant naming. Thanks!
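A minimal sketch of that kind of post-processing, assuming that simply dropping dots from the source names is acceptable. The names used later in this thread (for example `CP0240901`) are consistent with that, but the exact renaming used is not shown. `sanitize_maf_line` is a hypothetical helper and the sample line is a toy record.

```python
import re


def sanitize_maf_line(line):
    """Strip dots from the source name of a MAF 's' line; return other lines unchanged."""
    match = re.match(r"^(s\s+)(\S+)(.*)$", line, flags=re.DOTALL)
    if match is None:
        return line
    prefix, name, rest = match.groups()
    return prefix + name.replace(".", "") + rest


# Toy record with made-up coordinates.
print(sanitize_maf_line("s U00096.3 0 5 + 100 ACGTA"))  # s U000963 0 5 + 100 ACGTA
```

Applying the function to every line of the MAF file before running maf2bed.pl keeps the `samples` list and the file in agreement, as long as the same dot-free names are used in both places.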

hexylena commented 5 years ago

Next issue:

image

cmdcolin commented 5 years ago

Can you clarify your setup?

hexylena commented 5 years ago

OK, I was serving this locally. I deployed it to an nginx server in order to share it with you, and it works perfectly there, so I'm guessing it's a byte-range issue again :(

hexylena commented 5 years ago

https://proxy.internal.galaxyproject.eu/stash/jbrowse/?loc=Ecoli_C%3A1995201..2251000&tracks=DNA%2CMAF&highlight= thanks for the help :)

hexylena commented 5 years ago

Argh, so sorry for the noise. It was rendering fine (which initially confused me into closing the issue), but it exhibits the same "mousing over results in weird moving of blocks" behavior.

This occurs in current Firefox and Chrome on Linux for me.

Edit

Edit 2

This seems to be a minimal example:

a score=8422
s CP0240901 1551524 122 + 4592887 CGTAGGCCGGATAAGGCGTTCA---CGCTGCATCCGGCAC-------CCGGAGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTACAAA-------CCGAGCCGTAGGCCGGATAAGGCGTTTACGCCGCATCCGGC
s Ecoli_C     965232 139 - 4576293 CGTAGGCCGGATAAGGCGTTCATTACGCCGCATCCGGCATTTGTGCGCTGATGCCTGATGCGACGCTGACGCGTCTTATCATGCCTACAAATCTGTACCCGAACCGTAGGCCGAATAATGCGTTTACGCCGCATCCGAC

a score=8422
s LT9064741 1550663 122 + 4625968 CGTAGGCCGGATAAGGCGTTCA---CGCTGCATCCGGCAC-------CCGGAGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTACAAA-------CCGAGCCGTAGGCCGGATAAGGCGTTTACGCCGCATCCGGC
s Ecoli_C     965232 139 - 4576293 CGTAGGCCGGATAAGGCGTTCATTACGCCGCATCCGGCATTTGTGCGCTGATGCCTGATGCGACGCTGACGCGTCTTATCATGCCTACAAATCTGTACCCGAACCGTAGGCCGAATAATGCGTTTACGCCGCATCCGAC

a score=8422
s Ecoli_C 1550662 122 + 4576293 CGTAGGCCGGATAAGGCGTTCA---CGCTGCATCCGGCAC-------CCGGAGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTACAAA-------CCGAGCCGTAGGCCGGATAAGGCGTTTACGCCGCATCCGGC
s Ecoli_C  965232 139 - 4576293 CGTAGGCCGGATAAGGCGTTCATTACGCCGCATCCGGCATTTGTGCGCTGATGCCTGATGCGACGCTGACGCGTCTTATCATGCCTACAAATCTGTACCCGAACCGTAGGCCGAATAATGCGTTTACGCCGCATCCGAC

{
  "formatVersion": 1,
  "tracks": [
    {
      "chunkSizeLimit": "2800000",
      "label": "MAF",
      "samples": [
        "CP0205431",
        "CP0240901",
        "Ecoli_C",
        "LT9064741"
      ],
      "storeClass": "MAFViewer/Store/SeqFeature/MAFTabix",
      "type": "MAFViewer/View/Track/MAF",
      "urlTemplate": "mwe.txt.gz"
    }
  ]
}

$ ./plugins/MAFViewer/bin/maf2bed.pl < mwe.maf 2>/dev/null | sort -k1,1 -k2,2n | bgzip > data/mwe.txt.gz
$ tabix -p bed data/mwe.txt.gz

It is also displaying for the incorrect genomes:

image

Edit 4

While trying to minimize the example further, I noticed that the Perl script needs a final print: it returns one fewer region than my MAF file contains.

$ cat mwe.maf
a score=8422
s CP0240901 1551524 122 + 4592887 CGTAGGCCGGATAAGGCGTTCA---CGCTGCATCCGGCAC-------CCGGAGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTACAAA-------CCGAGCCGTAGGCCGGATAAGGCGTTTACGCCGCATCCGGC
s Ecoli_C     965232 139 - 4576293 CGTAGGCCGGATAAGGCGTTCATTACGCCGCATCCGGCATTTGTGCGCTGATGCCTGATGCGACGCTGACGCGTCTTATCATGCCTACAAATCTGTACCCGAACCGTAGGCCGAATAATGCGTTTACGCCGCATCCGAC

a score=8422
s LT9064741 1550663 122 + 4625968 CGTAGGCCGGATAAGGCGTTCA---CGCTGCATCCGGCAC-------CCGGAGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTACAAA-------CCGAGCCGTAGGCCGGATAAGGCGTTTACGCCGCATCCGGC
s Ecoli_C     965232 139 - 4576293 CGTAGGCCGGATAAGGCGTTCATTACGCCGCATCCGGCATTTGTGCGCTGATGCCTGATGCGACGCTGACGCGTCTTATCATGCCTACAAATCTGTACCCGAACCGTAGGCCGAATAATGCGTTTACGCCGCATCCGAC
$ ./plugins/MAFViewer/bin/maf2bed.pl < mwe.maf Ecoli_C
Ecoli_C 965232  965371  Ecoli_C_1       1       CP0240901:1551524:122:+:4592887:CGTAGGCCGGATAAGGCGTTCA---CGCTGCATCCGGCAC-------CCGGAGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTACAAA-------CCGAGCCGTAGGCCGGATAAGGCGTTTACGCCGCATCCGGC,Ecoli_C:965232:139:-:4576293:CGTAGGCCGGATAAGGCGTTCATTACGCCGCATCCGGCATTTGTGCGCTGATGCCTGATGCGACGCTGACGCGTCTTATCATGCCTACAAATCTGTACCCGAACCGTAGGCCGAATAATGCGTTTACGCCGCATCCGAC
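
A missing final print is a common pattern in block parsers that emit a record only when they reach the next block boundary. The Python sketch below illustrates the pattern and its fix; it is not the actual maf2bed.pl logic, which is not shown in this thread, and `read_maf_blocks` is a hypothetical name. The toy records use made-up coordinates.

```python
import io


def read_maf_blocks(handle):
    """Yield each alignment block as a list of its split 's' lines."""
    block = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            # A new 'a' line closes the previous block, if there was one.
            if block:
                yield block
            block = []
        elif fields[0] == "s":
            block.append(fields)
    # Final flush: without this, the last block in the file is silently dropped.
    if block:
        yield block


# Two toy blocks with no trailing blank line after the second one.
example = io.StringIO(
    "a score=1\n"
    "s refA 0 5 + 100 ACGTA\n"
    "s Ecoli_C 0 5 + 100 ACGTA\n"
    "\n"
    "a score=2\n"
    "s refB 0 5 + 100 ACGTA\n"
    "s Ecoli_C 0 5 + 100 ACGTA\n"
)
print(len(list(read_maf_blocks(example))))  # 2
```

Removing the final `if block: yield block` reproduces the symptom above: two blocks go in, one comes out.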

cmdcolin commented 5 years ago

I think this is related to https://github.com/cmdcolin/mafviewer/issues/7. I recommend making sure the chunks are non-overlapping.

I dunno if there's a better way, open to opinions

hexylena commented 5 years ago

So does the mwe.maf above count as overlapping? I experience the error with only that 7-line MAF file converted and indexed. It has only two mapped regions, to two different genomes. Yes, they're in the same place, but with different targets.

cmdcolin commented 5 years ago

If you load the file as a plain CanvasFeatures track, it should show no overlapping elements; that is what this plugin requires. I originally designed it for MAF files from UCSC, which have this property.
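The non-overlap requirement can also be checked without a browser. The sketch below is an assumption-laden illustration: `find_overlaps` is a hypothetical helper that takes `(chrom, start, end)` tuples, such as the first three columns of the BED-like output of maf2bed.pl, and treats the coordinates as half-open. The demo assumes both blocks of the minimal example were emitted with the `Ecoli_C` interval shown earlier in the thread.

```python
def find_overlaps(rows):
    """Return (chrom, earlier, later) for every feature that overlaps an earlier one.

    rows: iterable of (chrom, start, end) with half-open coordinates.
    """
    overlaps = []
    furthest = {}  # chrom -> (start, end) of the feature reaching furthest right so far
    for chrom, start, end in sorted(rows):
        prev = furthest.get(chrom)
        if prev is not None and start < prev[1]:
            overlaps.append((chrom, prev, (start, end)))
        if prev is None or end > prev[1]:
            furthest[chrom] = (start, end)
    return overlaps


# Two blocks mapped to the same reference interval count as overlapping.
rows = [("Ecoli_C", 965232, 965371), ("Ecoli_C", 965232, 965371)]
print(find_overlaps(rows))
```

An empty result means the file has the property described above; features that merely touch, such as `(0, 10)` and `(10, 20)`, are not reported.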

cmdcolin commented 5 years ago

Here is how it looks in the test dataset for C. elegans (?data=plugins/MAFViewer/test/data). You can see that none of the canvas features overlap, yet the track still contains all the info for each species.

image: test dataset at chrI:6994627..6995575, showing the MAF and MAF CF tracks

This is how it looks in your dataset:

image

What it expects is that each "block" contains all the info for all the species, not separate species in different overlapping blocks. I think if you wanted separate species in different overlapping blocks, you could make them entirely different tracks; otherwise this plugin would require a fair amount of re-coding.