GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

Question: Reference IDs vs Names #1535

Closed srobb1 closed 3 years ago

srobb1 commented 3 years ago

Hello,

Is there a way to display the reference sequence name instead of the ID?

Example:

NC_046069.1 . chromosome 1 34253345 . . . ID=NC_046069.1;Name=chr1;

I would like the 'Name' to be searchable and displayed instead of the 'ID' NC_046069.1.

Currently I loaded the FASTA with this header:

NC_046069.1 Petromyzon marinus isolate kPetMar1 chromosome 1, kPetMar1.pri, whole genome shotgun sequence

If I rerun prepare-refseqs.pl with a GFF instead, will the names be used? If not is there some configuration for this?

I have a chado db with the ID stored as the 'uniquename'. I would ideally like to be able to travel between the two using the ID but displaying the Name.

I read over the documentation for prepare-refseqs.pl but am still a bit confused.

Thank you, Sofia

cmdcolin commented 3 years ago

Both NC_046069.1 and chr1 should be searchable if generate-names.pl has been run

To make it display chr1 instead of NC_046069.1, I am reminded of this little hack that specifically modifies the drop down chromosome selector http://gmod.827538.n3.nabble.com/Gmod-ajax-customization-of-the-pulldown-menu-td4052639.html

This change is not included in JBrowse at the moment. It would require a custom build with that patch, and then using the custom refSeqNameTransformer callback to do the translation.

This would in the end only change the dropdown menu but it might be sufficient for your purposes?

srobb1 commented 3 years ago

I will try these out and find out if my users are happy.

srobb1 commented 3 years ago

I just tried this: bin/generate-names.pl -i --out data/kPetMar1 --tracks DNA, and I am not able to search by 'chr1'. Before indexing the names, I reran prepare-refseqs.pl with a GFF containing the ID and Name.

I made sure to clear my cache a few times.

cmdcolin commented 3 years ago

You may need to load the GFF as a track using flatfile-to-json.pl, then run generate-names.pl so that those chromosome names get indexed, and then remove the track if it is unwanted.

srobb1 commented 3 years ago

That worked!

Now let me take a look at that link you posted about the dropdown

srobb1 commented 3 years ago

Question: in this code, does refseq.name return the Name from the GFF? No, it is the ID, right? That is what I see in my selector. How do I change refseq.name to my actual ref seq 'Name='?

refSeqNameTransformer = function(refseq) { return refseq.name+" ("+refseq.length/1000000+"Mb)"; }

cmdcolin commented 3 years ago

It would require some hardcoding or creativity to make this happen...this refSeqNameTransformer does not have access to the GFF names but my thoughts were you could make a mapping manually

In the worst case it would look like this

refSeqNameTransformer = function(refseq) { 
  if(refseq.name==='NC_046069.1') return 'chr1'; /* refseq.name holds the loaded ID; return the display name */
  else if(...) ...
  } /* make sure the closing brace has a space at the start in the conf format */

Another alternative, still basically hardcoded, is to put the mapping in a JSON object and paste it inside your index.html:

<script>
refNameMap = {
    "NC_046069.1": "chr1",
    "...": "chr2",
    ...
}
</script>

And then make the callback

refSeqNameTransformer = function(refseq) { return refNameMap[refseq.name] }
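
A minimal sketch of the same callback with a fallback, assuming the map is keyed by the reference ID as loaded (which is what refseq.name holds in this thread). makeRefSeqNameTransformer is a hypothetical helper, not part of JBrowse, and only the chr1 entry comes from this discussion. Sequences missing from the map keep their original name instead of showing up as undefined in the dropdown.

```javascript
// Hypothetical helper (not a JBrowse API): builds a transformer from an
// ID -> display-name map and falls back to the original name when unmapped.
function makeRefSeqNameTransformer(nameMap) {
  return function (refseq) {
    var known = Object.prototype.hasOwnProperty.call(nameMap, refseq.name);
    return known ? nameMap[refseq.name] : refseq.name;
  };
}

// Illustrative map, keyed by the loaded ID; only this entry is from the thread.
var refNameMap = { "NC_046069.1": "chr1" };
var refSeqNameTransformer = makeRefSeqNameTransformer(refNameMap);
```

As noted above, this would only affect what the dropdown displays, not what the browser accepts as a reference name.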

srobb1 commented 3 years ago

The location box doesn't work. If I try chr1:x..y, I get nothing.

Uckk. Should I just rename my FASTA and reload? Not a big deal for just JBrowse, but that means renaming everything in my gene GFF and reloading into Chado. That is what I really wanted to avoid. I kinda think that is the only way to get what I need. Poop.

cmdcolin commented 3 years ago

dang, I did not think about that issue. I don't think that one is surmountable.

JBrowse 2 does have support for this....it allows multiple refname types to be entered in the location box, etc....still on its way to being production ready but we did start cutting "beta" tagged releases for it
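
To illustrate the general idea of accepting more than one refname in a location box, here is a rough sketch. It is not JBrowse 2 code and says nothing about how JBrowse 2 implements this. resolveLocation and aliasToId are hypothetical names, and the only alias pair taken from this thread is chr1 / NC_046069.1.

```javascript
// Illustration only, not JBrowse code: rewrite a location string typed with
// an alias so that it uses the canonical reference ID.
function resolveLocation(input, aliasToId) {
  // Split "ref:start..end" at the last colon; a bare "ref" has no range part.
  var colon = input.lastIndexOf(":");
  var ref = colon === -1 ? input : input.slice(0, colon);
  var rest = colon === -1 ? "" : input.slice(colon);
  var known = Object.prototype.hasOwnProperty.call(aliasToId, ref);
  return (known ? aliasToId[ref] : ref) + rest;
}

// Hypothetical alias table: display name -> ID used when the data was loaded.
var aliasToId = { "chr1": "NC_046069.1" };
var resolved = resolveLocation("chr1:1000..2000", aliasToId);
// resolved is "NC_046069.1:1000..2000"
```

Names that are not in the alias table pass through unchanged, so typing the accession directly still works.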

srobb1 commented 3 years ago

I was supposed to test it out, but the world changed, and that fell through for me, sorry. I am super excited about what I have seen in your demos for JB2, and even more excited knowing that this ID/Name issue might be addressed.

cmdcolin commented 3 years ago

No worries, certainly jbrowse 1 works right now :+1:

For jbrowse 2, we are starting to build up documentation for it here...http://jbrowse.org/jb2/

Hopefully more movement on official release later this year

srobb1 commented 3 years ago

Just an update. I reloaded JBrowse with renamed ref fasta and gff for my tracks. These renamed files use chr\d+ as my refseq names.

In Chado via Tripal, I loaded a GFF of only the genome chromosomes with the following format:

NC_046069.1 .   chromosome  1   34253345    .   .   .   ID=NC_046069.1;Name=chr1;

Tripal/chado can handle having both the accession and name.

So no major reloads of Chado: I could keep my GFF of features with the accession as the ref, and still show the alignment on my gene pages using the name.
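
For anyone who needs the accession-to-name pairs outside of Chado (for example to build a lookup table by script rather than by hand), here is a minimal sketch that reads them from a chromosome-only GFF like the one above. refNameMapFromGff is a hypothetical helper. It splits columns on whitespace because the example lines in this thread are space-separated, whereas real GFF3 is tab-delimited and may URL-escape attribute values, which this sketch does not decode.

```javascript
// Sketch: collect ID -> Name pairs from the attributes column of GFF3 text.
// Assumes the first eight columns contain no whitespace of their own.
function refNameMapFromGff(gffText) {
  var map = {};
  gffText.split("\n").forEach(function (raw) {
    var line = raw.trim();
    if (line === "" || line.charAt(0) === "#") return; // skip blanks and comments
    var cols = line.split(/\s+/);
    if (cols.length < 9) return; // not a complete feature line
    // Rejoin anything after column 8 in case attribute values contain spaces.
    var attrText = cols.slice(8).join(" ");
    var attrs = {};
    attrText.split(";").forEach(function (pair) {
      var eq = pair.indexOf("=");
      if (eq > 0) attrs[pair.slice(0, eq)] = pair.slice(eq + 1);
    });
    if (attrs.ID && attrs.Name) map[attrs.ID] = attrs.Name;
  });
  return map;
}

var exampleGff =
  "##gff-version 3\n" +
  "NC_046069.1 .   chromosome  1   34253345    .   .   .   ID=NC_046069.1;Name=chr1;\n";
var idToName = refNameMapFromGff(exampleGff);
```

Here idToName maps each accession to its display name, so inverting it gives the name-to-accession direction if that is what is needed.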