"Error loading track data" in Jupyter notebook when trying to load local BAM file

enushi commented 3 years ago

I am trying to visualise the read coverage of a genome. I run the following code, which is supposed to do that, but I get "Error loading track data" when the track loads:

In the console I can see the following error:

I know that the file is neither missing nor corrupt, because I can open and use it in the IGV desktop application and it works fine there. Intuition tells me it might be because the BAM file is large (~600 MB), but I cannot confirm that. When I run the BamFIles.ipynb notebook that is provided in the examples folder, it works perfectly. Can you suggest something to make it work? Thank you!

jrobinso commented 3 years ago

I will hazard some guesses, but my Jupyter knowledge has long ago decayed to a tiny kernel, if it were ever more than that. @tmtabor is the local expert; he might have some ideas.

I suspect the path to the file is the problem. igv.js does not actually read local files; as a web client it can't. It reads files by URL from an HTTP server. Of course, Jupyter is running an HTTP server, and there is a way to serve your own files using something (IIRC) called "files" magic. I think that is what you are doing, but something is amiss. One way to get better information is to open the browser's developer tools and examine the network traffic. You can then see two things: (1) the URL that is actually used to access your files, and (2) the response from the server (Jupyter in this case) to those requests.

@tmtabor Any other ideas?

tmtabor commented 3 years ago

The problem here is most likely the URL to the file.

I would first double check that you can paste the URL to the file into your browser and have it prompt you to begin a download of it. If you can't download it, then neither can igv.js. I suspect that what you'll see when you paste the URL into your browser is Jupyter displaying a page with some sort of error message.
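The same check can be scripted. The sketch below is only an illustration: it uses Python's standard urllib to request the first few bytes of a file and report the HTTP status. The helper names `build_range_request` and `probe_url` and the 16-byte range are assumptions made for this example, not part of igv.js or igv-jupyter.

```python
import urllib.error
import urllib.request


def build_range_request(url, first_byte=0, last_byte=15):
    """Build a GET request asking only for bytes first_byte..last_byte."""
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


def probe_url(url, timeout=10):
    """Return (status_code, content_type) for a small ranged GET of url."""
    request = build_range_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions; report them the same way.
        return error.code, error.headers.get("Content-Type")
```

Calling `probe_url` with the URL shown in the browser's network tab should give a 200 or 206 status if the file is reachable. A 206 means the server honoured the Range header. A 404, or an HTML content type for a BAM file, suggests the URL resolves to an error page rather than the data. Connection failures (for example, no server running) are not caught here and surface as `urllib.error.URLError`.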

enushi commented 3 years ago

Thank you for the reply. Reading around the net, I learned more about the issues that reading BAM files can cause, so I put the BAM files on an Apache server and configured the server as described here: https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements. My Jupyter file now looks like the following:

When I put the URLs specified there into the browser, the files start downloading, so I suppose the paths are correct. However, I keep getting the following errors in the console:

I also tried using the igv.js library directly in JavaScript. I still get an empty track, but to my surprise there is no error in the console:

And here is my javascript:

I am really confused about what might be going wrong here. As I said in the beginning, the BAM files work fine when I use them in the IGV desktop application:

jrobinso commented 3 years ago

Not sure, but the fasta files in your javascript config are not going to work unless those are valid URLs; they look like file paths. Can you load the bam into igv desktop using those URLs? (Try File > Load from URL and enter http://localhost/BAM/2G.bam).

Rather than setting up Apache, which can be complex, I think you would have a better chance of success by getting your Jupyter "files" magic paths correct. Did you look in dev tools to see what the actual URLs were resolving to (using Jupyter), as we suggested?
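One way to spot file paths masquerading as URLs is to parse each value before handing it to the browser. This is a minimal sketch using Python's urllib.parse. The function names and the default list of keys (`url`, `indexURL`, `fastaURL`) are assumptions about what the config contains, not a complete list of igv.js options.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value parses as an http(s) URL with a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def non_url_entries(config, keys=("url", "indexURL", "fastaURL")):
    """List the (key, value) pairs in config that are not http(s) URLs."""
    return [
        (key, config[key])
        for key in keys
        if key in config and not is_http_url(config[key])
    ]
```

For example, `is_http_url("http://localhost/BAM/2G.bam")` is True, while a plain path such as `/home/user/genome.fa` (a made-up path for illustration) is rejected.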

enushi commented 3 years ago

I just tried loading the bam files into igv desktop via the URL on the server (e.g. http://localhost/2B.bam) and it works; I can see the reads as before. The fasta file was not on the Apache server, but to make sure that loading from the server works I put it there as well, and it still works as before. The problem seems to be only with the .bam files in Jupyter and in the js code.

I also tried with the "files" magic path, and it seems to be correct: when I go to developer tools and open the network tab, it shows "http://localhost:8888/files/Downloads/igv-jupyter-master/examples/data/2B.bam", which is the real place where I have put the file. However, when I put that URL into the browser I get:

You can find my bam file here: https://drive.google.com/file/d/180sDc_vDnzS8sVt3i-nPut6KSeR-nbzS/view?usp=sharing

jrobinso commented 3 years ago

Sorry I'm out of ideas. I'm sure your bam file is o.k.

enushi commented 3 years ago

I changed the load track section in the Jupyter notebook by setting "indexed" to True (previously I had it set to False). Now I see no errors at all in the console, only some warnings:

b.load_track({
    "name": "2B",
    "url": "http://localhost/BAM/2B.bam",
    "indexURL": "http://localhost/BAM/2B.bam.bai",
    "format": "bam",
    "type": "alignment",
    "indexed": True,
})

Still, the bam track appears empty. I don't know what to do at this point; it seems I have exhausted all my options to make it work :/

jrobinso commented 3 years ago

You can ignore the warnings, those are harmless. I assume you are looking at the same location that alignments are known to exist from IGV desktop viewing. If you can make your bam available, and point me to the correct fasta to use (or make it available as well) I will look into it. I tried clicking the google share link above but it tells me the file does not exist.

enushi commented 3 years ago

Thank you for finding the time to help me, here are the files I am using:

bam file: https://drive.google.com/file/d/1K9zK7PlFB6zVH6gRK9aFnzID9qFwPun4/view
bam index file: https://drive.google.com/file/d/1YhETRWBHm0WeoBKic2OKaDhp8p5zR-UR/view
genome: https://drive.google.com/file/d/1KupeKmYN5CTwPJgyPIrh9IV2JmkeRFvQ/view?usp=sharing
genome index (I am not sure how much it is needed): https://drive.google.com/file/d/1MM-i3CxP7BsFCcF367fBIXDWp-EhuWYd/view?usp=sharing

This is a region with coverage: "NC_009697.1:3,545,589-3,546,235", even though there is coverage almost everywhere.

jrobinso commented 3 years ago

Hi, I haven't tried to use the files yet, but I think I see the problem. There is a sequence name mismatch between the fasta and the alignment file: the sequence name in the BAM is NC_009697, while the sequence name in the fasta is NC_009697.1. IGV desktop does some extra "guessing" when there are mismatches; igv.js uses strict name matching (as do most bioinformatics tools). You can fix this by changing the sequence name in your fasta to match the name in the BAM (remove the ".1" at the end). I think you can just rename it in the fasta index as well. The safer thing to do would be to recreate the index, but just renaming should work (it's just a plain text file). If this does not resolve the issue, let me know.
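A strict name comparison of this kind can be sketched in a few lines of Python. The code below assumes you already have the BAM's text header (for example from `samtools view -H`) and the contents of the fasta index (.fai) as strings. The function names are hypothetical and not part of any IGV tool.

```python
def bam_sequence_names(sam_header_text):
    """Collect SN values from the @SQ lines of a SAM/BAM text header."""
    names = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fai_sequence_names(fai_text):
    """Collect sequence names (first tab-separated column) from a .fai index."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def unmatched_names(bam_names, fasta_names):
    """Return BAM sequence names that have no exact match in the FASTA."""
    known = set(fasta_names)
    return [name for name in bam_names if name not in known]
```

With the names from this thread, `unmatched_names(["NC_009697"], ["NC_009697.1"])` returns `["NC_009697"]`, flagging the mismatch. Once the ".1" is removed from the fasta side, the result is an empty list.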

enushi commented 3 years ago

It worked! Thank you very much! Wohoo ... :D

enushi commented 3 years ago

Can I please also ask how you managed to get the sequence name from the BAM? When I open the BAM file, I just get a sequence of hexadecimal values.

jrobinso commented 3 years ago

You need to use samtools

samtools view -H filename.bam
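The file looks like hexadecimal noise because BAM is a compressed binary format. If samtools is not at hand, the reference names can also be read with Python's standard library, since BAM's BGZF compression is gzip-compatible. This is a rough sketch rather than a replacement for samtools. The function name is made up for this example, and it reads only the reference list that follows the text header.

```python
import gzip
import struct


def read_bam_reference_names(path):
    """Return the reference sequence names stored in a BAM file's header."""
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (text_length,) = struct.unpack("<i", handle.read(4))
        handle.read(text_length)  # skip the plain-text SAM header
        (reference_count,) = struct.unpack("<i", handle.read(4))
        names = []
        for _ in range(reference_count):
            (name_length,) = struct.unpack("<i", handle.read(4))
            raw_name = handle.read(name_length)  # NUL-terminated name
            handle.read(4)  # skip the reference length
            names.append(raw_name.rstrip(b"\x00").decode("ascii"))
        return names
```

Calling `read_bam_reference_names("filename.bam")` returns a list of names that can be compared directly with the names in the fasta.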


g2nb / igv-jupyter #40