dasmoth / dalliance

Interactive web-based genome browser.
http://www.biodalliance.org/
BSD 2-Clause "Simplified" License

Downsampling of reads #213

Open cwuensch opened 7 years ago

cwuensch commented 7 years ago

We are using the Biodalliance genome browser with high-coverage BAM files (up to 10,000 reads per base pair). By default, the limit of reads to be displayed is set to 100 (and there has to be a limit, because otherwise it gets terribly slow). The problem is that the genome browser seems to just take the first 100 reads. In a recent case, not a single read covering the locus in question was displayed; only reads starting to the right of the current position were shown. In other cases, only wild-type reads may be displayed while the mutated ones get clipped. Could you implement some form of statistical downsampling, e.g. selecting the reads to be displayed at random, or just taking every 100th read or something like that?

dasmoth commented 7 years ago

There isn't a built-in option for this, but if you configure your tracks programmatically, you can do this via a plugin.

Something like:

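```javascript
function readDownsampler(featureSets) {
    // featureSets is an array of feature arrays; this uses the first one.
    const reads = featureSets[0];
    const sampledReads = [];
    // Keep every 10th read.
    for (let i = 0; i < reads.length; i += 10) {
        sampledReads.push(reads[i]);
    }
    return sampledReads;
}
```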

...then configure your source with...


```javascript
{
    name: 'Downsample test',
    bamURI: '/path/to/data.bam',
    merge: readDownsampler
}
```
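For the random selection asked for in the original request, a variant of readDownsampler could keep a fixed number of reads chosen uniformly at random. The sketch below uses selection sampling (Knuth's Algorithm S), which keeps the chosen reads in their original order. MAX_DISPLAYED_READS and randomReadDownsampler are hypothetical names, and, like readDownsampler, the sketch assumes the reads arrive as featureSets[0]; it would be plugged in the same way as readDownsampler.

```javascript
// Hypothetical constant: how many reads to keep per fetched region.
const MAX_DISPLAYED_READS = 100;

// Selection sampling (Knuth's Algorithm S): visit each read once and keep it
// with probability (reads still needed) / (reads not yet visited). This yields
// exactly min(MAX_DISPLAYED_READS, n) reads, chosen uniformly at random,
// in their original order.
function randomReadDownsampler(featureSets) {
    const reads = featureSets[0] || [];
    const n = reads.length;
    const k = Math.min(MAX_DISPLAYED_READS, n);
    const sampled = [];
    for (let i = 0; i < n && sampled.length < k; i++) {
        if (Math.random() * (n - i) < k - sampled.length) {
            sampled.push(reads[i]);
        }
    }
    return sampled;
}
```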

A slight downside is that if you return to almost, but not quite, the same region of the genome, you'll end up seeing a different subset of the reads. If this matters, it might be better to sample based on, e.g., the MD5 of the read ID instead.
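A sketch of that deterministic approach: keep a read only if a hash of its ID falls below a threshold, so a given read is always either kept or dropped, whichever region is fetched. This swaps MD5 for the much simpler 32-bit FNV-1a hash; any stable hash should do here. The names fnv1a32, makeHashDownsampler and hashDownsampler are hypothetical, and so is the assumption that each read exposes its ID as read.id, so check which field your read features actually carry.

```javascript
// 32-bit FNV-1a hash of a string (a stand-in for MD5; any stable hash works).
function fnv1a32(str) {
    let hash = 0x811c9dc5; // FNV offset basis
    for (let i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
    }
    return hash >>> 0; // as an unsigned 32-bit integer
}

// Hypothetical factory: returns a merge-style function that keeps roughly
// `fraction` of the reads, decided per read by hashing its ID.
// ASSUMPTION: the read ID is available as read.id.
function makeHashDownsampler(fraction, keyOf = (read) => String(read.id)) {
    return function (featureSets) {
        const reads = featureSets[0] || [];
        return reads.filter((read) => fnv1a32(keyOf(read)) / 4294967296 < fraction);
    };
}

const hashDownsampler = makeHashDownsampler(0.1); // keep about 10% of reads
```

Because the decision depends only on the read ID, two overlapping views show the same reads in their shared part.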

Having said all that, I think it's a great idea to have something along these lines built into the core, so I will leave this issue open for now.

cwuensch commented 7 years ago

Thank you for this great solution! Unfortunately, something goes wrong here. When I copy this code exactly as written, the function readDownsampler never gets called (I inserted some debug log output, which never gets printed). When I write it with parentheses, i.e. merge: readDownsampler(), the function gets called, but featureSets is undefined. What can I do about this?
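The undefined featureSets comes down to the difference between passing a function and calling it. Writing merge: readDownsampler() runs the function once, immediately, while the configuration object is being built, and with no arguments, so featureSets is undefined; merge would then hold whatever that call returned (here the call throws instead). Without parentheses, merge holds the function itself, for the browser to call later with the feature data. A minimal illustration in plain JavaScript, independent of Biodalliance:

```javascript
function readDownsampler(featureSets) {
    const reads = featureSets[0]; // throws TypeError if featureSets is undefined
    const sampledReads = [];
    for (let i = 0; i < reads.length; i += 10) {
        sampledReads.push(reads[i]);
    }
    return sampledReads;
}

// Passing the reference: merge holds the function, to be called later.
const byReference = { name: 'Downsample test', merge: readDownsampler };

// Adding parentheses calls the function right now, with no arguments.
let callError = null;
try {
    const byCall = { name: 'Downsample test', merge: readDownsampler() };
} catch (e) {
    callError = e; // TypeError: featureSets is undefined
}
```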

cwuensch commented 7 years ago

And another question: is it possible to access the user-defined limit variable from within this function? That way the downsampling could be adapted to the number of reads to be shown, as set by the user in the config dialog.

dasmoth commented 7 years ago

Sorry for the confusion -- the example I sent is something that really ought to work, but currently doesn't because of the way two features (combining multiple data sources in one track, and applying arbitrary filters to data) are coupled together.

The following version is actually tested :-)

```javascript
{
    name: 'Downsample test',
    overlay: [{bamURI: '/path/to/data.bam'}],
    merge: readDownsampler
}
```

(The readDownsampler function itself is fine.) I'm going to tweak things so that the example as I originally wrote it does actually work, but it might not happen right away.

cwuensch commented 7 years ago

Great! This solution works indeed for filtering the read data.

But... sorry that I have to ask questions again...

We use this in combination with (a) a BAM index file (.bai), (b) a stylesheet configuration that enables "Highlight mismatches and strands" by default, and (c) a readDownsampler() function that takes the user-configured read limit into account.

So far I have not figured out where to place the stylesheet information so that it works correctly with the overlay option. When we put it below the merge option, the "Highlight mismatches and strands" checkbox gets checked, but the corresponding style does not seem to be applied. Even after manually unchecking and re-checking the checkbox, the style is not applied. Do you have an idea how to solve this?

Additionally, is there a way to read the user-configured read limit so that it can be used in the downsampling function?

dasmoth commented 7 years ago

Re: mismatch colouring...

Thanks for spotting this. It sounds like your config is fine, but some logic that's used to determine whether reference sequence data needs to be threaded through to a given track's renderer was failing when your custom filter was applied. This has been fixed in the git-latest version.

Re: user-configurability of the custom filter.

Do you want to be able to configure this at run time (via a custom field in the track editor)? There's currently no way of doing this, but I'd certainly agree it would be nice!
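Until such a field exists, one workaround is to fix the target number of reads when the track is configured, by wrapping the downsampler in a factory function. In this sketch, makeStrideDownsampler and trackConfig are hypothetical names; instead of a fixed stride of 10, the returned function spreads up to maxReads picks evenly across whatever was fetched. Here merge: makeStrideDownsampler(500) correctly uses parentheses, because the factory returns the function that merge expects. The track configuration follows the tested overlay form from earlier in this thread.

```javascript
// Hypothetical factory: the returned function keeps at most maxReads reads,
// evenly spaced across the fetched reads (featureSets[0]).
function makeStrideDownsampler(maxReads) {
    return function (featureSets) {
        const reads = featureSets[0] || [];
        if (reads.length <= maxReads) {
            return reads.slice();
        }
        const step = reads.length / maxReads; // > 1 here, so indices are distinct
        const sampled = [];
        for (let i = 0; i < maxReads; i++) {
            sampled.push(reads[Math.floor(i * step)]);
        }
        return sampled;
    };
}

// The limit is baked in at configuration time; it is not read from the
// track editor.
const trackConfig = {
    name: 'Downsample test',
    overlay: [{ bamURI: '/path/to/data.bam' }],
    merge: makeStrideDownsampler(500)
};
```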

cwuensch commented 7 years ago

Thanks again!

Re: mismatch colouring... After building the latest version from git, the mismatch colouring works fine. BUT a new issue has appeared with the latest version: the "cursor" indicating the middle position is not positioned correctly after calling the API function SetLocation(). Furthermore, the cursor "jumps" back and forth when the user opens the configuration dialog. That is kind of... weird.

Re: custom filter. Actually, I do not really need to let the user configure the read downsampling limit in the track editor. BUT there already IS a "limit" field in the track editor, which is pre-configured with 100 and which will be non-functional if we cannot read its value from the downsampling function. Furthermore, a limit of 100 is not very suitable for us. If it is a hard limit that the user cannot change, it would be nice if we could at least pre-configure it with a higher value, like 500. Is there a style option for changing this value?

dasmoth commented 7 years ago

I'm concerned about what you say regarding the "cursor" (do you mean the vertical position indicator in the middle of the browser area?). Could you send a screenshot or two to illustrate this (offline to thomas.a.down@gmail.com is fine if you prefer)?

Regarding the "bumping limit", it can be configured as a top-level (not stylesheet) option in a track configuration:

```javascript
{
    name: 'my track',
    bamURI: '...',
    subtierMax: 500
}
```

cwuensch commented 7 years ago

Thanks for the solution to increase the bumping limit!

Regarding the "cursor": Right, I am talking about the position indicator in the middle of the browsing area.

1.) When I change the displayed position via the API SetLocation(), the cursor is displayed at the wrong location (and, I think, with the wrong width).
(Screenshot 1: wrongcursor)

2.) Opening the configuration panel causes the cursor to jump to the left (it seems the panel's width gets subtracted from the browser's width, and the cursor is rendered in the middle of the reduced width).
(Screenshot 2: panelopen)

3.) Closing the configuration panel causes the cursor to finally be displayed at the correct position.
(Screenshot 3: panelclosed)