etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
540 stars 164 forks source link

Support for vertical heatmap and various plotting refactor #625

Closed tetedange13 closed 3 years ago

tetedange13 commented 3 years ago

Joint work by @tetedange13 and @tskir:

Closes #35. Closes #610.

tskir commented 3 years ago

Thank you Felix for implementing this long wanted feature! I especially like the idea of using a colormap, as opposed to coloring each rectangle separately.

The only concern I have is dropping the desaturate option, which can be quite useful at times. However, I found a way to revive it. In fact, your work has inspired me to do some additional refactor of this module. I'll be submitting a companion PR shortly

tetedange13 commented 3 years ago

Last commit, made from @tskir modifcations should close #610. => If prefered, I can also make a dedicated PR as soon as changes from #632 are merged

Kind regards. Felix.

tetedange13 commented 3 years ago

Hi @tskir ,

I saw you renamed this PR for "no review required" => I do not know if you noticed, but I added a commit on top of your modifications, to implement "sample delimitation"

What do you think, is it better if I close this PR then open another as soon as yours are merged, to add the commits implementing "sample delimitation" ?

Thanks for your answer. Have a nice day. Felix.

tskir commented 3 years ago

@tetedange13 Thank you for pointing this out! I indeed didn't notice that last commit. As all changes from #632 are already included here, I've closed that one. So now this is the only PR left to be reviewed & merged

tetedange13 commented 3 years ago

Hi @etal,

I have not done extensive performance tests and I do not have very huge data (small hybrid-capture panel) => What I can tell you for sure about my part is that, when I first tried to implement "vertical heatmap" keeping up with BrokenBarHCollection(), verticalized plot was quite heavy to plot for Matplotlib (loading all the time, zooming locked...)

Then @tskir's smart implementation of do_desaturate seems more algorithmically efficient (transforming "linspaced values" only, instead of whole dataframe), so I guess it will bring a bit of speed ? https://github.com/etal/cnvkit/blob/cb3e6fd82ff24091c141f1e8ca3647949f12953f/cnvlib/heatmap.py#L162

My pleasure to help ! Have a nice day. Felix.

tetedange13 commented 3 years ago

I noticed cna2df() small function still has a do_desature param which is synced with args.do_desaturate: https://github.com/etal/cnvkit/blob/cb3e6fd82ff24091c141f1e8ca3647949f12953f/cnvlib/heatmap.py#L77-L84 => I guess it was not in @tskir intentions, as "desature" process is kind of done twice now ? => Plus this function can probably be factorized (or deleted), because now it only gets 3 columns from cnarr.data (something like: cna2df = lambda cna: cna.data.loc[:, ['start', 'end', 'log2']])

Do you want me to submit a new separate PR ? => I also wanted to add an optional heatmap --title parameter, so I could do both

Hope this helps. Felix.

etal commented 3 years ago

I see the second do_desaturate, but isn't that just for the legend on the side?

A --title parameter would be nice, to match the scatter command. That could be a new PR if you'd like.

tetedange13 commented 3 years ago

Hum I do not think so, with pcolor() legend colorbar is derived from Matplotlib object returned: https://github.com/etal/cnvkit/blob/a402d901856fa2b72489c42aa5e1b0a975a730f0/cnvlib/heatmap.py#L163-L164

Actually do_desaturate arg is applied in cna2df only to create a new 'color' column https://github.com/etal/cnvkit/blob/a402d901856fa2b72489c42aa5e1b0a975a730f0/cnvlib/heatmap.py#L17 => But as this 'color' column is not used anymore in the code, this transformation step has no impact on final output (but probably a bit on performances) => Maybe we can wait for @tskir's confirmation, but IMO these are useless lines of code

I also tested with above-mentionned cna2df = lambda instead and it works like a charm

OK I will submit a new PR ! Have a nice day. Felix.