Closed micknudsen closed 3 years ago
Hi @micknudsen,
Not an author of CNVkit, but you are probably right => I think expected behaviour was (ignoring small plotting corrections):
if -0.5 < y.min():
y_min = -0.5 # No matter the value of "y.min()"
else: # In other words: "y.min() < -0.5"
y_min = y.min()
Just to precise your issue, minimal command to reproduce is: cnvkit.py scatter -c chr4 a_sample.cnr
=> In other words this is happening as soon as scatter
is asked to focus on a specific chromosome / region (no need to have segment, nor b-allele)
=> As you suggested, seems to be fixed by replacing max()
by min()
at line you highlighted
Thanks a lot for reporting this !
Hope this helps. Have a nice day. Felix.
This old issue of mine might be related: #385.
However, I'm not sure this behaviour is unexpected. I think (and @etal can confirm/deny) the rationale was to prevent very negative scores (such as the ones from biallelic deletions) from compressing the Y axis. Indeed, as you can see on your second graph, variations in most of the signals become hard to see now that the range on the Y axis is so wide.
However, I agree that the data shouldn't just disappear like it happens now. The best solution I can think of is to keep the current default threshold (–5), but to clip the data to that threshold rather than hiding it. @etal @tetedange13 @micknudsen what do you think of this proposal?
Maybe one could put a hard lower bound on the y-axis – as it is now – and then indicate segments falling below this with a line (or some other indicator) below the axis? Something along the line (pun intended) in which IGV indicates downsampling of reads by showing a thick, black line above the alignment. This could also be applied to very high values.
@tskir ,
The problem I see with clipping, is cases like @micknudsen has: important number of data located relatively at same coordinates => If clipped, they will turn into a big strange cluster of points at -5
Idea of a line looks cool and not that hard to implement
=> Could be plotted as soon as any data < -5
(absent otherwise) and disabled when user passes --y-min
and/or --y-max
=> Maybe we could aslo add a tiny vertical line on top of horizontal one, to discretely mark coordinates of masked data ?
No matter the choice made, I think we should also add more logging about this => Because for now data simply disappear from the plot, as you said and no warning is raised => In most case it is OK, because data with log_ratio<-5 are often "antitarget" one, with little importance, but can be risky
Added following logging.warning
:
WARNING: Setting default 'y_min=
' so probes are hidden --> Use parameter '--y-min <y.min()>' to see them
1st try:
2nd try:
Lemme look at this a little more.
The intended behavior was:
Aside from the segments, there are almost always lots of spurious single-bin datapoints (probes) and I think it's OK to ignore these in setting the plot limits and focus our attention on the segments.
I agree that showing clipping on the y-axis is much better than failing to plot those segments at all. I'm not sure the rug plot is exactly the right solution. I'd prefer to keep the color the same as the segments, only represent the segments (ignore individual probes, unless the input was only .cnr) and cling to the bottom spine rather than add another h-line. Maybe a rug plot with orange points facing upward, on a black line identical to the plot's bottom border? Or just spans of orange on that line, or a small triangle on the bottom to indicate each clipped segment?
Hi @etal,
Thanks for your feedback ! This makes sense to rather focus on masked segments => Is this the kind of representation you have in mind for masked segments? (small orange triangle at bottom-right):
Corresponding code:
if segments and hidden_seg.sum():
x_hidden = segments.start[hidden_seg] * MB
y_hidden = np.array([y_min] * len(x_hidden)) + 0.2
axis.scatter(x_hidden, y_hidden, marker='^', linewidth=3, snap=False,
color=segment_color)
Have a nice day. Felix.
Yes, that mockup is very close. To ensure it's not mistaken for an ordinary segment, I'd like to avoid rounded corners (so it looks sharp and somewhat alarming), and ideally, overlap the plot's lower spine (higher z-order or something like that, so it obscures the plot boundary itself and is clearly "breaking through" -- if that's possible). To avoid rounding, maybe switch linewidth=3
to linewidth=1
or =0
in the call to axis.scatter
?
Hi @etal,
edgecolor='none'
clip_on=False, zorder=10
(no y-offset needed)Result bellow is still with linewidth=3
(tell me if triangles size LGTY ?)
=> I used "amplicon.{cnr,cns}" from CNVkit test files
Have a nice day. Felix.
Hey Felix, this looks great! If you have a PR for this feature I'd be very happy to merge it.
Hi @etal ,
I updated #645
Have a nice day. Felix.
Great, I've reviewed and merged #645. This feature is now part of CNVkit.
I wonder if this should be
min
instead ofmax
?https://github.com/etal/cnvkit/blob/d4145c5e9b97817c2d625f07b44212c4176b5a1f/cnvlib/scatter.py#L370
With the current default settings, some biallelic deletion are not shown. Here is an example (CNVkit v0.9.9) without
--y-min
specified:and with
--y-min=-10
: