Different behaviour scatter shell shorthand

lvree commented 1 month ago

I'm very new to CNV analysis and ran across something I don't understand. As stated in the manual for the scatter command, there are two options:

I assumed they would produce the same output, as one is only a shorthand for the other.

But the first command produces this plot:

And the second produces this:

What explains this difference in behaviour, or am I misunderstanding something? And can someone explain what the difference in the plots means? For example, why does one show lines while the other shows a lot of points? Also, why does the first have its lowest point around -2.5 and the second below -3? I haven't done any manual scaling of the y-axis yet, so this is just the standard output.

Another thing: when I zoomed in on one chromosome, the lowest point visible is at -4:

But in the standard plot of the whole genome that point isn't shown. When I lower the minimum of the y-axis, it is indeed there:

Why does it cut the graph off when there are still points there? Is it because there are no orange points there?

Thank you very much!

lvree commented 1 month ago

Aah, I'm sorry, I see I accidentally ran the call.cns file instead of the .cns file. But the output I got when using call.cns was the same as when I ran this part on my dataset:

    cnvkit.py batch Tumor.bam --normal Normal.bam \
        --targets my_baits.bed --annotate refFlat.txt \
        --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
        --output-reference my_reference.cnn --output-dir results/ \
        --diagram --scatter

Further, it says this:

So why is the batch command then using call.cns instead of .cns, as is stated?

etal commented 1 week ago

The file .call.cns is based on the first .cns, with some additional processing:

Absolute copy number is inferred, and certain segments are marked as having neutral copy number. These segments are plotted in gray instead of orange, so that the non-neutral segments are more distinct.
The confidence interval of each segment's mean is checked for overlap with neutral copy number (log2 ratio 0.0); if it overlaps, the segment is merged with any neighboring segments that also have neutral copy number. (Something like that.) So the .call.cns may also have fewer segment breakpoints, and the remaining segments may have slightly different mean log2 ratio values.
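
To make the merging step described above more concrete, here is a rough Python sketch. It is not CNVkit's actual code: the `Segment` class, the `is_neutral` and `merge_neutral` helpers, the example values, and the choice to weight the merged mean by segment width are all assumptions made for illustration. It only shows why a .call.cns-style file could end up with fewer breakpoints and slightly shifted log2 means.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical record; real .cns files carry more columns.
    chrom: str
    start: int
    end: int
    log2: float   # segment mean log2 ratio
    ci_lo: float  # lower bound of the confidence interval of the mean
    ci_hi: float  # upper bound of the confidence interval of the mean


def is_neutral(seg, neutral=0.0):
    """A segment counts as neutral if its confidence interval overlaps 0.0."""
    return seg.ci_lo <= neutral <= seg.ci_hi


def merge_neutral(segments):
    """Merge runs of adjacent neutral segments on the same chromosome."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.chrom == seg.chrom
                and is_neutral(prev) and is_neutral(seg)):
            # Assumption: width-weighted mean of the two log2 values.
            w_prev = prev.end - prev.start
            w_seg = seg.end - seg.start
            log2 = (prev.log2 * w_prev + seg.log2 * w_seg) / (w_prev + w_seg)
            merged[-1] = Segment(prev.chrom, prev.start, seg.end, log2,
                                 min(prev.ci_lo, seg.ci_lo),
                                 max(prev.ci_hi, seg.ci_hi))
        else:
            merged.append(seg)
    return merged


# Made-up example: two neutral segments, a loss, then another neutral segment.
segments = [
    Segment("chr1", 0, 100, 0.05, -0.10, 0.20),
    Segment("chr1", 100, 300, -0.04, -0.15, 0.07),
    Segment("chr1", 300, 400, -1.10, -1.30, -0.90),
    Segment("chr1", 400, 500, 0.02, -0.08, 0.12),
]
merged = merge_neutral(segments)
for seg in merged:
    label = "neutral (gray)" if is_neutral(seg) else "non-neutral (orange)"
    print(seg.chrom, seg.start, seg.end, round(seg.log2, 3), label)
```

In this toy input the first two segments collapse into one, so one breakpoint disappears and the merged segment gets a mean that differs from either original value, while the loss in the middle is left untouched.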

etal / cnvkit

Different behaviour scatter shell shorthand #923