PeeperLab / CopywriteR

DNA copy number detection from off-target sequence data
GNU General Public License v3.0

How to reduce the noise/MAD value? #29

Closed breezyzhao closed 3 years ago

breezyzhao commented 4 years ago

Hi, I am running CopywriteR to analyze some tissue-normal whole-exome sequencing samples. Everything looks good except for the MAD value (see the attached plot). Do you know of any good ways to reduce the MAD value? all_chrom.pdf

Thank you very much. Best, Xin

PeeperLab2 commented 4 years ago

Dear Xin,

Thank you for your email and for using CopywriteR.

There are a few things that could reduce the MAD value:

1. Use a larger window (for example, 50 kb instead of the default 20 kb). This will of course reduce the number of data points.
2. Have you tried subtracting the matched normal from the samples, that is, taking the ratio between tumor and normal? This might reduce some of the noise.
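
Both suggestions can be illustrated with a small simulation. This is only a sketch and not CopywriteR code: it assumes that MAD means the unscaled median absolute deviation of per-bin log2 values around their median, it uses made-up read counts with a normal approximation to counting noise, and it models a hypothetical per-bin bias shared by tumor and normal. All function names are illustrative.

```python
import math
import random
import statistics

def mad(values):
    """Unscaled median absolute deviation around the median."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def merge_bins(counts, factor):
    """Sum every `factor` adjacent bins into one larger bin."""
    usable = len(counts) - len(counts) % factor
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]

def log2_vs_median(counts):
    """Per-bin log2 of count / median count (no matched normal)."""
    med = statistics.median(counts)
    return [math.log2(c / med) for c in counts if c > 0]

def log2_vs_normal(tumor, normal):
    """Per-bin log2 of tumor / normal, centered on its median."""
    raw = [math.log2(t / n) for t, n in zip(tumor, normal) if t > 0 and n > 0]
    med = statistics.median(raw)
    return [r - med for r in raw]

def simulate(rng, bias, depth):
    """Noisy counts with expected value depth * bias in each bin."""
    counts = []
    for b in bias:
        mean = depth * b
        counts.append(max(0, round(rng.gauss(mean, math.sqrt(mean)))))
    return counts

rng = random.Random(0)
# Hypothetical copy-neutral data: 10,000 bins of 10 kb, about 15 reads each,
# with a per-bin bias (assumed here) that tumor and normal have in common.
bias = [math.exp(rng.gauss(0, 0.5)) for _ in range(10_000)]
tumor = simulate(rng, bias, 15)
normal = simulate(rng, bias, 15)

results = {}  # bin size in kb -> (number of bins, MAD alone, MAD vs normal)
for factor in (2, 5):  # 2 x 10 kb = 20 kb, 5 x 10 kb = 50 kb
    t = merge_bins(tumor, factor)
    n = merge_bins(normal, factor)
    results[factor * 10] = (len(t), mad(log2_vs_median(t)),
                            mad(log2_vs_normal(t, n)))

for size_kb, (n_bins, alone, vs_normal) in results.items():
    print(f"{size_kb} kb: {n_bins} bins, MAD alone = {alone:.3f}, "
          f"MAD vs normal = {vs_normal:.3f}")
```

In this toy setup the 50 kb bins give a lower MAD than the 20 kb bins but only 2,000 data points instead of 5,000, and the tumor/normal ratio helps because the simulated bias is shared by both samples. Real data may behave differently.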

Having said that, I do think the CNA profile you sent looks good. A good segmentation algorithm will pick up the aberrations in your sample, including small ones. CopywriteR is only the first step in the analysis of copy number data; your downstream analysis will be just as important.

Good luck, and let me know if you have any other questions.

Best, Oscar

rbonneville commented 4 years ago

Hello,

We are also encountering very high MAD values (two examples attached), along with extremely noisy plots, using your recommended workflow (hg19, 20 kb bins, whole-exome data). Thank you!

all_chrom 1.pdf all_chrom 2.pdf

PeeperLab2 commented 4 years ago

Hello,

The high MAD value occurs when too few sequence reads are available per bin. What I always try first is a run with a larger bin size (50 kb or 100 kb). If this looks good and there are enough off-target sequence reads (a table is provided in the log file), I run again with a smaller bin size (20 kb).
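
As a rough guide to how reads per bin relate to noise, the sketch below uses a simple approximation rather than anything taken from CopywriteR: it assumes pure Poisson counting noise, no matched normal, and an unscaled MAD. Under those assumptions the standard deviation of log2(count) is about 1 / (ln 2 · sqrt(N)) for N reads per bin, and the MAD of a normal distribution is about 0.6745 times its standard deviation. (R's `mad()` multiplies by 1.4826 by default, in which case the value would be close to the standard deviation itself.) The function names are illustrative.

```python
import math

NORMAL_MAD_FACTOR = 0.6745  # unscaled MAD / standard deviation, normal data

def expected_mad(reads_per_bin):
    """Approximate MAD of log2(count) for Poisson counts with this mean."""
    sd_log2 = 1.0 / (math.log(2) * math.sqrt(reads_per_bin))
    return NORMAL_MAD_FACTOR * sd_log2

def reads_needed(target_mad):
    """Reads per bin needed to reach a target MAD under the same model."""
    return (NORMAL_MAD_FACTOR / (math.log(2) * target_mad)) ** 2

for reads in (10, 50, 250):
    print(f"{reads:>4} reads/bin -> expected MAD ~ {expected_mad(reads):.2f}")
```

Under this model the noise falls with the square root of the read count, so going from 20 kb to 100 kb bins (five times the reads per bin) lowers the expected MAD by a factor of about 2.2.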

But keep in mind that whether there are enough off-target sequence reads to make the copy number profiles depends strongly on the enrichment kit and the sequencing depth. It works very well for many exome-sequenced samples, but not all. For example, the NimbleGen enrichment kits generally produce so few off-target reads that CopywriteR cannot produce proper copy number profiles from them.
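
To get a feeling for how kit and depth interact, here is a back-of-the-envelope estimate of off-target reads per bin. It is a sketch under stated assumptions: off-target reads are spread uniformly over a genome of roughly 3.1 Gb, any regions the tool filters out are ignored, and the read totals and off-target fractions are made-up numbers, not measurements for any particular kit.

```python
GENOME_SIZE_BP = 3.1e9  # approximate human genome size (assumption)

def off_target_reads_per_bin(total_reads, off_target_fraction, bin_size_bp,
                             genome_size_bp=GENOME_SIZE_BP):
    """Average off-target reads per bin, assuming uniform coverage."""
    off_target_reads = total_reads * off_target_fraction
    n_bins = genome_size_bp / bin_size_bp
    return off_target_reads / n_bins

total_reads = 50_000_000  # hypothetical number of mapped reads
for fraction in (0.30, 0.10, 0.05):  # hypothetical off-target fractions
    for bin_size in (20_000, 100_000):
        reads = off_target_reads_per_bin(total_reads, fraction, bin_size)
        print(f"off-target {fraction:.0%}, {bin_size // 1000} kb bins: "
              f"about {reads:.0f} reads per bin")
```

With these illustrative numbers, a kit that leaves only 5% of reads off-target yields about 16 reads per 20 kb bin, which is why a high on-target rate can make the profiles noisy unless the bins are enlarged or the sequencing depth is increased.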

I hope this will help you. Feel free to send me an email if you have more questions.

Kind regards, Oscar

rbonneville commented 4 years ago

Hello Oscar,

Thank you for your reply. We are noticing substantial noise with very high MAD values even with 100 kb bins. For our sequencing, we are using the IDT xGen Exome Research Panel, supplemented with the xGen CNV Backbone Panel.

all_chrom 3.pdf all_chrom 4.pdf

PeeperLab2 commented 4 years ago

Hello,

I have no experience with this specific enrichment kit and do not know how many off-target reads will be available to CopywriteR. Furthermore, I do not know whether the CNV Backbone Panel will help the CopywriteR strategy or hurt it. If specific regions are enriched to enable CNV detection, those regions will also be filtered out and not used by CopywriteR. In that case, I would follow the IDT guide for using the CNV backbone rather than CopywriteR, which is not made to work with such an enrichment kit.

Could you provide the CopywriteR log file, and maybe run CopywriteR with an even larger bin size (500 kb)? We should be able to get the percentage of off-target reads from this.

Kind regards, Oscar

rbonneville commented 4 years ago

Hello, here is the CopywriteR log file (paths edited only to remove identifiers) and a couple of representative plots at 500 kb. These plots look somewhat better than the previous ones; however, I am unsure what constitutes "good" CNV calls and MAD values with CopywriteR.

CopywriteR.log all_chrom 5.pdf all_chrom 6.pdf

cloudbroken commented 4 years ago

The IDT xGen Exome Research Panel advertises a high on-target rate. That is probably the problem.