Best choice parameters - Githubissues

Hello, I'm using epic 0.2.9 to compare two mapped samples against a mapped reference, and I have some questions. I'm looking for enriched regions of unknown size by comparing two experimental groups (GROUP1: sample1 and sample2; GROUP2: reference). It is essential that no reads from one of the groups map to the enriched sequence. First, I mapped the reads of the three samples (sample1, sample2 and reference) against the same reference genome using bowtie2. Then I used epic to compare sample1_mapping/reference_mapping and sample2_mapping/reference_mapping. I also created a chromosome sizes file.
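To make the "no reads from one of the groups" requirement concrete, here is a minimal Python sketch of the check I have in mind. It is not part of epic; the function names and the toy coordinates are made up for illustration, and intervals are assumed to be half-open `(chromosome, start, end)` tuples.

```python
# Toy data, invented for illustration only: (chromosome, start, end), half-open.
candidate_regions = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 100, 200)]
reference_reads = [("chr1", 180, 230), ("chr2", 300, 350)]


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def regions_without_reads(regions, reads):
    """Keep only the regions that no read from the other group overlaps."""
    return [r for r in regions if not any(overlaps(r, read) for read in reads)]


# Regions called as enriched that contain no reference reads at all.
exclusive = regions_without_reads(candidate_regions, reference_reads)
print(exclusive)
```

This is a quadratic scan, which is fine for a small test sample; for whole-genome data a sorted or indexed interval lookup would be more appropriate.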

When I compared the samples with the default parameters, I got large enriched regions and, as expected, many reads from both groups mapped to each enriched region.

I extracted a small test sample from each sample; specifically, I extracted a known enriched region from the genome. Then I repeated the same steps. I ran epic many times in loops, with high and low FDR cutoffs and different combinations of window size and gaps allowed. When I checked the results, I found that in the sample1/reference comparison the program returns exactly the expected region.
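The loops over window size, gaps allowed and FDR cutoff can be sketched roughly as follows. This is a hedged illustration, not my exact script: the file names and parameter values are hypothetical, and the long flag names (`--treatment`, `--control`, `--chromsizes`, `--window-size`, `--gaps-allowed`, `--false-discovery-rate-cutoff`) are assumptions that should be checked against `epic --help` for version 0.2.9. The sketch only builds the command lines; it does not run epic.

```python
from itertools import product


def build_grid(window_sizes, gaps_allowed, fdr_cutoffs):
    """Return one dict per combination of the three parameters."""
    return [
        {"window_size": w, "gaps_allowed": g, "fdr": f}
        for w, g, f in product(window_sizes, gaps_allowed, fdr_cutoffs)
    ]


def to_command(params, treatment, control, chromsizes):
    """Build an epic argument list; flag names are assumptions (see above)."""
    return [
        "epic",
        "--treatment", treatment,
        "--control", control,
        "--chromsizes", chromsizes,
        "--window-size", str(params["window_size"]),
        "--gaps-allowed", str(params["gaps_allowed"]),
        "--false-discovery-rate-cutoff", str(params["fdr"]),
    ]


# Hypothetical values and file names, chosen only to show the shape of the sweep.
grid = build_grid([50, 200], [0, 3], [0.05, 0.000001])
commands = [
    to_command(p, "sample2_test.bed", "reference_test.bed", "chromsizes.txt")
    for p in grid
]

for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run`, with the output of every run written to its own file so that the results of different combinations can be compared.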

But in the sample2/reference comparison it returns:

- One very large region with a high FDR and, as expected, reads from both samples mapped to it.
- Many very short regions with a very low FDR and no reads from one of the samples mapped to them, but with many small gaps between them.
- No enriched regions.

When I plot the coverage for these comparisons, I can see a clear enriched region in both. Here are the plots: sample1_mapping-against-reference_mapping sample2_mapping-against-reference_mapping

The most notable difference between the sample1 and sample2 mappings is the coverage depth, as you can see in the plots. I could test combinations of parameters to tune the program for the test files, but I don't think that would work for the real samples, because I'm looking for sequences of unknown size, from 3 bp to the largest possible.

FIRST QUESTION: Which combination of parameters do you recommend for this kind of experiment?

SECOND QUESTION: To what degree does the difference in coverage depth between the mappings affect the results?

THIRD QUESTION: To what degree does defining a sample as control (-c) or as treatment (-t) affect the results?

I look forward to your answer. Thank you very much, Jose.

biocore-ntnu / epic

Best choice parameters #78