CollasLab / edd

Enriched Domain Detector for ChIP-seq data
https://pypi.python.org/pypi/edd
MIT License

question about "bin-size" #5

Closed lixin4306ren closed 8 years ago

lixin4306ren commented 8 years ago

hello

I tried to run EDD on my data and got the following error:

  File "/home/jhmi/xinli/.local/lib/python2.7/site-packages/eddlib/estimate.py", line 30, in bin_size
    assert bin_size < 100, "Could not find a suitable bin size."
AssertionError: Could not find a suitable bin size.

Then I tried to specify the bin size myself. With 5kb I got this error: "AssertionError: The selected bin size results in less informative bins that what specified by theparameter required_fraction_of_informative_bins. Please try a bigger bin size or let EDD auto-estimate a bin size for you."

I then tried 50kb and got the same error. What should I do? Thanks.

Xin

eivindgl commented 8 years ago

Hi Xin,

This error message is not very user friendly, my apologies. As I am sure you know, large parts of the genome consist of repeats, and short reads cannot be aligned unambiguously to these regions. Bins inside such regions are deemed non-informative because there are no reads (no information) within them. EDD requires you to explicitly blacklist these regions. This is done with the unalignable regions argument, which must be supplied when running EDD. For hg19, this file can be used: https://github.com/CollasLab/edd/blob/master/data/hg19_unalignable_regions.
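To make the idea of informative bins concrete, here is a minimal sketch of how the fraction of informative bins changes once unalignable regions are excluded. It is an illustration only and not EDD's code: the function name `informative_fraction` is hypothetical, and it assumes a bin counts as informative when it holds at least one read in either the IP or the input sample, which may differ from the criterion EDD actually applies.

```python
def informative_fraction(ip_counts, input_counts, blacklisted):
    """Fraction of non-blacklisted bins that contain at least one read.

    Assumption (not taken from EDD): a bin is informative when the IP and
    input read counts together are greater than zero.
    """
    kept = [
        (ip, inp)
        for ip, inp, bad in zip(ip_counts, input_counts, blacklisted)
        if not bad
    ]
    if not kept:
        raise ValueError("all bins are blacklisted")
    informative = sum(1 for ip, inp in kept if ip + inp > 0)
    return informative / float(len(kept))


# Six toy bins; bins 1 and 3 hold no reads at all.
ip_counts = [5, 0, 3, 0, 7, 2]
input_counts = [4, 0, 2, 0, 6, 1]

# Without a blacklist, both empty bins count against the fraction.
no_blacklist = [False] * 6
# With bin 1 blacklisted as unalignable, only bin 3 counts against it.
with_blacklist = [False, True, False, False, False, False]

print(informative_fraction(ip_counts, input_counts, no_blacklist))
print(informative_fraction(ip_counts, input_counts, with_blacklist))
```

Blacklisting removes empty bins from the denominator, so the fraction rises. That is why a missing or incomplete unalignable regions file can make the required fraction impossible to reach at any bin size.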

Which genome and version are you using?

lixin4306ren commented 8 years ago

Thank you for your quick reply.

I'm working on the human genome, using hg19. I used a file from the "data" folder of your edd repository (edd/data/hg19_unalignable_regions.bed) as the unalignable regions file (is that OK?). I checked log.txt and found that the cause of the problem is that the fraction of informative bins is always below the default of 0.99, even with a 50kb bin size. I then edited the config file and changed the value from 0.99 to 0.98, and now it's running. But I don't know whether there is any disadvantage to doing this. Do you think 0.99 is too strict a cutoff for a large genome?

Xin

Xin Li Postdoc Fellow Center for Epigenetics, Johns Hopkins University School of Medicine Rangos 580, 855 N. Wolfe St., Baltimore, MD 21205

eivindgl commented 8 years ago

Hi,

The 0.99 requirement is pretty strict, and 0.98 should not be a problem. The reason it is a requirement at all is that EDD shuffles all the bins in the Monte Carlo trials. If you have a lot of non-informative bins outside of blacklisted regions, then you'll end up with more false positives in your result set. Ideally, only the informative bins should be shuffled, and then the user wouldn't have to supply a list of blacklisted regions. I haven't had time to implement this feature yet.
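As a rough illustration of what shuffling bins in Monte Carlo trials means, the sketch below runs a generic permutation test on per-bin scores. It is not EDD's algorithm: the helper names, the use of Kadane's maximum-segment sum as the test statistic, and the toy scores are all assumptions made for this example.

```python
import random


def max_segment_score(scores):
    """Best sum over any contiguous run of bins (Kadane's algorithm)."""
    best = current = scores[0]
    for s in scores[1:]:
        current = max(s, current + s)
        best = max(best, current)
    return best


def empirical_p_value(scores, n_trials=1000, seed=0):
    """Share of shuffled bin orders whose best segment matches or beats the observed one."""
    rng = random.Random(seed)
    observed = max_segment_score(scores)
    shuffled = list(scores)
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(shuffled)  # every bin is shuffled, informative or not
        if max_segment_score(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / float(n_trials + 1)


# Toy genome: one enriched domain of 10 bins inside a depleted background.
scores = [-1] * 20 + [2] * 10 + [-1] * 20
print(empirical_p_value(scores, n_trials=200))
```

Because every bin takes part in the shuffle, bins that carry no information are mixed into the null distribution along with the rest. That is the situation the required fraction of informative bins is meant to limit.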

Did it work with 5kb bin size as well? 50kb is probably a little large...

all the best, Eivind

lixin4306ren commented 8 years ago

Hi, Eivind

Although EDD now starts successfully, I have another problem with the results. I tried several datasets, and EDD always ended without generating the output peak calling file (only log.txt was generated). All my jobs aborted after the step "[2015-09-11 05:06:38.806143] NOTICE: edd: Running 10000 monte carlo trials". I have pasted one example log file below. I have no idea why this occurred. I think I allocated enough resources for it. Thank you for your help.

[2015-09-11 05:02:53.854973] NOTICE: edd: 2015-09-11 01:02:53.854871
[2015-09-11 05:02:53.855925] NOTICE: edd: cwd: /amber3/feinbergLab/personal/xinli/Oliver/Chip-Seq_new_dataset
[2015-09-11 05:02:53.856053] NOTICE: edd: string args: /home/jhmi/xinli/.local/bin/edd --config-file /home/jhmi/xinli/soft/edd/eddlib/default_parameters.conf /home/jhmi/xinli/soft/edd/data/hg19.chromsizes /home/jhmi/xinli/soft/edd/data/hg19_unalignable_regions.bed Sample_13B_K36_2_batch3.sort.rmdup.bam Sample_13B_Input_2_batch3.sort.rmdup.bam /home/jhmi/xinli/dcl01/Oliver/Chip-Seq/Sample_13B_K36_2_batch3
[2015-09-11 05:02:53.856162] NOTICE: edd: chromosome size file: /home/jhmi/xinli/soft/edd/data/hg19.chromsizes
[2015-09-11 05:02:53.856254] NOTICE: edd: IP file: Sample_13B_K36_2_batch3.sort.rmdup.bam
[2015-09-11 05:02:53.856346] NOTICE: edd: Input file: Sample_13B_Input_2_batch3.sort.rmdup.bam
[2015-09-11 05:02:53.856436] NOTICE: edd: output dir: /home/jhmi/xinli/dcl01/Oliver/Chip-Seq/Sample_13B_K36_2_batch3
[2015-09-11 05:02:53.856531] NOTICE: edd: number of monte carlo trials: 10000
[2015-09-11 05:02:53.856621] NOTICE: edd: number of processes: 4
[2015-09-11 05:02:53.856723] NOTICE: edd: fdr lim: 0.050
[2015-09-11 05:02:53.856812] NOTICE: edd: gap penalty is unspecified, will be auto estimated
[2015-09-11 05:02:53.856902] NOTICE: edd: bin size is unspecified, will be auto estimated
[2015-09-11 05:02:53.856989] NOTICE: edd: unalignable regions file : /home/jhmi/xinli/soft/edd/data/hg19_unalignable_regions.bed
[2015-09-11 05:02:53.857473] NOTICE: edd: Writing log ratios: False
[2015-09-11 05:02:53.857586] NOTICE: edd: EDD configuration file parameters:
[2015-09-11 05:02:53.857678] NOTICE: edd: ci_method:agresti_coull
[2015-09-11 05:02:53.857776] NOTICE: edd: fraq_ibins:0.98
[2015-09-11 05:02:53.857866] NOTICE: edd: log_ratio_bin_size:10000
[2015-09-11 05:02:53.857955] NOTICE: edd: ci_lim:0.25
[2015-09-11 05:02:53.858890] NOTICE: eddlib.experiment: loading bam files
[2015-09-11 05:04:19.786412] NOTICE: eddlib.experiment: done
[2015-09-11 05:04:20.284976] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:22.113150] NOTICE: eddlib.estimate: testing bin size 1, nib ratio: 0.8996, spearmanr: 0.425
[2015-09-11 05:04:22.475358] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:23.635885] NOTICE: eddlib.estimate: testing bin size 2, nib ratio: 0.6159, spearmanr: 0.784
[2015-09-11 05:04:23.920364] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:24.917783] NOTICE: eddlib.estimate: testing bin size 3, nib ratio: 0.3240, spearmanr: 0.700
[2015-09-11 05:04:25.172317] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:25.987179] NOTICE: eddlib.estimate: testing bin size 4, nib ratio: 0.1781, spearmanr: 0.651
[2015-09-11 05:04:26.224768] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:26.881711] NOTICE: eddlib.estimate: testing bin size 5, nib ratio: 0.1147, spearmanr: 0.635
[2015-09-11 05:04:27.107415] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:27.653201] NOTICE: eddlib.estimate: testing bin size 6, nib ratio: 0.0848, spearmanr: 0.635
[2015-09-11 05:04:27.867727] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:28.315705] NOTICE: eddlib.estimate: testing bin size 7, nib ratio: 0.0656, spearmanr: 0.638
[2015-09-11 05:04:28.525214] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:28.932162] NOTICE: eddlib.estimate: testing bin size 8, nib ratio: 0.0501, spearmanr: 0.640
[2015-09-11 05:04:29.136950] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:29.490023] NOTICE: eddlib.estimate: testing bin size 9, nib ratio: 0.0383, spearmanr: 0.643
[2015-09-11 05:04:29.724374] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:30.049603] NOTICE: eddlib.estimate: testing bin size 10, nib ratio: 0.0301, spearmanr: 0.645
[2015-09-11 05:04:30.248191] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:30.545102] NOTICE: eddlib.estimate: testing bin size 11, nib ratio: 0.0246, spearmanr: 0.647
[2015-09-11 05:04:30.741359] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:31.014689] NOTICE: eddlib.estimate: testing bin size 12, nib ratio: 0.0213, spearmanr: 0.648
[2015-09-11 05:04:31.208867] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:31.461279] NOTICE: eddlib.estimate: testing bin size 13, nib ratio: 0.0192, spearmanr: 0.651
[2015-09-11 05:04:31.461714] NOTICE: eddlib.experiment: Optimal bin size: 13000
[2015-09-11 05:04:31.655788] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:31.815054] NOTICE: eddlib.experiment: Manually specified bin size of 13KB gives 98.08% informative bins. The required amount is 98.00%.
[2015-09-11 05:04:31.816198] NOTICE: eddlib.experiment: Estimating gap penalty
[2015-09-11 05:04:59.655100] NOTICE: eddlib.algorithm.unalignable_regions: Unalignable regions file read. Got 396 regions. Total coverage: 239.85MB
[2015-09-11 05:05:07.541606] NOTICE: eddlib.estimate: Gap penalty of 17.64 gives a score of 0.263 (950 potential peaks with 220.86MB coverage)
[2015-09-11 05:05:12.691678] NOTICE: eddlib.estimate: Gap penalty of 10.00 gives a score of 0.283 (1066 potential peaks with 256.81MB coverage)
[2015-09-11 05:05:17.980959] NOTICE: eddlib.estimate: Gap penalty of 6.94 gives a score of 0.249 (885 potential peaks with 244.40MB coverage)
[2015-09-11 05:05:23.110634] NOTICE: eddlib.estimate: Gap penalty of 12.92 gives a score of 0.271 (986 potential peaks with 235.64MB coverage)
[2015-09-11 05:05:28.246855] NOTICE: eddlib.estimate: Gap penalty of 8.83 gives a score of 0.274 (1014 potential peaks with 253.23MB coverage)
[2015-09-11 05:05:33.529675] NOTICE: eddlib.estimate: Gap penalty of 11.11 gives a score of 0.268 (977 potential peaks with 239.43MB coverage)
[2015-09-11 05:05:38.682235] NOTICE: eddlib.estimate: Gap penalty of 9.55 gives a score of 0.276 (1024 potential peaks with 252.28MB coverage)
[2015-09-11 05:05:44.102403] NOTICE: eddlib.estimate: Gap penalty of 10.43 gives a score of 0.267 (972 potential peaks with 240.59MB coverage)
[2015-09-11 05:05:49.177625] NOTICE: eddlib.estimate: Gap penalty of 9.83 gives a score of 0.280 (1047 potential peaks with 254.33MB coverage)
[2015-09-11 05:05:54.354744] NOTICE: eddlib.estimate: Gap penalty of 10.16 gives a score of 0.259 (927 potential peaks with 234.32MB coverage)
[2015-09-11 05:05:59.487704] NOTICE: eddlib.estimate: Gap penalty of 9.93 gives a score of 0.280 (1048 potential peaks with 254.31MB coverage)
[2015-09-11 05:06:04.556720] NOTICE: eddlib.estimate: Gap penalty of 10.06 gives a score of 0.269 (979 potential peaks with 243.36MB coverage)
[2015-09-11 05:06:09.644396] NOTICE: eddlib.estimate: Gap penalty of 9.98 gives a score of 0.278 (1035 potential peaks with 252.72MB coverage)
[2015-09-11 05:06:09.718051] NOTICE: eddlib.experiment: Gap penalty estimated to 10.0
[2015-09-11 05:06:38.220294] NOTICE: eddlib.algorithm.unalignable_regions: Unalignable regions file read. Got 396 regions. Total coverage: 239.85MB
[2015-09-11 05:06:38.805636] NOTICE: eddlib.algorithm.max_segments: Removed trivial intervals with score less than 2.9162.
[2015-09-11 05:06:38.806018] NOTICE: eddlib.algorithm.max_segments: 3624 intervals (potential peaks) remaining.
[2015-09-11 05:06:38.806143] NOTICE: edd: Running 10000 monte carlo trials
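The bin-size search in a log like the one above can be summarised with a few lines of Python. This is a sketch, not part of EDD: the helper name `bin_size_trials` is hypothetical, and it assumes that "nib ratio" is the fraction of non-informative bins and that the tested bin size is in kb. Both readings are consistent with the log, where bin size 13 with nib ratio 0.0192 is later reported as a 13KB bin size giving 98.08% informative bins.

```python
import re

# Matches entries such as:
#   testing bin size 13, nib ratio: 0.0192, spearmanr: 0.651
BIN_TEST = re.compile(
    r"testing bin size (\d+), nib ratio: ([0-9.]+), spearmanr: ([0-9.]+)"
)


def bin_size_trials(log_text):
    """Return (bin size in kb, informative fraction, spearman r) per tested bin size."""
    return [
        (int(size), 1.0 - float(nib), float(rho))
        for size, nib, rho in BIN_TEST.findall(log_text)
    ]


# Two entries copied from the log in this thread.
sample = (
    "[2015-09-11 05:04:31.014689] NOTICE: eddlib.estimate: "
    "testing bin size 12, nib ratio: 0.0213, spearmanr: 0.648 "
    "[2015-09-11 05:04:31.461279] NOTICE: eddlib.estimate: "
    "testing bin size 13, nib ratio: 0.0192, spearmanr: 0.651"
)

for size_kb, informative, rho in bin_size_trials(sample):
    print("%2d kb: %.2f%% informative, spearman r = %.3f"
          % (size_kb, 100 * informative, rho))
```

Run on the full log.txt, this shows at a glance the first bin size whose informative fraction reaches the configured threshold (0.98 in this run).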

eivindgl commented 8 years ago

Hi Xin,

Running the Monte Carlo trials takes quite some time. You should run edd with as many processes as you have cores on your computer. For example, if you have 4 cores, run edd with -p 4.

Run edd --help to see all the options. You could also reduce the number of Monte Carlo trials; I have never noticed a big difference between 1,000 and 10,000. Use -n 1000 for this.
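For scripted runs, the two suggestions can be combined in a small wrapper. This is a sketch under stated assumptions: `build_edd_command` is a hypothetical helper, the positional argument order (chromosome sizes, unalignable regions, IP BAM, input BAM, output directory) is copied from the "string args" line of the log earlier in this thread, and placing the options before the positional arguments is an assumption. Check `edd --help` before relying on it.

```python
import multiprocessing


def build_edd_command(chrom_sizes, unalignable, ip_bam, input_bam, out_dir,
                      n_trials=1000, n_procs=None):
    """Build an edd command line using -p (processes) and -n (trials)."""
    if n_procs is None:
        # One process per available core, as suggested above.
        n_procs = multiprocessing.cpu_count()
    return [
        "edd",
        "-p", str(n_procs),
        "-n", str(n_trials),
        chrom_sizes, unalignable, ip_bam, input_bam, out_dir,
    ]


# File names below are placeholders for illustration.
cmd = build_edd_command(
    "hg19.chromsizes",
    "hg19_unalignable_regions.bed",
    "ip.bam",
    "input.bam",
    "edd_output",
    n_procs=4,
)
print(" ".join(cmd))
# To actually launch the run: subprocess.check_call(cmd)
```

Returning the command as a list rather than a single string lets it be passed to `subprocess` without shell quoting problems.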

I hope this helps. Just email me back if there is anything else.

-eivind

lixin4306ren commented 8 years ago

Hi, Eivind

Thank you. It worked after I used -n 1000, and I got the peak calling results. However, I viewed the results in IGV alongside the ratio file and compared them with the results from RSEG. It seems that many relatively small domains with clear signal were not called by EDD. I want to re-run it with different parameters. Should I change the gap penalty or the FDR value? Thanks.

Xin

NOTICE: eddlib.algorithm.unalignable_regions: Unalignable regions file read. Got 396 regions. Total coverage: 239.85MB [2015-09-11 05:06:38.805636] NOTICE: eddlib.algorithm.max_segments: Removed trivial intervals with score less than 2.9162. [2015-09-11 05:06:38.806018] NOTICE: eddlib.algorithm.max_segments: 3624 intervals (potential peaks) remaining. [2015-09-11 05:06:38.806143] NOTICE: edd: Running 10000 monte carlo trials

eivindgl commented 8 years ago

Hi Xin,

You could try a smaller bin size, for example 7. But if the data are of reasonably high quality and you think RSEG covers the interesting patterns, then you might also stick with that. RSEG has by design a better resolution than EDD, and I think it is a good tool. It works very well for data with a clear enrichment profile. In my experience, it calls too many false positives for data with a more diffuse profile.

Best of luck with your analysis, Eivind
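A re-run with a smaller bin size could be assembled roughly as follows. This is only a sketch: `-n` and `-p` and the order of the positional arguments are taken from the options and logged command line shown in this thread, while the `--bin-size` flag name, the unit (assumed to be KB, as in the log's "bin size of 13KB"), and the file names are assumptions that should be checked against `edd --help`.

```python
def build_edd_command(chrom_sizes, unalignable_bed, ip_bam, input_bam,
                      output_dir, bin_size_kb=None, num_trials=1000, nprocs=4):
    """Assemble an argument list for an EDD run; nothing is executed here."""
    # -n (monte carlo trials) and -p (processes) are the options named in this thread.
    cmd = ["edd", "-n", str(num_trials), "-p", str(nprocs)]
    if bin_size_kb is not None:
        # Assumed flag name and unit (KB); verify with `edd --help`.
        cmd += ["--bin-size", str(bin_size_kb)]
    # Positional order follows the "string args" line in the pasted log.
    cmd += [chrom_sizes, unalignable_bed, ip_bam, input_bam, output_dir]
    return cmd


# Hypothetical file names, used only for illustration.
cmd = build_edd_command(
    "hg19.chromsizes",
    "hg19_unalignable_regions.bed",
    "ip.bam",
    "input.bam",
    "edd_out",
    bin_size_kb=7,
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the flag names have been confirmed for the installed EDD version.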

eivindgl commented 8 years ago

Oh, and the gap penalty is important. What was it in your case? Try running with 5 and perhaps 10 and see if that helps.
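The gap penalty that EDD settled on can be read back from `log.txt`. The sketch below is not part of EDD: it is a small helper (the function and pattern names are made up here) that assumes the log wording seen in the log pasted in this thread, namely "Gap penalty of X gives a score of Y", "Gap penalty estimated to X", and "Optimal bin size: N".

```python
import re

# Patterns mirror the wording of the log lines pasted in this thread.
NUMBER = r"(\d+(?:\.\d+)?)"
GAP_TRIAL = re.compile(r"Gap penalty of " + NUMBER + r" gives a score of " + NUMBER)
GAP_FINAL = re.compile(r"Gap penalty estimated to " + NUMBER)
BIN_FINAL = re.compile(r"Optimal bin size: (\d+)")


def summarize_edd_log(text):
    """Extract the tested gap penalties, the final estimate, and the bin size."""
    trials = [(float(g), float(s)) for g, s in GAP_TRIAL.findall(text)]
    final = GAP_FINAL.search(text)
    bin_size = BIN_FINAL.search(text)
    return {
        "gap_trials": trials,
        "gap_penalty": float(final.group(1)) if final else None,
        "bin_size_bp": int(bin_size.group(1)) if bin_size else None,
    }


# A few entries copied from the log in this thread.
sample = (
    "[2015-09-11 05:04:31.461714] NOTICE: eddlib.experiment: Optimal bin size: 13000 "
    "[2015-09-11 05:05:07.541606] NOTICE: eddlib.estimate: Gap penalty of 17.64 "
    "gives a score of 0.263 (950 potential peaks with 220.86MB coverage) "
    "[2015-09-11 05:05:12.691678] NOTICE: eddlib.estimate: Gap penalty of 10.00 "
    "gives a score of 0.283 (1066 potential peaks with 256.81MB coverage) "
    "[2015-09-11 05:06:09.718051] NOTICE: eddlib.experiment: Gap penalty estimated to 10.0 "
)
summary = summarize_edd_log(sample)
print(summary)
```

For a real run, the same function can be applied to the contents of `log.txt` in the output directory. In the pasted log the estimate was 10.0, so 5 would be the new value to compare against.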

lixin4306ren commented 8 years ago

I'm working on several broad histone modifications. RSEG works well for most of them, such as K36me3, K27me3 and K9me3. However, it didn't work very well for the K9me2 histone modification, which is really weak and diffuse. That's why I am trying edd and other software now. Hopefully I can get better results by optimizing parameters. One big problem with RSEG is that its results are significantly affected by the ratio of ChIP to input sequencing depth. Different depth ratios between the ChIP and input libraries gave totally different results, and sometimes peak calling failed completely. I'm worried about the stability of RSEG.

Xin

On Fri, Sep 11, 2015 at 3:56 PM, Eivind Gard Lund notifications@github.com wrote:

Hi Xin,

You could try a smaller bin size, for example 7. But, if the data are reasonably high quality and you think RSEG covers the interesting patterns, then you might also stick with that. RSEG has by design a better resolution than EDD and I think it is a good tool. It works very well for data with a clear enrichment profile. In my experience it calls too many false positives for data with a more diffuse profile.

Best of luck with your analysis, Eivind

On Fri, Sep 11, 2015 at 9:20 PM, lixin4306ren notifications@github.com wrote:

Hi, Eivind

Thank you. It worked after I used -n 1000 and I got the peak calling results. However, I browsed the results in IGV together with the ratio file and compared them to the results from RSEG. It seems that many relatively small domains with clear signal were not called by edd. I want to re-run it with different parameters. Should I change the gap penalty or the FDR value? Thanks.

Xin

On Fri, Sep 11, 2015 at 2:57 PM, Eivind Gard Lund < notifications@github.com> wrote:

Hi Xin,

Running Monte Carlo trials takes quite some time. You should run edd with as many processes as you have cores on your computer. For example, if you have 4 cores, run edd with -p 4

Run edd --help for all the options. You could also reduce the number of Monte Carlo trials. I have never noticed a big difference between 1,000 and 10,000. Use -n 1000 for this.

I hope this helps. Just email me back if there is anything else.

-eivind
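As a rough illustration of the advice above, the sketch below assembles an edd command line with -p and -n. The helper `build_edd_command` is hypothetical. The positional order (chromosome sizes, unalignable regions, IP BAM, input BAM, output directory) is copied from the "string args" line in the log further down this thread, and putting the options before the positionals is an assumption; check `edd --help` before relying on it.

```python
import os


def build_edd_command(chrom_sizes, unalignable, ip_bam, input_bam, out_dir,
                      n_trials=1000, n_procs=None):
    """Return an argv list for edd (hypothetical helper, not part of EDD)."""
    if n_procs is None:
        # One process per core, as suggested above; fall back to 1 if unknown.
        n_procs = os.cpu_count() or 1
    return ["edd",
            "-p", str(n_procs),   # number of processes
            "-n", str(n_trials),  # number of Monte Carlo trials
            chrom_sizes, unalignable, ip_bam, input_bam, out_dir]


cmd = build_edd_command("hg19.chromsizes", "hg19_unalignable_regions.bed",
                        "ip.bam", "input.bam", "edd_out", n_procs=4)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; the file names here are placeholders.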

On Fri, Sep 11, 2015 at 8:49 PM, lixin4306ren < notifications@github.com> wrote:

Hi, Eivind

Although EDD successfully started to run, I now have another problem with the results. I tried several datasets, and EDD always ended without generating an output peak calling file (it only generated the log.txt file). All my jobs aborted after the step "[2015-09-11 05:06:38.806143] NOTICE: edd: Running 10000 monte carlo trials". I pasted one example log file here. I have no idea why this occurred. I think I allocated enough resources for it. Thank you for your help.

[2015-09-11 05:02:53.854973] NOTICE: edd: 2015-09-11 01:02:53.854871
[2015-09-11 05:02:53.855925] NOTICE: edd: cwd: /amber3/feinbergLab/personal/xinli/Oliver/Chip-Seq_new_dataset
[2015-09-11 05:02:53.856053] NOTICE: edd: string args: /home/jhmi/xinli/.local/bin/edd --config-file /home/jhmi/xinli/soft/edd/eddlib/default_parameters.conf /home/jhmi/xinli/soft/edd/data/hg19.chromsizes /home/jhmi/xinli/soft/edd/data/hg19_unalignable_regions.bed Sample_13B_K36_2_batch3.sort.rmdup.bam Sample_13B_Input_2_batch3.sort.rmdup.bam /home/jhmi/xinli/dcl01/Oliver/Chip-Seq/Sample_13B_K36_2_batch3
[2015-09-11 05:02:53.856162] NOTICE: edd: chromosome size file: /home/jhmi/xinli/soft/edd/data/hg19.chromsizes
[2015-09-11 05:02:53.856254] NOTICE: edd: IP file: Sample_13B_K36_2_batch3.sort.rmdup.bam
[2015-09-11 05:02:53.856346] NOTICE: edd: Input file: Sample_13B_Input_2_batch3.sort.rmdup.bam
[2015-09-11 05:02:53.856436] NOTICE: edd: output dir: /home/jhmi/xinli/dcl01/Oliver/Chip-Seq/Sample_13B_K36_2_batch3
[2015-09-11 05:02:53.856531] NOTICE: edd: number of monte carlo trials: 10000
[2015-09-11 05:02:53.856621] NOTICE: edd: number of processes: 4
[2015-09-11 05:02:53.856723] NOTICE: edd: fdr lim: 0.050
[2015-09-11 05:02:53.856812] NOTICE: edd: gap penalty is unspecified, will be auto estimated
[2015-09-11 05:02:53.856902] NOTICE: edd: bin size is unspecified, will be auto estimated
[2015-09-11 05:02:53.856989] NOTICE: edd: unalignable regions file : /home/jhmi/xinli/soft/edd/data/hg19_unalignable_regions.bed
[2015-09-11 05:02:53.857473] NOTICE: edd: Writing log ratios: False
[2015-09-11 05:02:53.857586] NOTICE: edd: EDD configuration file parameters:
[2015-09-11 05:02:53.857678] NOTICE: edd: ci_method:agresti_coull
[2015-09-11 05:02:53.857776] NOTICE: edd: fraq_ibins:0.98
[2015-09-11 05:02:53.857866] NOTICE: edd: log_ratio_bin_size:10000
[2015-09-11 05:02:53.857955] NOTICE: edd: ci_lim:0.25
[2015-09-11 05:02:53.858890] NOTICE: eddlib.experiment: loading bam files
[2015-09-11 05:04:19.786412] NOTICE: eddlib.experiment: done
[2015-09-11 05:04:20.284976] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:22.113150] NOTICE: eddlib.estimate: testing bin size 1, nib ratio: 0.8996, spearmanr: 0.425
[2015-09-11 05:04:22.475358] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:23.635885] NOTICE: eddlib.estimate: testing bin size 2, nib ratio: 0.6159, spearmanr: 0.784
[2015-09-11 05:04:23.920364] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:24.917783] NOTICE: eddlib.estimate: testing bin size 3, nib ratio: 0.3240, spearmanr: 0.700
[2015-09-11 05:04:25.172317] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:25.987179] NOTICE: eddlib.estimate: testing bin size 4, nib ratio: 0.1781, spearmanr: 0.651
[2015-09-11 05:04:26.224768] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:26.881711] NOTICE: eddlib.estimate: testing bin size 5, nib ratio: 0.1147, spearmanr: 0.635
[2015-09-11 05:04:27.107415] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:27.653201] NOTICE: eddlib.estimate: testing bin size 6, nib ratio: 0.0848, spearmanr: 0.635
[2015-09-11 05:04:27.867727] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:28.315705] NOTICE: eddlib.estimate: testing bin size 7, nib ratio: 0.0656, spearmanr: 0.638
[2015-09-11 05:04:28.525214] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:28.932162] NOTICE: eddlib.estimate: testing bin size 8, nib ratio: 0.0501, spearmanr: 0.640
[2015-09-11 05:04:29.136950] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:29.490023] NOTICE: eddlib.estimate: testing bin size 9, nib ratio: 0.0383, spearmanr: 0.643
[2015-09-11 05:04:29.724374] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:30.049603] NOTICE: eddlib.estimate: testing bin size 10, nib ratio: 0.0301, spearmanr: 0.645
[2015-09-11 05:04:30.248191] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:30.545102] NOTICE: eddlib.estimate: testing bin size 11, nib ratio: 0.0246, spearmanr: 0.647
[2015-09-11 05:04:30.741359] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:31.014689] NOTICE: eddlib.estimate: testing bin size 12, nib ratio: 0.0213, spearmanr: 0.648
[2015-09-11 05:04:31.208867] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:31.461279] NOTICE: eddlib.estimate: testing bin size 13, nib ratio: 0.0192, spearmanr: 0.651
[2015-09-11 05:04:31.461714] NOTICE: eddlib.experiment: Optimal bin size: 13000
[2015-09-11 05:04:31.655788] NOTICE: eddlib.experiment: normalizing input with scale factor: 1.16
[2015-09-11 05:04:31.815054] NOTICE: eddlib.experiment: Manually specified bin size of 13KB gives 98.08% informative bins. The required amount is 98.00%.
[2015-09-11 05:04:31.816198] NOTICE: eddlib.experiment: Estimating gap penalty
[2015-09-11 05:04:59.655100] NOTICE: eddlib.algorithm.unalignable_regions: Unalignable regions file read. Got 396 regions. Total coverage: 239.85MB
[2015-09-11 05:05:07.541606] NOTICE: eddlib.estimate: Gap penalty of 17.64 gives a score of 0.263 (950 potential peaks with 220.86MB coverage)
[2015-09-11 05:05:12.691678] NOTICE: eddlib.estimate: Gap penalty of 10.00 gives a score of 0.283 (1066 potential peaks with 256.81MB coverage)
[2015-09-11 05:05:17.980959] NOTICE: eddlib.estimate: Gap penalty of 6.94 gives a score of 0.249 (885 potential peaks with 244.40MB coverage)
[2015-09-11 05:05:23.110634] NOTICE: eddlib.estimate: Gap penalty of 12.92 gives a score of 0.271 (986 potential peaks with 235.64MB coverage)
[2015-09-11 05:05:28.246855] NOTICE: eddlib.estimate: Gap penalty of 8.83 gives a score of 0.274 (1014 potential peaks with 253.23MB coverage)
[2015-09-11 05:05:33.529675] NOTICE: eddlib.estimate: Gap penalty of 11.11 gives a score of 0.268 (977 potential peaks with 239.43MB coverage)
[2015-09-11 05:05:38.682235] NOTICE: eddlib.estimate: Gap penalty of 9.55 gives a score of 0.276 (1024 potential peaks with 252.28MB coverage)
[2015-09-11 05:05:44.102403] NOTICE: eddlib.estimate: Gap penalty of 10.43 gives a score of 0.267 (972 potential peaks with 240.59MB coverage)
[2015-09-11 05:05:49.177625] NOTICE: eddlib.estimate: Gap penalty of 9.83 gives a score of 0.280 (1047 potential peaks with 254.33MB coverage)
[2015-09-11 05:05:54.354744] NOTICE: eddlib.estimate: Gap penalty of 10.16 gives a score of 0.259 (927 potential peaks with 234.32MB coverage)
[2015-09-11 05:05:59.487704] NOTICE: eddlib.estimate: Gap penalty of 9.93 gives a score of 0.280 (1048 potential peaks with 254.31MB coverage)
[2015-09-11 05:06:04.556720] NOTICE: eddlib.estimate: Gap penalty of 10.06 gives a score of 0.269 (979 potential peaks with 243.36MB coverage)
[2015-09-11 05:06:09.644396] NOTICE: eddlib.estimate: Gap penalty of 9.98 gives a score of 0.278 (1035 potential peaks with 252.72MB coverage)
[2015-09-11 05:06:09.718051] NOTICE: eddlib.experiment: Gap penalty estimated to 10.0
[2015-09-11 05:06:38.220294] NOTICE: eddlib.algorithm.unalignable_regions: Unalignable regions file read. Got 396 regions. Total coverage: 239.85MB
[2015-09-11 05:06:38.805636] NOTICE: eddlib.algorithm.max_segments: Removed trivial intervals with score less than 2.9162.
[2015-09-11 05:06:38.806018] NOTICE: eddlib.algorithm.max_segments: 3624 intervals (potential peaks) remaining.
[2015-09-11 05:06:38.806143] NOTICE: edd: Running 10000 monte carlo trials
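For anyone reading a log like the one above, here is a small sketch that pulls out the "testing bin size" lines and reports the smallest bin size that meets the required informative fraction. Two things are assumptions rather than documented behaviour: that "nib ratio" is the fraction of non-informative bins (it fits this log, where 13 kb has nib ratio 0.0192 and is reported as 98.08% informative), and that the first passing size is the one to pick. EDD's own estimator may weigh other things, such as the spearmanr value. The function names are hypothetical.

```python
import re

# Matches e.g. "testing bin size 13, nib ratio: 0.0192, spearmanr: 0.651"
BIN_LINE = re.compile(
    r"testing bin size (\d+), nib ratio: (\d+(?:\.\d+)?), "
    r"spearmanr: (\d+(?:\.\d+)?)"
)


def parse_bin_tests(log_text):
    """Return (bin_size_kb, nib_ratio, spearmanr) tuples in log order."""
    return [(int(size), float(nib), float(rho))
            for size, nib, rho in BIN_LINE.findall(log_text)]


def smallest_passing_bin(tests, required_informative=0.98):
    """First tested bin size whose informative fraction meets the threshold.

    Assumes informative fraction = 1 - nib ratio (inferred from the log).
    Returns None if no tested size passes.
    """
    for size_kb, nib, _rho in tests:
        if 1.0 - nib >= required_informative:
            return size_kb
    return None


sample = (
    "NOTICE: eddlib.estimate: testing bin size 12, nib ratio: 0.0213, spearmanr: 0.648\n"
    "NOTICE: eddlib.estimate: testing bin size 13, nib ratio: 0.0192, spearmanr: 0.651\n"
)
tests = parse_bin_tests(sample)
best = smallest_passing_bin(tests, required_informative=0.98)
```

Rerunning `smallest_passing_bin` with 0.99 on the full log returns None, which matches the original "Could not find a suitable bin size" failure with the default threshold.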
