biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License
31 stars 6 forks source link

OverflowError for paired-end reads bedpe files #57

Closed yeroslaviz closed 7 years ago

yeroslaviz commented 7 years ago

running the fololwing command epic --treatment wt.ip.0h.1.reads.bedpe,wt.ip.0h.2.reads.bedpe --control wt.input.0h.1.reads.bedpe,wt.input.0h.2.reads.bedpe -cpu 10 --genome sacCer3 -egs 0.961651807729 -cs Sc.R64.1.genome -bw bigWigFIles -sbw sumBigWigFIles -pe I get an error message as such:

# epic --treatment wt.ip.0h.1.reads.bedpe,wt.ip.0h.2.reads.bedpe --control wt.input.0h.1.reads.bedpe,wt.input.0h.2.reads.bedpe -cpu 10 --genome sacCer3 -egs 0.961651807729 -cs Sc.R64.1.genome -bw bigWigFIles -sbw sumBigWigFIles -pe
# epic --treatment wt.ip.0h.1.reads.bedpe,wt.ip.0h.2.reads.bedpe --control wt.input.0h.1.reads.bedpe,wt.input.0h.2.reads.bedpe -cpu 10 --genome sacCer3 -egs 0.961651807729 -cs Sc.R64.1.genome -bw bigWigFIles -sbw sumBigWigFIles -pe # epic_version: 0.1.25, pandas_version: 0.20.2 (File: epic, Log level: INFO, Time: Mon, 17 Jul 2017 16:24:41 )
Binning wt.ip.0h.1.reads.bedpe,wt.ip.0h.2.reads.bedpe (File: run_epic, Log level: INFO, Time: Mon, 17 Jul 2017 16:24:41 )
Binning chromosomes I, II, III, IV, IX, MT, V, VI, VII, VIII, X, XI, XII, XIII, XIV, XV, XVI (File: count_reads_in_windows, Log level: INFO, Time: Mon, 17 Jul 2017 16:24:41 )
Binning wt.input.0h.1.reads.bedpe,wt.input.0h.2.reads.bedpe (File: run_epic, Log level: INFO, Time: Mon, 17 Jul 2017 16:24:41 )
Binning chromosomes I, II, III, IV, IX, MT, V, VI, VII, VIII, X, XI, XII, XIII, XIV, XV, XVI (File: count_reads_in_windows, Log level: INFO, Time: Mon, 17 Jul 2017 16:24:41 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Mon, 17 Jul 2017 16:24:42 )
11690902.0 effective_genome_size (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 17 Jul 2017 16:24:42 )
200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 17 Jul 2017 16:24:42 )
0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 17 Jul 2017 16:24:42 )
0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 17 Jul 2017 16:24:42 )
1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 17 Jul 2017 16:24:42 )
4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 17 Jul 2017 16:24:42 )
1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 17 Jul 2017 16:24:42 )
/usr/local/lib/python2.7/dist-packages/epic/statistics/compute_score_threshold.py:24: **RuntimeWarning: divide by zero encountered in log**
  score = -log(required_p_value)
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 219, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 50, in run_epic
    compute_background_probabilities(nb_chip_reads, args)
  File "/usr/local/lib/python2.7/dist-packages/epic/statistics/compute_background_probabilites.py", line 51, in compute_background_probabilities
    boundary_contribution, genome_length_in_bins)
  File "/usr/local/lib/python2.7/dist-packages/epic/statistics/compute_score_threshold.py", line 26, in compute_score_threshold
    current_scaled_score = int(round(score / BIN_SIZE))
**OverflowError: cannot convert float infinity to integer**

There is also a warning about divion by zero, which I can't really pin-point.

The projects works with paired-end samples. to make my mapping results works with epic i have sorted the bam files by name, fixed the flags and convert them to bedpe format like that:

samtools sort -@ 8 -n ../oldAnalysis/Mapping.bowtie2/$base/$base.sorted.bam -o $base.reads.sorted
samtools fixmate $base.reads.sorted $base.reads.fixed.sorted
bedtools bamtobed -i $base.reads.fixed.sorted -bedpe > $base.reads.bedpe

the bedpe files looks like that now:


V       550986  551070  V       551119  551205  NB500982:16:HLHKMBGXX:1:11101:1047:9075 42      +       -
IV      425501  425587  IV      425535  425619  NB500982:16:HLHKMBGXX:1:11101:1050:11357        42      +       -
XVI     338212  338298  XVI     338464  338550  NB500982:16:HLHKMBGXX:1:11101:1052:16655        42      +       -
VII     865993  866078  VII     866073  866159  NB500982:16:HLHKMBGXX:1:11101:1053:2697 42      +       -
IX      32309   32395   IX      32462   32548   NB500982:16:HLHKMBGXX:1:11101:1053:9302 42      +       -
VIII    462923  463009  VIII    462991  463077  NB500982:16:HLHKMBGXX:1:11101:1053:14383        42      +       -```

Any ideas on to to analyse the data wit hepic?
Do I really need to convert the bam files of a paired-end data set to bedpe format? 
Maybe this causes the problem.
thanks
Assa
endrebak commented 7 years ago

I'd love to help debug this. Would you be able to share (parts of) the data that reproduces the error on a private dropbox link or something? My e-mail is endrebak85 gmail.com

I require bedpe files for good reason. I should explain it in the docs sometime.

In the data you copy-pasted there are spaces instead of tabs. What do the files look like if you do cat -et <your_bedpe> | head on he bedpe files?

yeroslaviz commented 7 years ago

Thanks for the fast reply. I can give you a link (or two) to download the bedpe files, if you an give me an email address to sent it to.

When doing cat -et wt.ip.0h.1.reads.bedpe | head I get

V^I550986^I551070^IV^I551119^I551205^INB500982:16:HLHKMBGXX:1:11101:1047:9075^I42^I+^I-$
IV^I425501^I425587^IIV^I425535^I425619^INB500982:16:HLHKMBGXX:1:11101:1050:11357^I42^I+^I-$
XVI^I338212^I338298^IXVI^I338464^I338550^INB500982:16:HLHKMBGXX:1:11101:1052:16655^I42^I+^I-$
VII^I865993^I866078^IVII^I866073^I866159^INB500982:16:HLHKMBGXX:1:11101:1053:2697^I42^I+^I-$
IX^I32309^I32395^IIX^I32462^I32548^INB500982:16:HLHKMBGXX:1:11101:1053:9302^I42^I+^I-$
VIII^I462923^I463009^IVIII^I462991^I463077^INB500982:16:HLHKMBGXX:1:11101:1053:14383^I42^I+^I-$
XI^I142565^I142651^IXI^I142738^I142824^INB500982:16:HLHKMBGXX:1:11101:1053:15269^I40^I+^I-$
IV^I241234^I241320^IIV^I241461^I241547^INB500982:16:HLHKMBGXX:1:11101:1057:12441^I42^I+^I-$
X^I276041^I276127^IX^I276066^I276152^INB500982:16:HLHKMBGXX:1:11101:1057:17539^I42^I+^I-$
XV^I727805^I727890^IXV^I727946^I728032^INB500982:16:HLHKMBGXX:1:11101:1060:11989^I42^I+^I-$

The ^I symbols represent AFAIK a tab-character.

thanks Assa

endrebak commented 7 years ago

Yes, then it was a problem with the copy-pasting, not the files themselves :)

endrebak85@gmail.com

I'll try it out today, thanks.

endrebak commented 7 years ago

I also need the genome size file btw :)

yeroslaviz commented 7 years ago

the two bedpe files as well as the genome files are in the sent link

thanks Assa

On 18. Jul 2017, at 09:02, Endre Bakken Stovner notifications@github.com<mailto:notifications@github.com> wrote:

I also need the genome size file btw :)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/biocore-ntnu/epic/issues/57#issuecomment-315976324, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AIg5Q32WpzueQsdVDlpvyZDZfRh78ZbQks5sPFiKgaJpZM4OaDvr.

endrebak commented 7 years ago

Your problem is that you use a comma to separate the files in the epic invocation while you should use a space. This is interpreted as one file by all shells I know of :) :

wt.ip.0h.1.reads.bedpe,wt.ip.0h.2.reads.bedpe

I'll introduce a test for whether the file exists in the next release.

yeroslaviz commented 7 years ago

Now I'm embarrassed. Thanks. I'll try it again with spaces.

Assa

Sent from my iPhone

Dr. Assa Yeroslaviz Max Planck Institute for Biochemistry Application service, Bioinformatics group Am Klopferspitz 18, 82152 Martinsried Tel: +49 (0)89 8578 2427<tel:+49%2089%208578%202427> Email: yeroslaviz@biochem.mpg.demailto:yeroslaviz@biochem.mpg.de

On 18. Jul 2017, at 11:10, Endre Bakken Stovner notifications@github.com<mailto:notifications@github.com> wrote:

Your problem is that you use a comma to separate the files while you should use a space. This is interpreted as one file by all shells I know of :) :

wt.ip.0h.1.reads.bedpe,wt.ip.0h.2.reads.bedpe

I'll introduce a test for whether the file exists in the next release.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/biocore-ntnu/epic/issues/57#issuecomment-316004777, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AIg5Q3UcYnKeSn8wDav7Yro0zWu2JYIzks5sPHZqgaJpZM4OaDvr.

endrebak commented 7 years ago

No problem, some tools use comma :)

endrebak commented 7 years ago

I got an error in the bigwig creation. I'll look at that too. The peak-calling seemed to work though.

endrebak commented 7 years ago

Fixed the error, will upload new epic-version to bioconda within a few hours.

endrebak commented 7 years ago

epic 0.1.27 in bioconda is tested on your data and should work :)

yeroslaviz commented 7 years ago

thanks

already download and testing right now :-)

On 18 Jul 2017, at 14:08, Endre Bakken Stovner notifications@github.com<mailto:notifications@github.com> wrote:

epic 0.1.27 in bioconda is tested on your data and should work :)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/biocore-ntnu/epic/issues/57#issuecomment-316044237, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AIg5Q-nPip0YW0I5Bvp4WYqgumKYvy1Mks5sPKA3gaJpZM4OaDvr.

——

Assa Yeroslaviz, PhD Application service, Bioinformatics group Max Planck Institute for Biochemistry Am Klopferspitz 18, 82152 Martinsried Germany Tel: +49 89 8578 2427 Email: yeroslaviz@biochem.mpg.demailto:yeroslaviz@biochem.mpg.de

yeroslaviz commented 7 years ago

Hi with the new version (0.1.27) it gives the following error

Traceback (most recent call last):
  File "/usr/local/bin/epic", line 222, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 67, in run_epic
    write_matrix_files(chip_merged, input_merged, df, args)
  File "/usr/local/lib/python2.7/dist-packages/epic/matrixes/matrixes.py", line 43, in write_matrix_files
    from epic.bigwig.create_bigwigs import create_bigwigs
  File "/usr/local/lib/python2.7/dist-packages/epic/bigwig/create_bigwigs.py", line 8, in <module>
    import pyBigWig
ImportError: No module named pyBigWig

maybe it should be added to the list of dependencies, or it should be installed when epic is being installed.

I have manually installed it now with pip install pyBigWig

endrebak commented 7 years ago

This happened with pip install bioepic right? Thx for reporting.

yeroslaviz commented 7 years ago

yes. I re-install the latest version of epic and re-run the same command again.

It came all the way to Creating Matrixes from count data. (File: matrixes, Log level: ... and than quit with the error.

But it runs all the way through. Just last question (hopefully) - what is the output I get to the STDOUT? it looks like that, but I can't scroll all the way up, as it looks like a lot

XV 976400 979799 5076 6575 200.86121292 1.81763986692 0.0 0.0
XV 1004000 1005399 827 1534 45.7997154363 1.26929372589 1.79779015824e-11 2.50134842209e-11
XV 1010200 1011599 1433 2016 73.8440803434 1.67354596884 1.55710137778e-72 4.31633274646e-72
XV 1013400 1014999 1051 1777 59.1618955182 1.39250663004 1.04278853238e-24 1.82236111878e-24
XV 1016200 1019199 2408 4340 117.975189382 1.3063174747 1.58413038036e-36 3.22396154764e-36
XV 1027600 1028799 1992 2544 67.3205226814 1.84354698465 4.88818478537e-136 1.9647787179e-135
XV 1039000 1043399 4550 6808 229.768378241 1.57352567266 2.96853625264e-178 1.47105204026e-177
XV 1060400 1060999 474 677 22.055479496 1.64843435242 3.38417597889e-24 5.8786346236e-24
XV 1063000 1066399 2770 4794 128.329737842 1.36039091443 5.83959320475e-54 1.41302531225e-53
XV 1076800 1077199 339 594 18.3158595899 1.34367809311 8.82957048912e-08 1.12865622772e-07
XV 1084800 1089399 3645 4961 210.426198394 1.72985734138 2.02166130346e-202 1.10808481292e-201
XV 1090200 1091199 1059 2186 56.449806707 1.14058534163 1.2774934704e-05 1.54430497216e-05

can i somehow divert it to a file without the 1>2&

endrebak commented 7 years ago

Strange. It worked for me without any error on your data. What is the error you got?

You only need to use > to redirect output, but perhaps I should add an explicit --outfile flag.

yeroslaviz commented 7 years ago

No sorry, you misunderstood me. It runs now perfectly with the new version (0.1.27). The error is from version 0.1.25 still. An --outfile or logfile flags would be great.

I still would like to understand the output table I posted above, if possible.

endrebak commented 7 years ago

Sorry.

The columns are:

Chromosome Start End ChIP Input Score Fold_change P FDR

ChIP: number of Chip reads in region Input: nb Input reads in region

Score: Poisson value (not important for end users) FDR: adjusted p value Fold_change: ChIP divided by Input

On Wed, Jul 19, 2017 at 8:51 AM, frymor notifications@github.com wrote:

No sorry, you misunderstood me. It runs now perfectly with the new version (0.1.27). The error is from version 0.1.25 still. An --outfile or logfile flags would be great.

I still would like to understand the output table I posted above, if possible.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/biocore-ntnu/epic/issues/57#issuecomment-316289741, or mute the thread https://github.com/notifications/unsubscribe-auth/AQ9I0tTYw8aWyboz92Pn7407lTZ8hOt2ks5sPadlgaJpZM4OaDvr .

yeroslaviz commented 7 years ago

It would be great if you can make both a logfile and an outfile for only the table of the peaks, or at least a quiet option, so that the output won't be written to STDOUT.

endrebak commented 7 years ago

log and outfile flags added in 0.2.0.