biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License
31 stars 6 forks source link

OverflowError: Cannot convert float infinity to integer #38

Closed c-guzman closed 7 years ago

c-guzman commented 7 years ago

Command used:

Command executed:

  epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 > Control2_epic.bed

Resulting error:

Command error:
  # epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 (File: epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )

  gzip: stdout: Broken pipe
  Used first 10000 reads of Control2_treatment.bed.gz to estimate a median read length of 34.0
  Mean readlength: 33.3074, max readlength: 37, min readlength: 20. (File: find_readlength, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Using an effective genome fraction of 0.810858412293. (File: genomes, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Binning Control2_treatment.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:12 )
  Binning Control2_control.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 )
  Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 )
  Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 )
  Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 )
  2510169508.0 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  Traceback (most recent call last):
    File "/usr/local/anaconda2/bin/epic", line 4, in <module>
      __import__('pkg_resources').run_script('bioepic==0.1.6', 'epic')
    File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 461, in run_script
      self.require(requires)[0].run_script(script_name, ns)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 1194, in run_script
      execfile(script_filename, namespace, namespace)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/bioepic-0.1.6-py2.7.egg-info/scripts/epic", line 165, in <module>
      run_epic(args)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/run/run_epic.py", line 45, in run_epic
      compute_background_probabilities(nb_chip_reads, args)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_background_probabilites.py", line 49, in compute_background_probabilities
      boundary_contribution, genome_length_in_bins)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold
      current_scaled_score = int(round(score / BIN_SIZE))
  OverflowError: cannot convert float infinity to integer

Any ideas?

endrebak commented 7 years ago

The last time this happened there was a problem with the users files.

Could you please post the output of zcat Control2_treatment.bed.gz | head | cat -et?

(My guess is that your files are not tab-separated.)

On Thursday, September 8, 2016, Carlos Guzman notifications@github.com wrote:

Command used:

Command executed:

epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 > Control2_epic.bed

Resulting error:

Command error:

epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 (File: epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )

gzip: stdout: Broken pipe Used first 10000 reads of Control2_treatment.bed.gz to estimate a median read length of 34.0 Mean readlength: 33.3074, max readlength: 37, min readlength: 20. (File: find_readlength, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 ) Using an effective genome fraction of 0.810858412293. (File: genomes, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 ) Binning Control2_treatment.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 ) Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 ) Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:12 ) Binning Control2_control.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 ) Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 ) Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 ) Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 ) 2510169508.0 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 ) 200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 ) 0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 ) 0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 ) 1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 ) 4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 ) 1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 ) Traceback (most recent call last): File "/usr/local/anaconda2/bin/epic", line 4, in import('pkg_resources').run_script('bioepic==0.1.6', 'epic') File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 461, in run_script self.require(requires)[0].run_script(script_name, ns) File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 1194, in run_script execfile(script_filename, namespace, namespace) File "/usr/local/anaconda2/lib/python2.7/site-packages/bioepic-0.1.6-py2.7.egg-info/scripts/epic", line 165, in run_epic(args) File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/run/run_epic.py", line 45, in run_epic compute_background_probabilities(nb_chip_reads, args) File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_background_probabilites.py", line 49, in compute_background_probabilities boundary_contribution, genome_length_in_bins) File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold current_scaled_score = int(round(score / BIN_SIZE)) OverflowError: cannot convert float infinity to integer

Any ideas?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/endrebak/epic/issues/38, or mute the thread https://github.com/notifications/unsubscribe-auth/AQ9I0v4qV0KRQOWrbWeG0em76JWACekEks5qoE_dgaJpZM4J4SS- .

endrebak commented 7 years ago

And if this is not it, I will probably be able to find the bug right away if you could send me the files privately through dropbox or something.

c-guzman commented 7 years ago

Head of treatment bed:

10^I60560^I60594^IHWI-EAS306_8184_FC309UE_CR_3_72_357_713^I1^I+$
10^I61167^I61200^IHWI-EAS306_8184_FC309UE_CR_2_78_553_1598^I1^I-$
10^I62038^I62073^IHWI-EAS306_8184_FC309UE_CR_2_78_539_852^I1^I-$
10^I62082^I62116^IHWI-EAS306_8184_FC309UE_CR_3_72_1521_1020^I1^I-$
10^I62305^I62340^IHWI-EAS306_8184_FC309UE_CR_2_55_774_35^I1^I-$
10^I62578^I62611^IHWI-EAS306_8184_FC309UE_CR_2_26_1576_1213^I1^I-$
10^I62588^I62621^IHWI-EAS306_8184_FC309UE_CR_2_79_1251_742^I1^I-$
10^I62742^I62775^IHWI-EAS306_8184_FC309UE_CR_3_13_999_509^I1^I+$
10^I63075^I63105^IHWI-EAS306_8184_FC309UE_CR_3_44_859_1033^I1^I-$
10^I63765^I63799^IHWI-EAS306_8184_FC309UE_CR_3_24_1570_486^I1^I+$

Head of Input bed:

10^I60560^I60594^IHWI-EAS306_8184_FC309UE_CR_3_72_357_713^I1^I+$
10^I61167^I61200^IHWI-EAS306_8184_FC309UE_CR_2_78_553_1598^I1^I-$
10^I62038^I62073^IHWI-EAS306_8184_FC309UE_CR_2_78_539_852^I1^I-$
10^I62082^I62116^IHWI-EAS306_8184_FC309UE_CR_3_72_1521_1020^I1^I-$
10^I62305^I62340^IHWI-EAS306_8184_FC309UE_CR_2_55_774_35^I1^I-$
10^I62578^I62611^IHWI-EAS306_8184_FC309UE_CR_2_26_1576_1213^I1^I-$
10^I62588^I62621^IHWI-EAS306_8184_FC309UE_CR_2_79_1251_742^I1^I-$
10^I62742^I62775^IHWI-EAS306_8184_FC309UE_CR_3_13_999_509^I1^I+$
10^I63075^I63105^IHWI-EAS306_8184_FC309UE_CR_3_44_859_1033^I1^I-$
10^I63765^I63799^IHWI-EAS306_8184_FC309UE_CR_3_24_1570_486^I1^I+$

I can send the files to you via dropbox if you still need them. They're encode files. Specifically ENCSR000DLY and it's associated control.

I used bamtobed from bedtools to generate these bed files so i'm not entirely sure why they wouldn't be tab separated to begin with.

endrebak commented 7 years ago

I was able to reproduce this; it happens when there is no "chr" in the chromosome names. For now you can solve it by running

sed "s/^/chr/" treatment.bed > ucsc_treatment.bed sed "s/^/chr/" input.bed > ucsc_input.bed

Thanks for reporting. I will support encode style chromo names soon, but do not have time to fix it right now.

On Fri, Sep 9, 2016 at 4:55 PM, Carlos Guzman notifications@github.com wrote:

Head of treatment bed:

10^I60560^I60594^IHWI-EAS306_8184_FC309UE_CR_3_72_357_713^I1^I+$ 10^I61167^I61200^IHWI-EAS306_8184_FC309UE_CR_2_78_553_1598^I1^I-$ 10^I62038^I62073^IHWI-EAS306_8184_FC309UE_CR_2_78_539_852^I1^I-$ 10^I62082^I62116^IHWI-EAS306_8184_FC309UE_CR_3_72_1521_1020^I1^I-$ 10^I62305^I62340^IHWI-EAS306_8184_FC309UE_CR_2_55_774_35^I1^I-$ 10^I62578^I62611^IHWI-EAS306_8184_FC309UE_CR_2_26_1576_1213^I1^I-$ 10^I62588^I62621^IHWI-EAS306_8184_FC309UE_CR_2_79_1251_742^I1^I-$ 10^I62742^I62775^IHWI-EAS306_8184_FC309UE_CR_3_13_999_509^I1^I+$ 10^I63075^I63105^IHWI-EAS306_8184_FC309UE_CR_3_44_859_1033^I1^I-$ 10^I63765^I63799^IHWI-EAS306_8184_FC309UE_CR_3_24_1570_486^I1^I+$

Head of Input bed:

10^I60560^I60594^IHWI-EAS306_8184_FC309UE_CR_3_72_357_713^I1^I+$ 10^I61167^I61200^IHWI-EAS306_8184_FC309UE_CR_2_78_553_1598^I1^I-$ 10^I62038^I62073^IHWI-EAS306_8184_FC309UE_CR_2_78_539_852^I1^I-$ 10^I62082^I62116^IHWI-EAS306_8184_FC309UE_CR_3_72_1521_1020^I1^I-$ 10^I62305^I62340^IHWI-EAS306_8184_FC309UE_CR_2_55_774_35^I1^I-$ 10^I62578^I62611^IHWI-EAS306_8184_FC309UE_CR_2_26_1576_1213^I1^I-$ 10^I62588^I62621^IHWI-EAS306_8184_FC309UE_CR_2_79_1251_742^I1^I-$ 10^I62742^I62775^IHWI-EAS306_8184_FC309UE_CR_3_13_999_509^I1^I+$ 10^I63075^I63105^IHWI-EAS306_8184_FC309UE_CR_3_44_859_1033^I1^I-$ 10^I63765^I63799^IHWI-EAS306_8184_FC309UE_CR_3_24_1570_486^I1^I+$

I can send the files to you via dropbox if you still need them. They're encode files. Specifically ENCSR000DLY and it's associated control.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/endrebak/epic/issues/38#issuecomment-245937519, or mute the thread https://github.com/notifications/unsubscribe-auth/AQ9I0vvVW6pY5nzQ-Sb69-z5GZJ0RLFUks5qoXNWgaJpZM4J4SS- .

c-guzman commented 7 years ago

Great! Thank you!

balwierz commented 7 years ago

All of zebrafish genome contigs are not called ~ "^chr". IMHO chromosome names should be assumed to be any arbitrary strings without white characters.

endrebak commented 7 years ago

Thanks for the feedback!

I agree that epic should handle gencode genome names, but I do not see how to handle arbitrary strings.

The problem is that I have a lookup table for each supported genome:

# example dm3
chr2L   23011544
chr2LHet    368872
chr2R   21146708
chr2RHet    3288761
chr3L   24543557
chr3LHet    2555491
chr3R   27905053
chr3RHet    2517507
chr4    1351857
chrU    10049037
chrUextra   29004656
chrX    22422827
chrXHet 204112
chrYHet 347038
chrM    19517

I need to match the chromosome names in the user's bed files to the chromosome names in the lookup table. If the names are arbitrary strings I do not see how to do it.

Also, contigs (are those non-canonical chromosome parts?) are not supported because they would mess with mappability and total genome length. Of course if the contig length is <<< genome length, this should not matter much.

Edit: I think the only problem with arbitrary names for chromosomes is that I cannot remove reads that are shifted outside of the chromosome boundaries, which is a minor issue. I'll think about whether I should add a --non-canonical-chromosomes flag (suggestions for better names accepted).

Edit2: Also, the built-in chromosome names are there to save time; without them, I would need another pass through the data to fetch the chromosome names. This is because to make each core work with one chromosome, I need to know which chromosomes are in the data. Of course cut -f 1 | sort -u would fetch all the chromo names for me and take very little time...

Edit3: With non-canonical genome names, the bigwigs created by epic won't be viewable in the UCSC genome browser.

endrebak commented 7 years ago

@balwierz In 0.1.21 I have added the flag --chromsizes/-cs so that arbitrary genomes, assemblies or simulated data can be used.