CollasLab / edd

Enriched Domain Detector for ChIP-seq data
https://pypi.python.org/pypi/edd
MIT License

AssertionError #6

Closed: endrebak closed this issue 7 years ago

endrebak commented 7 years ago

Hi --

I am doing a statistical simulation to see which ChIP-seq domain detectors work well for differential ChIP-seq. EDD is one of the software packages I want to test (assuming getting it to work does not take too long).

I am trying to run EDD with the following parameters:

~/anaconda3/envs/py27/bin/edd  --bin-size 1 --fdr 0.05 -g 5 -p 5 hg19.chromsizes unalignable test.bam control.bam results

The error I get is the following:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/envs/py27/bin/edd", line 146, in <module>
    main(args, config)
  File "/local/home/endrebak/anaconda3/envs/py27/bin/edd", line 60, in main
    loader.load_single_experiment(args.ip_bam, args.input_bam)
  File "/local/home/endrebak/anaconda3/envs/py27/lib/python2.7/site-packages/eddlib/experiment.py", line 165, in load_single_experiment
    self.exp = self.load_bam(ip_name, ctrl_name)
  File "/local/home/endrebak/anaconda3/envs/py27/lib/python2.7/site-packages/eddlib/experiment.py", line 135, in load_bam
    use_multiprocessing=True)
  File "/local/home/endrebak/anaconda3/envs/py27/lib/python2.7/site-packages/eddlib/experiment.py", line 48, in load_experiment
    ipd, inputd = fmap(f, [ip_bam_path, input_bam_path])
  File "/local/home/endrebak/anaconda3/envs/py27/lib/python2.7/site-packages/eddlib/experiment.py", line 44, in <lambda>
    fmap = lambda g, xs: pool.map_async(g, xs).get(99999999)
  File "/local/home/endrebak/anaconda3/envs/py27/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
AssertionError

unalignable is an empty file, hg19.chromsizes is a tab-delimited file of chromosome names and sizes, and the two BAM files are small test files (the error also happens if I try to use real data).

Do you have any idea what might be wrong?

Endre

eivindgl commented 7 years ago

First of all, sorry about the error message. That does not help much. Your error happens during preprocessing, when EDD tries to read the BAM files. This works on my machine, so I need a little more information from you.

First, could you check that the chromosome names in hg19.chromsizes and in the BAM files correspond? If they do, it would be great if you could share your test files with me. In any case, I will make sure to update the error message once we figure out what is going on.

All the best, Eivind
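For anyone wanting to run the chromosome-name check suggested above, here is a minimal sketch in plain Python. It is not part of EDD: `parse_chromsizes` and `compare_chrom_names` are hypothetical helper names, and the BAM chromosome names are assumed to have been collected separately (for example from the `@SQ` lines printed by `samtools view -H`) and passed in as a list.

```python
def parse_chromsizes(text):
    """Parse whitespace-delimited 'name size' lines into a dict (hypothetical helper)."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def compare_chrom_names(chromsizes, bam_names):
    """Return (names only in the BAM, names only in the chromsizes file)."""
    bam_names = set(bam_names)
    known = set(chromsizes)
    return sorted(bam_names - known), sorted(known - bam_names)


# Toy input: two lines in the style of the chromsizes file, and a made-up
# list of chromosome names standing in for the BAM header.
sizes = parse_chromsizes("chr1\t249250621\nchrM\t16571\n")
missing, unused = compare_chrom_names(sizes, ["chr1", "chr8", "chrM"])
print("in BAM but not in chromsizes:", missing)
print("in chromsizes but not in BAM:", unused)
```

A non-empty first list would point to a naming mismatch (for example `8` versus `chr8`). Matching names alone do not rule out coordinate problems.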

endrebak commented 7 years ago

Thanks for the prompt reply.

Seems like the chromosome names are the same:

cat hg19.chromsizes
chr1    249250621
...
chrM    16571
endrebak@havpryd ~/c/test_edd> samtools view test.bam | head
U0  16  chr8    28510033    255 25M *   0   0   *   *
...
U0  16  chr10   35419785    255 25M *   0   0   *   *

My unalignable file is empty (tried both with and without a newline).

This is my ChIP-file: https://github.com/biocore-ntnu/epic/raw/master/examples/test.bam
This is my Input-file: https://github.com/biocore-ntnu/epic/raw/master/examples/control.bam
This is my chromsizes file: https://raw.githubusercontent.com/biocore-ntnu/epic/master/epic/scripts/chromsizes/hg19.chromsizes

I hope you'll be able to reproduce my error by running:

wget https://github.com/biocore-ntnu/epic/raw/master/examples/test.bam
wget https://github.com/biocore-ntnu/epic/raw/master/examples/control.bam
wget https://raw.githubusercontent.com/biocore-ntnu/epic/master/epic/scripts/chromsizes/hg19.chromsizes
touch unalignable
edd  --bin-size 1 --fdr 0.05 -g 5 -p 5 hg19.chromsizes unalignable test.bam control.bam results

I am using the stable version btw, not the devel one. I should try that too.

Edit: got the same error with the development version.

eivindgl commented 7 years ago

You have 4 reads in chr3 and 17 in chr19 that fall outside the chromosome boundaries defined in the chromsizes file. This is checked, but no informative error message is produced. I will make sure to update this.

You have to either adjust the chromsizes file or filter out reads that fall completely outside the chromosome boundaries.
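To help decide between those two options, a small diagnostic along the following lines might be useful. This is only a sketch, not EDD's own check. It assumes reads are available as `(chrom, start, end)` tuples with 0-based, half-open coordinates, treats a read as completely outside when its start is at or past the chromosome size, and uses made-up toy sizes. `summarize_overhang` is a hypothetical name.

```python
def summarize_overhang(reads, chromsizes):
    """Per chromosome, count reads lying completely past the end and record
    the furthest end coordinate seen among them (hypothetical helper)."""
    summary = {}
    for chrom, start, end in reads:
        size = chromsizes.get(chrom)
        if size is None or start < size:
            continue  # unknown chromosome, or read overlaps the chromosome
        count, furthest = summary.get(chrom, (0, size))
        summary[chrom] = (count + 1, max(furthest, end))
    return summary


# Toy sizes and reads, invented for illustration only.
toy_sizes = {"chr3": 1000, "chr19": 500}
toy_reads = [
    ("chr3", 10, 35),      # inside
    ("chr3", 1000, 1025),  # starts exactly at the chromosome end
    ("chr3", 1200, 1225),  # well past the end
    ("chr19", 490, 515),   # straddles the end, not counted
    ("chr19", 600, 625),   # past the end
]
overhang = summarize_overhang(toy_reads, toy_sizes)
for chrom, (count, furthest) in sorted(overhang.items()):
    print(chrom, count, "reads past the end; furthest end:", furthest)
```

If the furthest coordinate is far beyond the listed size, the chromsizes file may belong to a different assembly. A handful of reads just past the end is more consistent with shifted or extended reads, which can simply be filtered.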

endrebak commented 7 years ago

Thanks for the swift reply!

This is the kind of thing that some ChIP-Seq domain finders warn you about, because it might mean you aren't using the right chromsizes file.

Sometimes, however, the reads have been preprocessed or shifted a bit before the analysis, so I think it would be best to eventually allow out-of-bounds reads: warn about them and then ignore them. Whether this is worth the effort is entirely up to you, of course.

eivindgl commented 7 years ago

I agree. Warning about this and dropping out-of-bounds reads is probably the best solution. Best of luck with your work!
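The warn-and-drop behaviour agreed on here could look roughly like the sketch below. It is an illustration, not the change that went into EDD. It assumes reads are `(chrom, start, end)` tuples with 0-based, half-open coordinates, and as a further assumption it also drops reads on chromosomes missing from the chromsizes mapping. `drop_out_of_bounds` is a hypothetical name and the sizes are toy values.

```python
import warnings
from collections import Counter


def drop_out_of_bounds(reads, chromsizes):
    """Keep reads that overlap their chromosome; warn once about the rest."""
    kept = []
    dropped = Counter()
    for chrom, start, end in reads:
        size = chromsizes.get(chrom)
        # Assumption: unknown chromosomes count as out of bounds too.
        if size is None or start >= size or end <= 0:
            dropped[chrom] += 1
        else:
            kept.append((chrom, start, end))
    if dropped:
        detail = ", ".join("%s: %d" % item for item in sorted(dropped.items()))
        warnings.warn("dropped reads outside chromosome boundaries (%s)" % detail)
    return kept, dict(dropped)


# Toy sizes and reads, invented for illustration only.
example_sizes = {"chr3": 1000, "chr19": 500}
example_reads = [
    ("chr3", 10, 35),
    ("chr3", 1200, 1225),  # completely past the end: dropped
    ("chr19", 490, 515),   # straddles the end: kept
    ("chr19", 600, 625),   # completely past the end: dropped
]
kept_reads, dropped_counts = drop_out_of_bounds(example_reads, example_sizes)
print(len(kept_reads), "reads kept;", dropped_counts)
```

Emitting a single summary warning rather than one per read keeps the output readable while still flagging a possibly mismatched chromsizes file.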

endrebak commented 7 years ago

It seems like edd did not work for the kind of simulation we were doing (domains too narrow), but now I have one more domain finder in my arsenal at least. Will surely be useful in the future. Thanks for the help!