biocore-ntnu / epic2

Ultraperformant reimplementation of SICER
https://doi.org/10.1093/bioinformatics/btz232
MIT License

couldn't detect certain peaks #12

Closed smartgamer closed 5 years ago

smartgamer commented 5 years ago

which can be detected by SICER. It's a huge peak.

There must be some difference between SICER and epic2. What are those differences?

endrebak commented 5 years ago

Effective genome fraction is different by default. Try 0.85 for humans.

What is the command you use to call epic?

See the faq in the readme for more differences.
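To make the effect of the effective genome fraction concrete, here is a minimal sketch of how it enters a SICER-style Poisson background model. This is not the epic2 or SICER source: the function names are hypothetical, and the read count, genome size, and the 0.70 fraction are made-up illustration values (0.85 is the value suggested above).

```python
import math


def expected_reads_per_window(total_reads, window_size, genome_size,
                              effective_genome_fraction):
    """Mean number of reads expected in one window under a uniform background."""
    effective_genome_size = genome_size * effective_genome_fraction
    return total_reads * window_size / effective_genome_size


def poisson_upper_tail(count, lam):
    """P(X >= count) for X ~ Poisson(lam)."""
    cdf_below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                    for i in range(count))
    return 1.0 - cdf_below


# Hypothetical library: 20 million reads, 200 bp windows, 3.1 Gb genome.
lam_small_fraction = expected_reads_per_window(20_000_000, 200, 3_100_000_000, 0.70)
lam_large_fraction = expected_reads_per_window(20_000_000, 200, 3_100_000_000, 0.85)

# The same observed window count is judged against two different backgrounds.
print(poisson_upper_tail(6, lam_small_fraction))
print(poisson_upper_tail(6, lam_large_fraction))
```

A larger effective genome fraction spreads the same reads over more mappable sequence, so the expected count per window drops and the same window looks more significant. That alone can move a borderline region in or out of the results.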

endrebak commented 5 years ago

I’ll create a debug script so that users can verify for themselves whether something should be a peak according to the SICER algorithm. I’ve found one bug in SICER affecting peaks that are the last on a chromosome. Of course, epic2 might contain bugs too.

If it's a bug in epic2 I’ll make sure to fix it :)

Have you tried upgrading epic2? What version of SICER are you comparing against?

Thanks for reporting btw :)

smartgamer commented 5 years ago

It's a peak because I can see it in the genome browser, not only because SICER reports it. It's located at around chr22:3838--.
I'm using the newest version of epic2, and of SICER too.

endrebak commented 5 years ago

You cannot detect whether something should be a peak according to the SICER algorithm with visual inspection.

If you set the FDR to 1 in epic2, do you still not see that peak? If you see it, what is the FDR score?

So you are using SICER, not SICERpy?

Can you post the command used to invoke SICER and epic2 please?
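Once the FDR cutoff is relaxed, the region still has to be looked up in the results. The sketch below is one way to do that with the standard library. It assumes a tab-separated results file whose header line starts with `#` and contains columns named `Chromosome`, `Start`, `End` and `FDR`; check the header of your own output before relying on those names. The function name and the example rows are hypothetical.

```python
import csv


def islands_overlapping(results_text, chromosome, start, end, fdr_column="FDR"):
    """Return (start, end, fdr) for every island overlapping chromosome:start-end."""
    # Drop the leading '#' so the header line can serve as column names.
    lines = results_text.lstrip("#").splitlines()
    reader = csv.DictReader(lines, delimiter="\t")
    hits = []
    for row in reader:
        if row["Chromosome"] != chromosome:
            continue
        island_start, island_end = int(row["Start"]), int(row["End"])
        # Two intervals overlap when each one starts before the other ends.
        if island_start <= end and island_end >= start:
            hits.append((island_start, island_end, float(row[fdr_column])))
    return hits


# Made-up results, only to show the call.
example = (
    "#Chromosome\tStart\tEnd\tFDR\n"
    "chr22\t1000\t1999\t0.5\n"
    "chr22\t5000\t5999\t0.001\n"
    "chr1\t1000\t1999\t0.2\n"
)
print(islands_overlapping(example, "chr22", 1500, 2500))
```

If the region shows up with a high FDR, the two tools agree that there is an island there and only disagree on significance. If it is missing entirely, the difference is upstream, in window or island calling.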

smartgamer commented 5 years ago

The commands I used:

epic2: epic2 -t rchr22.bed -c schr22.bed -bin 100 -g 2 > epicResults_chr22.txt
SICER: sh SICER-df-rb.sh r.bed s.bed 200 600 100 0.01

And yes, I used SICER instead of SICERpy.

I'll set the FDR to 1 in epic2 and check the results

endrebak commented 5 years ago

Thanks for reporting. Are these confidential files or can you share them with me? If so I can have a look too :)

endrebak commented 5 years ago

I have uploaded a version 0.0.19 to PyPI. I have added the following flag:

  --original-algorithm, -oa
                        Use the original SICER algorithm, without the epic2
                        fix. This will use all reads in your files to compute
                        the p-values, including those falling outside the
                        genome boundaries.

I consider the above a logical bug in SICER, but if you get the same results with epic2 using the -oa flag, you know why they produce different results at least :) If they are still different, I will look further into it.
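As a rough illustration of what the flag changes, the sketch below counts the reads that would feed the background model with and without the boundary check. It is not the epic2 implementation. Treating a read as "outside" when its chromosome is unknown or it extends past the chromosome end is an assumption made here, and the function name, reads, and chromosome size are illustrative.

```python
def count_background_reads(reads, chromosome_sizes, original_algorithm=False):
    """Count reads contributing to the total used for p-values.

    reads: iterable of (chromosome, start, end) tuples.
    chromosome_sizes: dict mapping chromosome name to its length in bp.
    original_algorithm: if True, count every read, mimicking the behaviour
        described for the -oa flag; if False, drop reads outside the genome.
    """
    total = 0
    for chromosome, start, end in reads:
        inside = (
            chromosome in chromosome_sizes
            and start >= 0
            and end <= chromosome_sizes[chromosome]
        )
        if original_algorithm or inside:
            total += 1
    return total


# Illustrative input: one read inside chr22, one past its end, one on an
# unplaced contig that is not in the size table.
sizes = {"chr22": 50_818_468}
reads = [
    ("chr22", 100, 150),
    ("chr22", 60_000_000, 60_000_050),
    ("chrUn", 10, 60),
]
print(count_background_reads(reads, sizes))
print(count_background_reads(reads, sizes, original_algorithm=True))
```

Because the total read count sets the expected background per window, counting or dropping such reads shifts every p-value a little, which is enough to change which islands pass a fixed FDR cutoff.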

endrebak commented 5 years ago

Ah, you are using the differential version of SICER. I have not included that in epic2, as I consider linear models much preferable.

Did you mean to call regions or find differences between two conditions? This should go in the FAQ :)

endrebak commented 5 years ago

I implemented a differential version. Try pip install epic2==0.0.20.

epic2-df -ex
Knockout: /mnt/work/endrebak/software/anaconda/lib/python3.6/site-packages/epic2-0.0.20-py3.6-linux-x86_64.egg/epic2/examples/test.bed.gz
Wildtype: /mnt/work/endrebak/software/anaconda/lib/python3.6/site-packages/epic2-0.0.20-py3.6-linux-x86_64.egg/epic2/examples/control.bed.gz
Example command: epic2-df -tk /mnt/work/endrebak/software/anaconda/lib/python3.6/site-packages/epic2-0.0.20-py3.6-linux-x86_64.egg/epic2/examples/test.bed.gz -tw /mnt/work/endrebak/software/anaconda/lib/python3.6/site-packages/epic2-0.0.20-py3.6-linux-x86_64.egg/epic2/examples/control.bed.gz -ok deleteme_ko.txt -ow deleteme_wt.txt > deleteme.txt

I have not compared it against the original, but I did read the original source to implement it. Please tell me how it works :)

endrebak commented 5 years ago

I uploaded a new version of epic2 to PyPI. I have fixed a bug in epic2-df and checked that it produces the same result when used with the canonical SICER test data :)

Victor21v commented 5 years ago

Hi,

I have installed version 0.0.25 to test epic2-df. However, it displays a module error:

Traceback (most recent call last):
  File "/home/usuario/anaconda3/bin/epic2-df", line 17, in <module>
    from epic2.src.differential import count_reads_on_islands
ModuleNotFoundError: No module named 'epic2.src.differential'

endrebak commented 5 years ago

Yeah, I’ll fix that next week :)

Thanks for the notice!

Victor21v commented 5 years ago

Ok, Thanks!

endrebak commented 5 years ago

Actually I'll finish my pyranges paper, then do this when it is in for review. When that will be I do not know, hopefully soon :)

endrebak commented 5 years ago

pip install epic2==0.0.26.

I've tested it on two datasets; it gives the same results as SICER.

epic2 will be out in Bioinformatics in a month. Please cite it if you are using it for a publication :)