biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License

problem with output #82

Closed lucacozzuto closed 5 years ago

lucacozzuto commented 6 years ago

Dear developers, I'm using epic on a custom genome and I got a strange way to report the chromosomes:

Chromosome Start End ChIP Input Score Log2FC P FDR
1.0 3444800 3448199 76 31 46.45753848688762 1.3234133686618605 1.0186276074296772e-12 1.6867722789083852e-12
1.0 4773200 4777399 251 59 215.2449786380842 2.1185826701940806 5.0498235300167793e-79 2.9157875286298807e-78
1.0 4786400 4787599 37 8 29.947537650660315 2.2391355312341 8.964265400850882e-15 1.6083926773604145e-14

The genome size file, however, names the chromosomes 1, 2, 3, etc.

Luca

endrebak commented 5 years ago

Do you mean chromosome names? Should be 1, 1, 1 instead of 1.0, 1.0, 1.0 in the first column?

Anyways, thanks for reporting. If you can elaborate I'll try to get to the bottom of your problem.

lucacozzuto commented 5 years ago

Yes. I don't understand why I get 1.0 instead of just 1.

endrebak commented 5 years ago

The problem is likely that there are some rows missing values for the chromosomes. nan does not exist for integers so the column is promoted to floats. Are there any rows with a missing chromosome name?
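To illustrate the promotion described above, here is a minimal pandas sketch. It is not epic's code; the series values and the `reindex` call are only assumptions chosen to introduce a missing value into an integer column.

```python
import pandas as pd

# Hypothetical chromosome column holding plain integers.
chromosomes = pd.Series([1, 2, 3], dtype="int64")
print(chromosomes.dtype)  # int64

# Reindexing to a label that does not exist introduces a NaN.
# NumPy integers cannot hold NaN, so pandas promotes the column to float64.
with_gap = chromosomes.reindex([0, 1, 2, 3])
print(with_gap.dtype)  # float64

# Once promoted, the values are written out as "1.0" rather than "1".
as_text = with_gap.astype(str).tolist()
print(as_text)
```

If this is what happens, the remaining integer chromosome names would be printed with a trailing `.0`, which matches the symptom reported above.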

I think cut -f 1 result_file.txt | sort | uniq -c should display all chromosomes in the file and how many there are of each. If there are nans, this will be displayed as an empty space.
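The same check can be sketched in Python for anyone who prefers it to the shell pipeline. The helper name `count_first_column` and the sample rows are hypothetical, and the sketch assumes the result file is tab-separated, as `cut -f 1` does by default.

```python
from collections import Counter


def count_first_column(lines, sep="\t"):
    """Count how often each value occurs in the first column."""
    return Counter(line.rstrip("\n").split(sep)[0] for line in lines)


# Made-up rows for illustration; the third one has an empty chromosome field.
sample = [
    "1.0\t3444800\t3448199\n",
    "1.0\t4773200\t4777399\n",
    "\t100\t200\n",
    "X\t5\t10\n",
]

counts = count_first_column(sample)
for name, n in sorted(counts.items()):
    # repr() makes an empty chromosome name visible as ''.
    print(n, repr(name))
```

To run it on a real file, pass an open file handle instead of the list, for example `count_first_column(open("result_file.txt"))`.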

lucacozzuto commented 5 years ago

This is the result:

3113 1.0
2768 10.0
3494 11.0
2094 12.0
2530 13.0
1932 14.0
2231 15.0
1799 16.0
2201 17.0
1491 18.0
1639 19.0
3641 2.0
2431 3.0
2878 4.0
2897 5.0
2600 6.0
2986 7.0
2933 8.0
2592 9.0
892 X
4 Y

lucacozzuto commented 5 years ago

And this is the genome size file:

1 195471971
2 182113224
3 160039680
4 156508116
5 151834684
6 149736546
7 145441459
8 129401213
9 124595110
10 130694993
11 122082543
12 120129022
13 120421639
14 124902244
15 104043685
16 98207768
17 94987271
18 90702639
19 61431566
X 171031299
Y 91744698
MT 16299

endrebak commented 5 years ago

Thank you. I'll fix this tomorrow. :)

lucacozzuto commented 5 years ago

thanks a lot!

endrebak commented 5 years ago

I just uploaded epic 0.2.10 to PyPI. The new version is more memory-efficient, but requires pandas>=0.23.0.

There was another bug in the script, so your results should hopefully also be different.

Try pip install bioepic==0.2.10.

lucacozzuto commented 5 years ago

Thank you!

endrebak commented 5 years ago

If something does not work, feel free to reopen. Thank you for bothering to report :)