biocore-ntnu / epic2

Ultraperformant reimplementation of SICER
https://doi.org/10.1093/bioinformatics/btz232
MIT License
56 stars 9 forks source link

missing large domains #35

Open runningvato opened 4 years ago

runningvato commented 4 years ago

Hello,

I am analyzing cut and run H3K9me2 data and am having trouble getting epic2 to call several large domains that are clearly enriched. This histone mark is found enriched in very large domains. Sequencing was done using paired-end chemistry, so I convert the PE-bam files to bedpe which I then use for analysis Epic2. The treatment file is from the H3K9me2 cut and run library and the input file is an IgG control done in parallel. Weirdly, Epic2 is able to call these large H3K9me2 domains (I have found that the optimal setting is -bin 8000 -g 6), but it will miss random large domains that are clearly enriched (much higher H3K9me2 than IgG). For example, the middle of chromosome 4 is not found to be enriched.

Have you observed this before and if so, what do you recommend? I can provide bed files/bigwigs if you need.

Thanks.

endrebak commented 4 years ago

Could you upload the files and mention a few domains and I’ll have a look?

I could also create a script that converts bedpe to regular bed and you could see if the problem persists. Then the problem is with the way I interpret PE.

endrebak commented 4 years ago

Could you also try setting the fdr to 1 to see if those regions show up then? Also please show the lines with those regions from the output

On Sunday, January 19, 2020, runningvato notifications@github.com wrote:

Hello,

I am analyzing cut and run H3K9me2 data and am having trouble getting epic2 to call several large domains that are clearly enriched. This histone mark is found enriched in very large domains. Sequencing was done using paired-end chemistry, so I convert the PE-bam files to bedpe which I then use for analysis Epic2. The treatment file is from the H3K9me2 cut and run library and the input file is an IgG control done in parallel. Weirdly, Epic2 is able to call these large H3K9me2 domains (I have found that the optimal setting is -bin 8000 -g 6), but it will miss random large domains that are clearly enriched (much higher H3K9me2 than IgG). For example, the middle of chromosome 4 is not found to be enriched.

Have you observed this before and if so, what do you recommend? I can provide bed files/bigwigs if you need.

Thanks.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/biocore-ntnu/epic2/issues/35?email_source=notifications&email_token=AEHURUXHTUSQWEBI65ROK63Q6O3RFA5CNFSM4KIWRJ5KYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IHEUSQA, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEHURURIP2GTSB2FBDATUHTQ6O3RFANCNFSM4KIWRJ5A .

runningvato commented 4 years ago

I can't add those file types directly to this thread, so instead, I uploaded genome files to my repository (you should get a link soon). They are in hg38 coordinates. The middle of chromosome 4 and almost the entire chromosome 5 are good examples of where this happens. In comparison, chr6:60,681,729-79,775,396 or chr10:8,672,941-38,586,220 have a similar amount of enrichment as the aforementioned regions, but contain statistically significant domains called by Epic2.

chr18:37,464,461-46,449,045 is a region where there are larger enriched domains, but only a tiny subregion is called as being enriched.

I have also tried setting the fdr to 1, but the domains were still missing. I also tried manually converting the .bedpe input files to .bed files by: awk 'BEGIN {OFS="\t"}; {print $1,$2,$5,$7,$8,$9}' file.bedpe > file.bed

However, the results were unchanged.

endrebak commented 4 years ago

Thanks. Do you know if SICER calls them correctly?

On Mon, Jan 20, 2020 at 7:09 AM runningvato notifications@github.com wrote:

I can't add those file types directly to this thread, so instead, I uploaded genome files to my repository (you should get a link soon). They are in hg38 coordinates. The middle of chromosome 4 and almost the entire chromosome 5 are good examples of where this happens. In comparison, chr6:60,681,729-79,775,396 or chr10:8,672,941-38,586,220 have a similar amount of enrichment as the aforementioned regions, but contain statistically significant domains called by Epic2.

chr18:37,464,461-46,449,045 is a region where there are larger enriched domains, but only a tiny subregion is called as being enriched.

I have also tried setting the fdr to 1, but the domains were still missing. I also tried manually converting the .bedpe input files to .bed files by: awk 'BEGIN {OFS="\t"}; {print $1,$2,$5,$7,$8,$9}' file.bedpe > file.bed

However, the results were unchanged.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/biocore-ntnu/epic2/issues/35?email_source=notifications&email_token=AEHURUU7SCBA3PRHCTH5P3LQ6U5YTA5CNFSM4KIWRJ5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJLOVOI#issuecomment-576121529, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEHURUVBTH263343EHA6IKDQ6U5YTANCNFSM4KIWRJ5A .

endrebak commented 4 years ago

Do you have the original alignment files also?

On Mon, Jan 20, 2020 at 9:47 AM Endre Bakken Stovner endrebak85@gmail.com wrote:

Thanks. Do you know if SICER calls them correctly?

On Mon, Jan 20, 2020 at 7:09 AM runningvato notifications@github.com wrote:

I can't add those file types directly to this thread, so instead, I uploaded genome files to my repository (you should get a link soon). They are in hg38 coordinates. The middle of chromosome 4 and almost the entire chromosome 5 are good examples of where this happens. In comparison, chr6:60,681,729-79,775,396 or chr10:8,672,941-38,586,220 have a similar amount of enrichment as the aforementioned regions, but contain statistically significant domains called by Epic2.

chr18:37,464,461-46,449,045 is a region where there are larger enriched domains, but only a tiny subregion is called as being enriched.

I have also tried setting the fdr to 1, but the domains were still missing. I also tried manually converting the .bedpe input files to .bed files by: awk 'BEGIN {OFS="\t"}; {print $1,$2,$5,$7,$8,$9}' file.bedpe > file.bed

However, the results were unchanged.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/biocore-ntnu/epic2/issues/35?email_source=notifications&email_token=AEHURUU7SCBA3PRHCTH5P3LQ6U5YTA5CNFSM4KIWRJ5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJLOVOI#issuecomment-576121529, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEHURUVBTH263343EHA6IKDQ6U5YTANCNFSM4KIWRJ5A .

runningvato commented 4 years ago

SICER does not call anything in those regions for either the default fdr setting of 0.01 or even 1.0 (these bed files are uploaded to the same repository).

Original bam files are ~5gb each- how should I transfer them to you?

endrebak commented 4 years ago

But then the epic2 does what it is supposed to do in a sense. Have you tried setting the e-value parameter to something else? I have never used it myself, but it is there to make SICER more lenient. Perhaps the original readme says something about it?

Would love to hear whether setting the e-value to something else helps.

Thanks!

On Monday, January 20, 2020, runningvato notifications@github.com wrote:

SICER does not call anything in those regions for either the default fdr setting of 0.01 or even 1.0 (these bed files are uploaded to the same repository).

Original bam files are ~5gb each- how should I transfer them to you?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/biocore-ntnu/epic2/issues/35?email_source=notifications&email_token=AEHURUXCAZUZJRS2RURADRTQ6XHX3A5CNFSM4KIWRJ5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJNG7IQ#issuecomment-576352162, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEHURUXTIXYLXMUMOVQR6XTQ6XHX3ANCNFSM4KIWRJ5A .

runningvato commented 4 years ago

Ah- you are right. Thanks very much for your help!