biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License

bigwig parameter with epic #60

Closed by yeroslaviz 6 years ago

yeroslaviz commented 6 years ago

I don't really understand what it means here. In the help it says:

--bigwig BIGWIG, -bw BIGWIG
                        For each file, store a bigwig of both enriched and
                        non-enriched files to folder <BIGWIG>. Requires
                        different basenames for each file.

Are these two separate files, one for enriched and one for non-enriched regions? Or is this supposed to say

                        ...
                        For each file, store a bigwig of both enriched and
                        non-enriched **regions** to folder <BIGWIG>. ...

endrebak commented 6 years ago

For each bed file (f1.bed, f2.bed, input1.bed, input2.bed) a bigwig is created: f1.bw, f2.bw, input1.bw, input2.bw.

:)
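The naming rule above also explains why the help text asks for different basenames. A minimal sketch, assuming each output is simply `<BIGWIG>/<basename>.bw`; the helper names are hypothetical and this is not epic's actual code:

```python
from collections import Counter
from pathlib import PurePosixPath


def bigwig_names(bed_files, outdir):
    """Map each input file to <outdir>/<basename>.bw (assumed naming rule)."""
    return {
        f: str(PurePosixPath(outdir) / (PurePosixPath(f).stem + ".bw"))
        for f in bed_files
    }


def clashing_basenames(bed_files):
    """Return basenames shared by more than one input file.

    Files that share a basename would map to the same .bw path,
    so one output would overwrite the other.
    """
    stems = Counter(PurePosixPath(f).stem for f in bed_files)
    return sorted(stem for stem, count in stems.items() if count > 1)


files = ["chip/f1.bed", "chip/f2.bed", "input/input1.bed", "input/input2.bed"]
names = bigwig_names(files, "bigwigs")
clashes = clashing_basenames(["run1/ip.bed", "run2/ip.bed"])
```

Under this assumption, two files in different directories with the same basename would collide in the output folder, which is presumably what the "Requires different basenames" note guards against.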

yeroslaviz commented 6 years ago

Yes, this I get, but what about the enriched and non-enriched part of the description? Are they separated or in the same file?

Are the bw files normalized in some way (RPKM)? The point is: what happens if I run it in a script and use the same file (e.g. ip.1.bedpe) in comparisons against different input files (e.g. input.4h.bedpe, input2h.bed.pe)? If there is a normalisation against library size, are the files overwritten? Earlier versions were better in this respect: the bigwig files were written into separate folders that one could name, so even if the input files were the same, you still got separate results.

endrebak commented 6 years ago

The --bigwig option still places the files into a folder, namely the <BIGWIG> folder you pass to it.

Yes, I meant both enriched and non-enriched regions.

Yes, the files are RPKM-normalized, but there were some bugs in that normalization which I tried to fix in 0.2.3.

The --bigwig files are not normalized against input. The new --individual-log2fc-bigwigs option normalizes against input.
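To make the difference between the two kinds of normalization concrete, here is a small sketch. It uses the textbook RPKM definition and a log2 ratio with a pseudocount; the function names and the pseudocount value are assumptions for illustration, and epic's exact scoring may differ in detail:

```python
import math


def rpkm(reads_in_bin, bin_size_bp, total_reads):
    """Reads per kilobase of bin per million reads in the library."""
    kilobases = bin_size_bp / 1e3
    millions = total_reads / 1e6
    return reads_in_bin / (kilobases * millions)


def log2fc(chip_score, input_score, pseudocount=1.0):
    """log2 ratio of ChIP over input; the pseudocount (an assumption)
    avoids division by zero in bins without input reads."""
    return math.log2((chip_score + pseudocount) / (input_score + pseudocount))


# A 200 bp bin with 10 reads in a library of 1 million reads.
chip_bin = rpkm(10, 200, 1_000_000)
# The same bin scored against an input value, here with made-up numbers.
ratio = log2fc(3.0, 1.0)
```

RPKM depends only on the library the reads came from, so a ChIP file gets the same RPKM track no matter which input it is later compared with. Only the log2 fold change depends on the ChIP/input pairing.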

I guess I should add mathjax to the docs and write equations :)

yeroslaviz commented 6 years ago

Sorry for being a nuisance, but there is something I don't quite understand here. Maybe it is such a basic question that the answer is too obvious to see. Am I correct in these assumptions?

  • The --bw files are not normalised, but contain all identified peaks.
  • The -i2bw files, OTOH, are RPKM-normalised, but also contain all identified peaks.
  • The -cbw files are summed bigwig files, also containing all identified peaks in all IPed samples.
  • The -b and -o files contain only the enriched regions (significant peaks), based on the set FDR value.

Am I correct so far? Is there a way to get bigwig files for only the enriched regions?

endrebak commented 6 years ago

Yes, the bed and csv files are only for the enriched regions. So if you add the bed to the UCSC genome browser you can easily tell which regions are enriched.

All bigwigs contain every single read; none contain just the enriched regions. There is no way to get bigwigs of only the enriched regions. This link seems to contain a recipe for subsetting: https://www.biostars.org/p/243792/

Endre
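The idea behind subsetting a signal track to the enriched regions can be sketched in plain Python. This is not the Biostars recipe and not part of epic; it works on in-memory bedGraph-style tuples with half-open coordinates, the function name is hypothetical, and it assumes the enriched regions do not overlap each other. Reading and writing real bigwig files would additionally need a library such as pyBigWig, which is not shown here.

```python
def clip_signal(signal, regions):
    """Keep only the parts of signal intervals that overlap enriched regions.

    signal:  list of (chrom, start, end, value) tuples
    regions: list of (chrom, start, end) tuples, assumed non-overlapping
    """
    regions_by_chrom = {}
    for chrom, start, end in regions:
        regions_by_chrom.setdefault(chrom, []).append((start, end))

    clipped = []
    for chrom, s_start, s_end, value in signal:
        for r_start, r_end in regions_by_chrom.get(chrom, []):
            # Intersection of the signal interval with the region.
            lo, hi = max(s_start, r_start), min(s_end, r_end)
            if lo < hi:
                clipped.append((chrom, lo, hi, value))
    return sorted(clipped)


signal = [
    ("chr1", 0, 200, 1.5),
    ("chr1", 200, 400, 4.0),
    ("chr2", 0, 200, 2.0),
]
enriched = [("chr1", 150, 300)]
subset = clip_signal(signal, enriched)
```

The nested loop is quadratic in the worst case, which is fine for illustration; for genome-scale tracks an interval tree or a sorted sweep would be the usual choice.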

endrebak commented 6 years ago

Mathematical description of what scores the bigwigs contain:

http://bioepic.readthedocs.io/en/latest/output_files.html

Thanks for pushing me to do this, you would not have been the last to ask :)