olegs opened 6 years ago
sicer.sh should now delete the corresponding tmp folder as soon as it is done with each BAM file, rather than at the end of the whole batch. I'll try it and see whether it works.
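The per-file cleanup described above can be sketched as follows. This is a minimal Python illustration of the control flow only, not the actual shell code in sicer.sh. `process_bams` and the `run_one(bam, tmp_dir)` callback are hypothetical names, where `run_one` stands in for the per-file SICER call. Each BAM gets its own `tempfile.TemporaryDirectory`, which is removed as soon as that file is finished:

```python
import os
import tempfile
from typing import Callable, Iterable, List


def process_bams(bams: Iterable[str],
                 run_one: Callable[[str, str], None]) -> List[str]:
    """Process BAM files one at a time, each in its own temporary folder.

    `run_one(bam, tmp_dir)` is a hypothetical stand-in for the per-file
    SICER call made by sicer.sh. Returns the tmp folder paths that were used.
    """
    used = []
    for bam in bams:
        # The folder and everything in it are deleted when the `with` block
        # exits (also if run_one raises), i.e. right after this BAM is done,
        # so at most one tmp folder exists at a time.
        with tempfile.TemporaryDirectory(prefix="sicer_") as tmp_dir:
            used.append(tmp_dir)
            run_one(bam, tmp_dir)
    return used


if __name__ == "__main__":
    def fake_run(bam: str, tmp_dir: str) -> None:
        # Simulate SICER writing an intermediate file into the tmp folder.
        name = os.path.basename(bam) + ".tmp"
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("intermediate\n")

    dirs = process_bams(["a.bam", "b.bam"], fake_run)
    print(all(not os.path.exists(d) for d in dirs))  # True
```

With this structure, peak disk usage is bounded by the intermediates of a single BAM instead of the whole batch.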
According to the documentation: http://gensoft.pasteur.fr/docs/SICER/1.1/SICER-README.pdf
3.1: Running SICER with a control library: SICER.sh
test-W200-G600.scoreisland: an intermediate file for debugging usage.
test-W200-G600-islands-summary: summary of all candidate islands with their statistical significance. It has the format: chrom, start, end, ChIP_island_read_count, CONTROL_island_read_count, p_value, fold_change, FDR_threshold
test-W200-G600-islands-summary-FDR.01: summary file of significant islands with requirement of FDR=0.01.
test-W200-G600-FDR.01-island.bed: delineation of significant islands in “chrom start end read-count-from-redundancy-removed-test.bed” format
Of all these files, the test-W200-G600-islands-summary-FDR.01 and test-W200-G600-FDR.01-island.bed are most important for further analysis. The first one contains the details about each significant island. The second one contains the redundancy-removed raw reads filtered by islands. In addition, the two wig files shall be used for visual examination of the raw and processed data on the genome browser.
There seems to be a typo in the docs here: they meant OD_OD11_H3K4me3-W200-G0-FDR0.01-islandfiltered.bed, not test-W200-G600-FDR.01-island.bed.
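For reference, the islands-summary column order quoted from the README (chrom, start, end, ChIP_island_read_count, CONTROL_island_read_count, p_value, fold_change, FDR_threshold) can be captured in a small parser. This is a hypothetical sketch, not part of SICER. It assumes whitespace-separated columns and integer read counts, as in the example output later in this thread:

```python
from typing import NamedTuple


class SummaryIsland(NamedTuple):
    """One row of a SICER *-islands-summary* file, in the README's column order."""
    chrom: str
    start: int
    end: int
    chip_count: int       # ChIP_island_read_count
    control_count: int    # CONTROL_island_read_count
    p_value: float
    fold_change: float
    fdr_threshold: float  # FDR_threshold


def parse_summary_line(line: str) -> SummaryIsland:
    """Parse one whitespace-separated line of an islands-summary file."""
    f = line.split()
    if len(f) != 8:
        raise ValueError("expected 8 columns, got %d: %r" % (len(f), line))
    return SummaryIsland(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                         float(f[5]), float(f[6]), float(f[7]))


if __name__ == "__main__":
    row = parse_summary_line(
        "chr1 714200 714799 20 2 1.0167254121e-11 6.72124301394 2.29643125299e-11")
    print(row.chip_count, row.control_count)  # 20 2
```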
3.2: Running SICER without a control library: SICER-rb.sh
test-W200-G400-E100.scoreisland: delineation of significant islands controlled by E-value of 100, in “chrom start end score” format
test-W200-G400-E100-islandfiltered.bed: library of raw redundancy-removed reads that are on significant islands.
Of all these files, the test-W200-G400-E100.scoreisland and test-W200-G400-E100-islandfiltered.bed are most important for further analysis. The first one contains the delineation of each significant island. The second one contains the redundancy-removed raw reads filtered by significant islands. In addition, the two wig files shall be used for visual examination of the raw and processed data on genome browser.
@olegs I think we are using the wrong files. It seems we need:
SICER + control: test-W200-G600-islands-summary-FDR.01
SICER w/o control: test-W200-G400-E100.scoreisland
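The choice above can be expressed as a small helper that builds the name of the peak file to pick up. The helper `sicer_result_file` is hypothetical. Its naming pattern is inferred from the real output files shown further down in this thread (`...-W200-G0-islands-summary-FDR0.01` and `...-W200-G0-E0.01.scoreisland`), not from SICER itself. Thresholds are passed as strings so that their formatting (e.g. `0.01`) is kept verbatim:

```python
from typing import Optional


def sicer_result_file(name: str, window: int, gap: int,
                      fdr: Optional[str] = None,
                      evalue: Optional[str] = None) -> str:
    """Return the SICER output file holding the final peaks (hypothetical helper).

    Pass `fdr` for a run with control (SICER.sh) or `evalue` for a run
    without control (SICER-rb.sh), but not both.
    """
    if (fdr is None) == (evalue is None):
        raise ValueError("pass exactly one of fdr (with control) or evalue (without control)")
    base = "%s-W%d-G%d" % (name, window, gap)
    if fdr is not None:
        # SICER + control: islands-summary-FDR keeps ChIP/control counts,
        # p-value, fold change and FDR for each significant island.
        return base + "-islands-summary-FDR" + fdr
    # SICER w/o control: .scoreisland lists the significant islands with scores.
    return base + "-E" + evalue + ".scoreisland"


if __name__ == "__main__":
    print(sicer_result_file("OD_OD11_H3K4me3", 200, 0, fdr="0.01"))
    # OD_OD11_H3K4me3-W200-G0-islands-summary-FDR0.01
```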
Also, please don't forget to fix the tests: https://teamcity.jetbrains.com/viewLog.html?buildId=1773115&tab=buildResultsDiv&buildTypeId=Epigenome_Tools_WashuPipelineTests
SICER + control: test-W200-G600-FDR.01-island.bed contains only the read count in the target (ChIP) library, so we need test-W200-G600-islands-summary-FDR.01 instead; see the examples:
(py35) user@franklin:/mnt/stripe/rcherniatchik/workshop/prepare_test_data/aging/chip-seq/sicer/bams_sicer_redo_g0$ head -n 5 OD_OD11_H3K4me3-W200-G0-FDR0.01-island.bed
chr1 714200 714799 20
chr1 724800 725399 20
chr1 762000 763199 50
chr1 825000 826199 36
chr1 853200 854799 45
(py35) user@franklin:/mnt/stripe/rcherniatchik/workshop/prepare_test_data/aging/chip-seq/sicer/bams_sicer_redo_g0$ wc -l OD_OD11_H3K4me3-W200-G0-FDR0.01-island.bed
35250 OD_OD11_H3K4me3-W200-G0-FDR0.01-island.bed
(py35) user@franklin:/mnt/stripe/rcherniatchik/workshop/prepare_test_data/aging/chip-seq/sicer/bams_sicer_redo_g0$ head -n 5 OD_OD11_H3K4me3-W200-G0-islands-summary-FDR0.01
chr1 714200 714799 20 2 1.0167254121e-11 6.72124301394 2.29643125299e-11
chr1 724800 725399 20 2 1.0167254121e-11 6.72124301394 2.29643125299e-11
chr1 762000 763199 50 5 1.23868131951e-25 6.72124301394 4.70043453091e-25
chr1 825000 826199 36 15 0.00272142363158 1.61309832335 0.00327911031922
chr1 853200 854799 45 16 3.51625828979e-05 1.89034959767 5.00981955398e-05
(py35) user@franklin:/mnt/stripe/rcherniatchik/workshop/prepare_test_data/aging/chip-seq/sicer/bams_sicer_redo_g0$ wc -l OD_OD11_H3K4me3-W200-G0-islands-summary-FDR0.01
35250 OD_OD11_H3K4me3-W200-G0-islands-summary-FDR0.01
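The two listings above have the same number of lines, and their first four columns agree. The summary file simply adds the control count, p-value, fold change and FDR. A quick way to check this is sketched below. The function names are hypothetical, and the sketch assumes both files list the islands in the same order, as in the example:

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]


def first_columns(lines: Iterable[str], n: int) -> List[Row]:
    """Split non-blank lines on whitespace and keep the first n columns."""
    return [tuple(line.split()[:n]) for line in lines if line.strip()]


def island_bed_matches_summary(island_bed: Iterable[str],
                               summary_fdr: Iterable[str]) -> bool:
    """True if each island.bed row (chrom, start, end, ChIP count) equals the
    first four columns of the corresponding islands-summary-FDR row."""
    return first_columns(island_bed, 4) == first_columns(summary_fdr, 4)


if __name__ == "__main__":
    bed = ["chr1 714200 714799 20", "chr1 724800 725399 20"]
    summary = [
        "chr1 714200 714799 20 2 1.0167254121e-11 6.72124301394 2.29643125299e-11",
        "chr1 724800 725399 20 2 1.0167254121e-11 6.72124301394 2.29643125299e-11",
    ]
    print(island_bed_matches_summary(bed, summary))  # True
```

If the check passes, switching to the summary file loses nothing and gains the significance columns.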
Example of output file line counts; here we need 29716 peaks:
196 ./CD14_GSM1102782_UW_r0_H3K27ac_F150_W200_G600_FDR0.01_sicer.log
19509576 ./CD14_GSM1102782_UW_r0_H3K27ac_input-1-removed.bed
20108301 ./CD14_GSM1102782_UW_r0_H3K27ac-1-removed.bed
29716 ./CD14_GSM1102782_UW_r0_H3K27ac-W200-G600-FDR0.01-island.bed
10537827 ./CD14_GSM1102782_UW_r0_H3K27ac-W200-G600-FDR0.01-islandfiltered.bed
601522 ./CD14_GSM1102782_UW_r0_H3K27ac-W200-G600-FDR0.01-islandfiltered-normalized.wig
31063 ./CD14_GSM1102782_UW_r0_H3K27ac-W200-G600-islands-summary
29716 ./CD14_GSM1102782_UW_r0_H3K27ac-W200-G600-islands-summary-FDR0.01
31063 ./CD14_GSM1102782_UW_r0_H3K27ac-W200-G600.scoreisland
6110881 ./CD14_GSM1102782_UW_r0_H3K27ac-W200.graph
6110907 ./CD14_GSM1102782_UW_r0_H3K27ac-W200-normalized.wig
63100768 total
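In this listing, only island.bed and islands-summary-FDR0.01 contain the expected 29716 lines. The islands-summary and .scoreisland files list all 31063 candidate islands. The sketch below counts peaks in a result file and compares the count with an expected number, for example in a test. `count_peaks` and `assert_peak_count` are hypothetical names. Unlike `wc -l`, the counter skips blank lines and also counts a last line that has no trailing newline:

```python
import os
import tempfile


def count_peaks(path: str) -> int:
    """Count non-blank lines; SICER result files hold one island per line."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


def assert_peak_count(path: str, expected: int) -> None:
    """Raise ValueError if the file does not contain exactly `expected` peaks."""
    n = count_peaks(path)
    if n != expected:
        raise ValueError("%s: expected %d peaks, found %d" % (path, expected, n))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "example-islands-summary-FDR0.01")
        with open(p, "w") as f:
            f.write("chr1 714200 714799 20 2 1e-11 6.7 2e-11\n\n")
        print(count_peaks(p))  # 1
```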
SICER w/o control: test-W200-G400-E100.scoreisland, e.g.:
user@rosalind:/mnt/stripe/rcherniatchik/workshop/GSE112622_demo/chipseq/fastq_bams_sicer$ head ./GSM3074494_C57BL_6_oocyte_H3K36me3_rep1_mm10-W200-G0-E0.01.scoreisland
chrX 5899600 5900599 40.7270184893
chrX 6139400 6140399 36.4504222076
chrX 6205200 6206399 43.984789155
chrX 6233200 6235399 115.734660391
chrX 6237600 6238799 70.3979365603
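The .scoreisland rows follow the README's "chrom start end score" format. A minimal parser is sketched below. `ScoreIsland` and `parse_scoreisland` are hypothetical names. `str.split()` is used so that both the space-separated example above and tab-separated files parse:

```python
from typing import Iterable, List, NamedTuple


class ScoreIsland(NamedTuple):
    """One row of a SICER *.scoreisland file: chrom, start, end, score."""
    chrom: str
    start: int
    end: int
    score: float


def parse_scoreisland(lines: Iterable[str]) -> List[ScoreIsland]:
    """Parse non-blank, whitespace-separated scoreisland lines."""
    islands = []
    for line in lines:
        f = line.split()
        if not f:
            continue  # skip blank lines
        if len(f) != 4:
            raise ValueError("expected 4 columns, got %d: %r" % (len(f), line))
        islands.append(ScoreIsland(f[0], int(f[1]), int(f[2]), float(f[3])))
    return islands


if __name__ == "__main__":
    rows = parse_scoreisland([
        "chrX 5899600 5900599 40.7270184893",
        "chrX 6139400 6140399 36.4504222076",
        "chrX 6205200 6206399 43.984789155",
        "chrX 6233200 6235399 115.734660391",
        "chrX 6237600 6238799 70.3979365603",
    ])
    top = max(rows, key=lambda r: r.score)
    print(top.start, top.end)  # 6233200 6235399
```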
Example of output file line counts; here we expect 53356 peaks:
4281531 ./GSM3074494_C57BL_6_oocyte_H3K36me3_rep1_mm10-W200-G0-E0.01-islandfiltered.bed
6218077 ./GSM3074494_C57BL_6_oocyte_H3K36me3_rep1_mm10-W200-normalized.wig
163118 ./GSM3074494_C57BL_6_oocyte_H3K36me3_rep1_mm10_F150_W200_G0_FDR0.01_sicer.log
414689 ./GSM3074494_C57BL_6_oocyte_H3K36me3_rep1_mm10-W200-G0-E0.01-islandfiltered-normalized.wig
53356 ./GSM3074494_C57BL_6_oocyte_H3K36me3_rep1_mm10-W200-G0-E0.01.scoreisland
11130771 total
At the moment, files are not cleaned up until all the files have been processed, which makes the resulting folders huge. Extra files: