WGLab / w4CSeq

web server of 4C-Seq data analysis pipeline
Other
3 stars 7 forks source link

Replicates and annotation files #4

Closed aliu90 closed 5 years ago

aliu90 commented 7 years ago

Hi,

I was wondering if there is any way for this software to analyze replicates or compare significant interactions across different samples? And another question, in your paper, there is mentioned an optional module where users can upload a BED formatted annotation file, such as ChIP-seq peaks. Is this available for command line enzyme based digestion?

Thank you.

nuoyazhou2 commented 7 years ago

Hello,

At this point, the tool can only analyze individual dataset. If you want to compare between replicates or across samples, it is recommended to take advantage of intermediate files containing the window-based counts and use differential analysis tool such as DESeq2 to analyze the data.

Yes, the BED formatted annotation files can be specified in the command line file, by specifying the path to the control file (if any) to chipc, and paths to files of the experiment to chip1, chip2, chip3, etc.

On Thu, Oct 12, 2017 at 10:39 AM, aliu90 notifications@github.com wrote:

Hi,

I was wondering if there is any way for this software to analyze replicates or compare significant interactions across different samples? And another question, in your paper, there is mentioned an optional module where users can upload a BED formatted annotation file, such as ChIP-seq peaks. Is this available for command line enzyme based digestion?

Thank you.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_issues_4&d=DwMCaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=dx6C4H6zWAbh1M8SMGSUDzcAVaKiNxdHQhJMBT3ogyY&s=zFSH-qwHLrAYinRJlRl-xAG2gBfJf_J81GrtLK2mpYs&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AOhFiyGiTFP0rNCjAsiQgdWCnzift39-5Fks5srk7ggaJpZM4P3XrK&d=DwMCaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=dx6C4H6zWAbh1M8SMGSUDzcAVaKiNxdHQhJMBT3ogyY&s=8coQ2GOBbIWJL7xKzqpQPLxvYh17vN1Y1MTADvXVhBc&e= .

aliu90 commented 6 years ago

Hi I'm still having trouble getting the Chip-Seq enrichment analysis to work.

I downloaded a ChIP-seq dataset from UCSC. It's the bed file for peaks (I'm not sure if this is the right file type or maybe I have to modify this file in some way).

I have no control dataset, so the only modification to the R script I made is here:

if (control == "no") { chipfile <- c("H3K4me3.bed")

And I have this bed file in my working directory.

This is the error message I'm getting: Error in if (chipdata == "yes" && file.info("chip_name.txt")$size > 0) { : missing value where TRUE/FALSE needed Execution halted

I'd really appreciate any advice you can give to solve this problem.

nuoyazhou2 commented 6 years ago

Please try the following and then run the script.

1) On line #17 of 4C_enzymecmdline.R, change chipdata <- "no" to chipdata <- "yes" 2) On line #18, add chip1 <- "H3K4me3" #no .bed suffix You also need to revert the change you have done. So change if (control == "no") { chipfile <- c("H3K4me3.bed")

back to original. 3) Create a file named chip_name.txt, and then put string H3K4me3 into it.

On Thu, Oct 26, 2017 at 3:31 PM, aliu90 notifications@github.com wrote:

Hi I'm still having trouble getting the Chip-Seq enrichment analysis to work.

I downloaded a ChIP-seq dataset from UCSC. It's the bed file for peaks (I'm not sure if this is the right file type or maybe I have to modify this file in some way).

I have no control dataset, so the only modification to the R script I made is here:

if (control == "no") { chipfile <- c("H3K4me3.bed")

And I have this bed file in my working directory.

This is the error message I'm getting: Error in if (chipdata == "yes" && file.info("chip_name.txt")$size > 0) { : missing value where TRUE/FALSE needed Execution halted

I'd really appreciate any advice you can give to solve this problem.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_issues_4-23issuecomment-2D339819784&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=26u9V9UTp0l2eo_UmngXXhPKKRbmfYJ2dFjzQG9XkNY&s=lVghFXVpmJ_RtySDckagkgynapLAyp-pJev7F382nbo&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AOhFi0WQbRuWcdfk-2D4I2j-2DRjja91Za6eks5swQhEgaJpZM4P3XrK&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=26u9V9UTp0l2eo_UmngXXhPKKRbmfYJ2dFjzQG9XkNY&s=GP3g8qMyTA3-Iv9Q_F49SFYFHHGVlkSRImrJctmagZQ&e= .

aliu90 commented 6 years ago

I did the changes and I'm still getting the same error message. The file I created chip_name.txt is in my working directory. Do I need to read it into the script or does the script already know to open it?

aliu90 commented 6 years ago

After a few runs, I think the script finally recognizes my chip file. I'm getting a different error message now:

Error in boxplot.default(chip_frame, col = c("red", "grey"), at = setdiff(1:(3 * : invalid first argument Calls: boxplot -> boxplot.default Execution halted

I don't think it likes my chip input file. Is this not the type of file I should be using?

screen shot 2017-10-30 at 12 23 39 am
nuoyazhou2 commented 6 years ago

Please delete the first column of your ChIP file, as only bed file is accepted. In addition, you might want to comment out the header line or delete it simply.

On Sun, Oct 29, 2017 at 9:24 PM, aliu90 notifications@github.com wrote:

After a few runs, I think the script finally recognizes my chip file. I'm getting a different error message now:

Error in boxplot.default(chip_frame, col = c("red", "grey"), at = setdiff(1:(3 * : invalid first argument Calls: boxplot -> boxplot.default Execution halted

I don't think it likes my chip input file. Is this not the type of file I should be using?

[image: screen shot 2017-10-30 at 12 23 39 am] https://urldefense.proofpoint.com/v2/url?u=https-3A__user-2Dimages.githubusercontent.com_26095513_32154684-2D9a118c5a-2Dbd08-2D11e7-2D8525-2D0130c137adf5.png&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=Yt-DYyFhKJrmKFOJMXN5ghjig1fNUh1wgiDeerntGRg&s=DGZJeh9fmQhuNUy7n0sDEM8QIck6YMd9XPf_gLyRPjg&e=

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_issues_4-23issuecomment-2D340340645&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=Yt-DYyFhKJrmKFOJMXN5ghjig1fNUh1wgiDeerntGRg&s=TZ1hQM8fosIkWlg1HU7JYnBBEHyseUuB3B6VEM_rTEQ&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AOhFi7lIV8Lrba7JBwAXqJItk41NWpw0ks5sxU-2DWgaJpZM4P3XrK&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=Yt-DYyFhKJrmKFOJMXN5ghjig1fNUh1wgiDeerntGRg&s=qqF-gEeOiCAu8icnXPcUbllb_KvUcYXuE82tJdqh8gQ&e= .

aliu90 commented 6 years ago

I'm still getting the following error: Error in boxplot.default(chip_frame, col = c("red", "grey"), at = setdiff(1:(3 * : invalid first argument

I ran the code line by line and immediately after:

system(paste("sort -R ", enzyme_genome, " | awk '$1 != \"chrY\"' | head -n 100000 | sort -k1,1 -k2,2n > enzyme_genome_100k.bedsort0", sep=""))

I get this message, which I think may be why it's failing. sort: write failed: 'standard output': Broken pipe sort: write error

Any idea how to fix this? I have included the script and file I'm working with.

chip_name.txt chipseq_enrichment.txt

nuoyazhou2 commented 6 years ago

The following line is not the reason for the error. (Whenever sort doesn't write all its output, it reports the "Broken pipe" message, which can be ignored in this case.) system(paste("sort -R ", enzyme_genome, " | awk '$1 != "chrY"' | head -n 100000 | sort -k1,1 -k2,2n > enzyme_genome_100k.bedsort0", sep=""))

The reason for error is shown here. Error in boxplot.default(chip_frame, col = c("red", "grey"), at = setdiff(1:(3 * : which suggests your chip_frame might be empty.

As you have modified the code, you need to make sure your changes are consistent throughout your code. Please find this line, chipfile <- c("chip1", "chip2", "chip3", "chip4", "chip5", "chip6", "chip7", "chip8", "chip9", "chip10"), and delete the double quotation marks. By that I mean you change it to chipfile <- c(chip1, chip2, chip3, chip4, chip5, chip6, chip7, chip8, chip9, chip10). (These are variables to which you have assigned values in the first a few lines in your code.)

On Thu, Nov 2, 2017 at 11:33 AM, aliu90 notifications@github.com wrote:

I'm still getting the following error: Error in boxplot.default(chip_frame, col = c("red", "grey"), at = setdiff(1:(3 * : invalid first argument

I ran the code line by line and immediately after:

system(paste("sort -R ", enzyme_genome, " | awk '$1 != "chrY"' | head -n 100000 | sort -k1,1 -k2,2n > enzyme_genome_100k.bedsort0", sep=""))

I get this message, which I think may be why it's failing. sort: write failed: 'standard output': Broken pipe sort: write error

Any idea how to fix this? I have included the script and file I'm working with.

chip_name.txt https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_files_1438907_chip-5Fname.txt&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=x_fuE7mx6Gl_h8ykp9AFgh4A03qicm6wlSR6EhJyZZM&s=MWqAAKuC007rVFSI4QALYTB2xzS0-6tCVz63RYPAIEE&e= chipseq_enrichment.txt https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_files_1438909_chipseq-5Fenrichment.txt&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=x_fuE7mx6Gl_h8ykp9AFgh4A03qicm6wlSR6EhJyZZM&s=qeld3BY3Fnt2-ukh-Ryl8eGqcIdMuBFQgXPnEwy22Oc&e=

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_issues_4-23issuecomment-2D341517732&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=x_fuE7mx6Gl_h8ykp9AFgh4A03qicm6wlSR6EhJyZZM&s=vojo7ou8gaBfeWtrg0a0hdv9Yse6aCwO4Ke8roZLJP0&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AOhFi893XLAqsTMqEZiSqpOXGZkmfMRGks5sygrwgaJpZM4P3XrK&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=x_fuE7mx6Gl_h8ykp9AFgh4A03qicm6wlSR6EhJyZZM&s=bnN_cVGfEySsWqIRKognWxXkvsng_xAAUl9XNcxsjvA&e= .

aliu90 commented 6 years ago

The code seems to be working now, I got a pdf image to print properly. Unfortunately, beyond the axes, title, and labels, the actual data part is completely blank. I have attached the txt files created by the Rscript. The last columns are entirely zeros. I don't think this should be the case. Is there something I should be changing in my ChIP-seq input bed file?

CTCF_rand.txt CTCF.txt

nuoyazhou2 commented 6 years ago

The last columns in your files are not all zeros, but most of them are very small numbers. That indicates that CTCF doesn't have much enrichment around your 4C sites. You may want to increase the window size for windowBed and change ylim=c(-45,45) to customize your small number of enrichments.

On Fri, Dec 1, 2017 at 11:32 AM, aliu90 notifications@github.com wrote:

The code seems to be working now, I got a pdf image to print properly. Unfortunately, beyond the axes, title, and labels, the actual data part is completely blank. I have attached the txt files created by the Rscript. The last columns are entirely zeros. I don't think this should be the case. Is there something I should be changing in my ChIP-seq input bed file?

CTCF_rand.txt https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_files_1522740_CTCF-5Frand.txt&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=VXWitA59s05f0kOcYiHJu4l75y0IWbUexxyRaA7CAYE&s=pdkVcLJJJc3X4Q-eXjvBdBKPuGbesOT8hBiLz5FydQ4&e= CTCF.txt https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_files_1522741_CTCF.txt&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=VXWitA59s05f0kOcYiHJu4l75y0IWbUexxyRaA7CAYE&s=scl2tY4he4sdliNqgJ4zFAyizFX-ON64U7SzVLfRAhE&e=

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_issues_4-23issuecomment-2D348592775&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=VXWitA59s05f0kOcYiHJu4l75y0IWbUexxyRaA7CAYE&s=g3o6Y7FSlhV8fdQW6oe-jmilBwcPa-TeKrIsi2ScBVk&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AOhFizYeaGhuUXHpVOHDmh1ka-2D-2DssKqiks5s8FQxgaJpZM4P3XrK&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=VXWitA59s05f0kOcYiHJu4l75y0IWbUexxyRaA7CAYE&s=BrQ4F-iwYUT1c1l2ph-dwPwK4hfL4s8ePEwYjJYfSp4&e= .

aliu90 commented 6 years ago

I'm not sure how to increase the window size for windowBed, also ylim is already set to c(-45,45).

Could there be something wrong with my input bed files? I'm a bit surprised by the low enrichment across so many different ChIP-seq datasets. The output pdf is just blank. Wouldn't I at least see something for "Random sites"?

CTCF_bedinput.txt ChIP-Seq.pdf

nuoyazhou2 commented 6 years ago

From the plot, you can see it's not blank. Rather, the boxes are "squeezed" around zero, due to low enrichment level of factors around your 4C sites. To address that, you need to increase the window size for windowBed by adjusting the "-c" parameter. Take a look at this. http://bedtools.readthedocs.io/en/latest/content/tools/window.html

On the other hand, to show the full range of the boxplot, you'd better narrow down the ylim from c(-45, 45) to a smaller range like c(-10,10).

On Sun, Dec 3, 2017 at 12:38 PM, aliu90 notifications@github.com wrote:

I'm not sure how to increase the window size for windowBed, also ylim is already set to c(-45,45).

Could there be something wrong with my input bed files? I'm a bit surprised by the low enrichment across so many different ChIP-seq datasets. The output pdf is just blank. Wouldn't I at least see something for "Random sites"?

CTCF_bedinput.txt https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_files_1525135_CTCF-5Fbedinput.txt&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=6CxkYPbt0WHwr4QODC54BS2BjukydluJ3qQ44IAVcZg&s=1mS2BTxMkks68ZQ83v4jHk14ybpezHVTNp-F8ONO7Bk&e= ChIP-Seq.pdf https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_files_1525136_ChIP-2DSeq.pdf&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=6CxkYPbt0WHwr4QODC54BS2BjukydluJ3qQ44IAVcZg&s=UhlY0ZWP3LWZyiYa4Zn9aFKwI5NLAJrkRYYMFf44sao&e=

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_WGLab_w4CSeq_issues_4-23issuecomment-2D348812569&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=6CxkYPbt0WHwr4QODC54BS2BjukydluJ3qQ44IAVcZg&s=mUJ9rChDYk8tUAjNwO3h_rg351weF71oUoir_VsAFa8&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AOhFix-5F-2Dy5pSl9g46ZmFGe4NOkulajETks5s8wbZgaJpZM4P3XrK&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=rmADQazHldllb-vF1IdPBw&m=6CxkYPbt0WHwr4QODC54BS2BjukydluJ3qQ44IAVcZg&s=x9oMR_wRAjYPc7HWk1xLPcSJGKnfR47sWYiRUtFJ1kQ&e= .