OmarMossad commented 6 years ago

Hi Mayur,

I tried running my files for the differential expression analysis. had this error. I tested with your files and the error seems to persist. have you faced this before? can I somehow know what is wrong?

I reinstalled several times to make sure I have all the packages and this seems fine.

GUAVA version 1 bedtools: Working! bedGraphToBigWig: Working! igv: Working! samtools: Working! /home/mossad/Downloads/output/test_GUAVA_Differental_analysis/test_GUAVA_Differental_analysis ---------- Create read count matrix ---------- DMSO_Rep1_R1_peaks.narrowPeak == 150614 DMSO_Rep2_R1_peaks.narrowPeak == 130387 BYL719_Rep1_R1_peaks.narrowPeak == 216356 BYL719_Rep2_R1_peaks.narrowPeak == 206310 ---------- Run DESeq2 code ---------- ---------- Display volcano plot ---------- ---------- functional annotation ---------- Error occured, analysis stopped

MayurDivate commented 5 years ago

Hey Omar,

Any get through? Have you tried everything that we discussed?

Cheers, Mayur

learner146911 commented 5 years ago

Hi Mayur I am having the same issue here.. how do we get around that? Thanks

MayurDivate commented 5 years ago

Please send me the log file. Mayur

learner146911 commented 5 years ago

Hi Mayur,

I raised an issue on Github about not being able to run GUAVA functional annotation successfully and you asked for the log file.

Its attached in this email.

I am confident that the software is working because when I ran some of the human datasets (hg19) I was able to do it but strangely not for the mouse datasets(mm10; attached log file). Those libraries were sequenced in the lab by the same Illumina sequencer and processed/analysed by myself using the exact same pipeline (just different genome).

In the functional annotation output folder an R script (satb1_test_Reps_beds_ChIPpeakAnno.R) was given and when I ran the script on R I realised it is giving warning at the "peaks.anno <- annotatePeakInBatch(peaks, AnnotationData=ucscGenes, output = "nearestBiDirectionalPromoters", bindingRegion = c(-5000,3000))" (see attached log file for warning messages) so I am speculating this is where it got stuck?

I tried both peak formats - .bed and .narrowPeak and both are not working for the mouse dataset.

Thanks Mayur appreciate your help on this.

Regards,

Ying

#######################################################

Project summary

Name of the project: satb1_test_Reps_beds

Genome build: mm10

Pvalue: 0.05

Foldchange: 2.0

Upstream: 5000

Downstream: 3000

Input files

Control REP I bam: WT_US_rep1.bam

Control REP II bam: WT_US_rep2.bam

Treatment REP I bam: KO_US_rep1.bam

Treatment REP II bam: KO_US_rep2.bam

Control REP I bed: WT_US_rep1_501bp_summits.bed

Control REP II bed: WT_US_rep2_501bp_summits.bed

Treatment REP I bed: KO_US_rep1_501bp_summits.bed

Treatment REP II bed: KO_US_rep2_501bp_summits.bed

#######################################################

--------- DESeq2 --------- -------1-----------------------

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.28.1

//========================== featureCounts setting ===========================\						Input files : 4 BAM files				P /home/ubuntu/satb1/WT_US_rep1.bam				P /home/ubuntu/satb1/WT_US_rep2.bam				P /home/ubuntu/satb1/KO_US_rep1.bam				P /home/ubuntu/satb1/KO_US_rep2.bam
Dir for temp files : .
Threads : 1
Level : meta-feature level
Paired-end : yes
Strand specific : no
Multimapping reads : counted
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file ./.Rsubread_UserProvidedAnnotation_pid444 ...				Features : 37052				Meta-features : 37052				Chromosomes/contigs : 40
Process BAM file /home/ubuntu/satb1/WT_US_rep1.bam...
Paired-end reads are included.
Assign fragments (read pairs) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Read re-ordering is performed.

Total fragments : 54333519
Successfully assigned fragments : 14776299 (27.2%)
Running time : 9.38 minutes

Process BAM file /home/ubuntu/satb1/WT_US_rep2.bam...
Paired-end reads are included.
Assign fragments (read pairs) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Read re-ordering is performed.

Total fragments : 41418093
Successfully assigned fragments : 9809104 (23.7%)
Running time : 6.96 minutes

Process BAM file /home/ubuntu/satb1/KO_US_rep1.bam...
Paired-end reads are included.
Assign fragments (read pairs) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Read re-ordering is performed.

Total fragments : 62847330
Successfully assigned fragments : 13690774 (21.8%)
Running time : 11.93 minutes

Process BAM file /home/ubuntu/satb1/KO_US_rep2.bam...
Paired-end reads are included.
Assign fragments (read pairs) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Read re-ordering is performed.

Total fragments : 35367614
Successfully assigned fragments : 8028469 (22.7%)
Running time : 5.09 minutes

Read assignment finished.

\===================== http://subread.sourceforge.net/ ======================//

null device 1 [1] "gained-close" "No-change" "gained-open" gained-close No-change gained-open 6 37043 0 null device 1

2____ Loading required package: S4Vectors Loading required package: methods Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which, which.max, which.min

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

expand.grid

Loading required package: IRanges Loading required package: GenomicRanges Loading required package: GenomeInfoDb Loading required package: SummarizedExperiment Loading required package: Biobase Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

anyMissing, rowMedians

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following object is masked from ‘package:base’:

apply

estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing

annotatePeakInBatch(peaks, AnnotationData=ucscGenes, output = "nearestBiDirectionalPromoters", bindingRegion = c (-5000,3000)) Annotate peaks by annoPeaks, see ?annoPeaks for details. maxgap will be ignored. GRanges object with 0 ranges and 0 metadata columns: seqnames ranges strand
------- seqinfo: no sequences Warning messages: 1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) : GRanges object contains 1 out-of-bound range located on sequence chr4_JH584292_random. Note that only ranges located on a non-circular sequence whose length is not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more information. 2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) : GRanges object contains 1 out-of-bound range located on sequence chr4_JH584292_random. Note that only ranges located on a non-circular sequence whose length is not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more information. 3: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) : GRanges object contains 1 out-of-bound range located on sequence chr4_JH584292_random. Note that only ranges located on a non-circular sequence whose length is not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more information. 4: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) : GRanges object contains 1 out-of-bound range located on sequence chr4_JH584292_random. Note that only ranges located on a non-circular sequence whose length is not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more information.

peaks.anno <- addGeneIDs(annotatedPeak=peaks.anno,

orgAnn=OrgEgDb,
feature_id_type="entrez_id",
IDs2Add="symbol") Error: There is no feature column in annotatedPeak or annotatedPeak has size 0! In addition: Warning message: In is.na(feature_ids) : is.na() applied to non-(list or vector) of type 'NULL'

peaks.anno@ranges@NAMES <- NULL peaksAnnoOut <- as.data.frame(peaks.anno) peaksAnnoOut <- peaksAnnoOut[,c(1,2,3,4,6,8,9,13,14,15,16)] Error in [.data.frame(peaksAnnoOut, , c(1, 2, 3, 4, 6, 8, 9, 13, 14, : undefined columns selected

MayurDivate commented 5 years ago

Do you mean that you are using different genomes builds for atac-seq analysis and differential analysis? Which genome did you use to process each sample?

Mayur

OmarMossad commented 5 years ago

Hey Mayur,

Unfortunately it didn't work out for me. I have tried what you suggested though. I am now using a pipeline from ENCODE. https://github.com/kundajelab/atac-seq-pipeline

Best, Omar

On Wed, Nov 28, 2018 at 12:49 AM Mayur Divate notifications@github.com wrote:

Hey Omar,

Any get through? Have you tried everything that we discussed?

Cheers, Mayur

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/MayurDivate/GUAVASourceCode/issues/2#issuecomment-442262929, or mute the thread https://github.com/notifications/unsubscribe-auth/AcEjCn5qAmZl2BWNzsAtsqQF6anM_S-Gks5uzc-VgaJpZM4Xyc11 .

learner146911 commented 5 years ago

Hi Mayur,

I used the same genome build for atac and differential analysis. I did the alignment, peak calling on my end and fed the peak files and BAM to GUAVA for DE analysis.

Regards, Ying

From: Mayur Divate notifications@github.com Sent: Wednesday, 6 March 2019 11:26 AM To: MayurDivate/GUAVASourceCode Cc: learner146911; Comment Subject: Re: [MayurDivate/GUAVASourceCode] Error occurs at functional annotation (#2)

Do you mean that you are using different genomes builds for atac-seq analysis and differential analysis? Which genome did you use to process each sample?

Mayur

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/MayurDivate/GUAVASourceCode/issues/2#issuecomment-469935065, or mute the threadhttps://github.com/notifications/unsubscribe-auth/Am8AzD2k1ZWkcqiy73hLyxEJCDkV6gTaks5vTyBcgaJpZM4Xyc11.

MayurDivate / GUAVASourceCode

Error occurs at functional annotation #2

Project summary

Name of the project: satb1_test_Reps_beds

Genome build: mm10

Pvalue: 0.05

Foldchange: 2.0

Upstream: 5000

Downstream: 3000

Input files

Control REP I bam: WT_US_rep1.bam

Control REP II bam: WT_US_rep2.bam

Treatment REP I bam: KO_US_rep1.bam

Treatment REP II bam: KO_US_rep2.bam

Control REP I bed: WT_US_rep1_501bp_summits.bed

Control REP II bed: WT_US_rep2_501bp_summits.bed

Treatment REP I bed: KO_US_rep1_501bp_summits.bed

Treatment REP II bed: KO_US_rep2_501bp_summits.bed