Danko-Lab / tfTarget

Identify transcription factor-enhancer/promoter-gene network from run-on sequencing data
https://www.nature.com/articles/s41588-018-0244-3

Error when associating TFs to TREs and genes (mapTF / get.proximal.genes) #1

Open mattgalbraith opened 5 years ago

mattgalbraith commented 5 years ago

When running tfTarget via run_tfTarget.bsh with the following command:

bash run_tfTarget.bsh \
  -query $TREATMENT_SAMPLES \
  -control $CONTROL_SAMPLES \
  -bigWig.path $BIGWIG_PATH \
  -prefix gencode_test \
  -TRE.path $TRE_MERGED_BED \
  -gene.path $ANNOTATION_BED \
  -2bit.path $HG19_2BIT \
  -pval.up 0.1 \
  -pval.down 0.1 \
  -ncores 3 \
  -dist 50000 \
  -closest.N 2 \
  -pval.gene 0.1

I am getting the following error:

[1] "associating TFs to TREs and genes"
awk: syntax error at source line 1
 context is
    BEGIN{OFS=" "} {print >>> $1,$6== <<<
awk: illegal statement at source line 1
awk: illegal statement at source line 1
Error in $<-.data.frame(*tmp*, "closest.N", value = c(1L, 2L, 1L, :
  replacement has 36 rows, data has 37
Calls: mapTF -> get.proximal.genes -> $<- -> $<-.data.frame
Execution halted

This appears to be related to the awk command at lines 18-20 or 43-45 of mapTF.R

R session info with tfTarget loaded:

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tfTarget_1.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.54.0 rtfbsdb_0.4.5
[4] bit64_0.9-7 RColorBrewer_1.1-2 GenomeInfoDb_1.18.1
[7] tools_3.5.1 backports_1.1.3 R6_2.3.0
[10] KernSmooth_2.23-15 rpart_4.1-13 sm_2.2-5.4
[13] Hmisc_4.1-1 DBI_1.0.0 lazyeval_0.2.1
[16] BiocGenerics_0.28.0 colorspace_1.3-2 nnet_7.3-12
[19] tidyselect_0.2.5 gridExtra_2.3 DESeq2_1.22.1
[22] bit_1.1-14 compiler_3.5.1 Biobase_2.42.0
[25] htmlTable_1.12 DelayedArray_0.8.0 rphast_1.6.9
[28] caTools_1.17.1.1 scales_1.0.0 checkmate_1.8.5
[31] genefilter_1.64.0 stringr_1.3.1 apcluster_1.4.7
[34] digest_0.6.18 foreign_0.8-71 XVector_0.22.0
[37] vioplot_0.3.0 base64enc_0.1-3 pkgconfig_2.0.2
[40] htmltools_0.3.6 htmlwidgets_1.3 rlang_0.3.0.1
[43] rstudioapi_0.8 RSQLite_2.1.1 bindr_0.1.1
[46] zoo_1.8-5 BiocParallel_1.16.5 bigWig_0.2-9
[49] gtools_3.8.1 acepack_1.4.1 dplyr_0.7.8
[52] RCurl_1.95-4.11 magrittr_1.5 GenomeInfoDbData_1.2.0
[55] Formula_1.2-3 Matrix_1.2-15 Rcpp_1.0.0
[58] munsell_0.5.0 S4Vectors_0.20.1 stringi_1.2.4
[61] yaml_2.2.0 rtfbs_0.3.9 SummarizedExperiment_1.12.0
[64] zlibbioc_1.28.0 gplots_3.0.1 plyr_1.8.4
[67] grid_3.5.1 blob_1.1.1 gdata_2.18.0
[70] parallel_3.5.1 crayon_1.3.4 lattice_0.20-38
[73] splines_3.5.1 annotate_1.60.0 locfit_1.5-9.1
[76] knitr_1.21 pillar_1.3.0 GenomicRanges_1.34.0
[79] geneplotter_1.60.0 stats4_3.5.1 XML_3.98-1.16
[82] glue_1.3.0 latticeExtra_0.6-28 data.table_1.11.8
[85] gtable_0.2.0 purrr_0.2.5 assertthat_0.2.0
[88] ggplot2_3.1.0 xfun_0.4 xtable_1.8-3
[91] survival_2.43-3 tibble_1.4.2 AnnotationDbi_1.44.0
[94] memoise_1.1.0 IRanges_2.16.0 bindrcpp_0.2.2
[97] cluster_2.0.7-1

tinyi commented 5 years ago

Thank you for the feedback. Could you show the first 10 lines of your $ANNOTATION_BED? I suspect this might be a Mac-specific issue. Have you tried running it on a Linux system?

Best,

Tinyi


mattgalbraith commented 5 years ago

head ~/Refs/hg19/gencode.v19.annotation.bed
chr1 11868 14412 ENSG00000223972.4 DDX11L1 +
chr1 14362 29806 ENSG00000227232.4 WASH7P -
chr1 29553 31109 ENSG00000243485.2 MIR1302-11 +
chr1 34553 36081 ENSG00000237613.2 FAM138A -
chr1 52472 54936 ENSG00000268020.2 OR4G4P +
chr1 62947 63887 ENSG00000240361.1 OR4G11P +
chr1 69090 70008 ENSG00000186092.4 OR4F5 +
chr1 89294 133566 ENSG00000238009.2 RP11-34P13.7 -
chr1 89550 91105 ENSG00000239945.1 RP11-34P13.8 -
chr1 131024 134836 ENSG00000233750.3 CICP27 +

I was unable to get all the R dependencies installed on our Linux system, which is why I am using the Mac.
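Since the awk error mentions field $6, which is the strand column in this annotation file, one quick sanity check is to confirm that every line of the BED has six fields and a valid strand. The sketch below is not part of tfTarget: `count_bad_bed_lines` is a hypothetical helper, and it assumes the file is tab-delimited with the six-column layout shown in the `head` output. It uses only POSIX awk features, so it should behave the same with the macOS and Linux awk implementations. It does not diagnose the awk syntax error itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of tfTarget): print the number of lines in a
# BED file that do not have exactly 6 tab-separated fields with "+" or "-"
# in field 6. Assumes a tab-delimited, 6-column annotation file.
count_bad_bed_lines() {
    awk -F '\t' '
        NF != 6 || ($6 != "+" && $6 != "-") { bad++ }
        END { print bad + 0 }   # "+ 0" prints 0 when no bad line was seen
    ' "$1"
}

# Example usage (the variable is the one from the command in this thread):
# count_bad_bed_lines "$ANNOTATION_BED"
```

A result of 0 means every line matched the assumed layout. A non-zero count would point to lines worth inspecting, for example lines separated by spaces instead of tabs or lines with Windows line endings.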

mattgalbraith commented 5 years ago

I have now managed to get tfTarget and all dependencies running on linux and no longer get the awk error. However, I am now getting a new error:

[1] "associating TFs to TREs and genes"
Error in names(x) <- value :
  'names' attribute [27] must be the same length as the vector [16]
Calls: mapTF -> colnames<-
Execution halted

From looking into the mapTF function, it appears that

TF.TRE.gene.tab.short <- TF.TRE.gene.tab[, -c(1, 13:15)]

is generating a data frame with only 16 columns rather than the 27 suggested by

header.vec <- c("tre.chrom", "tre.chromStart", "tre.chromEnd",
                "tf.chrom", "tf.chromStart", "tf.chromEnd",
                "score", "strand", "motif.name", "motif.id", "motif.idx",
                "TRE.baseMean", "TRE.log2FoldChange", "TRE.pvalue", "TRE.padj",
                "gene.TSS.chr", "gene.TSS.start", "gene.TSS.end",
                "transcript.id", "gene.name", "gene.strand",
                "gene.baseMean", "gene.log2FoldChange", "gene.pvalue", "gene.padj",
                "distance")
if (!is.null(closest.N)) header.vec <- c(header.vec, "closest.N")
colnames(TF.TRE.gene.tab.short) <- header.vec

I will try running the R commands manually to see if I can track this down any further...

mattgalbraith commented 5 years ago

For reference: The last error was caused by an empty TF.TRE.gene.tab object due to the stringency of settings used.

tinyi commented 5 years ago

Thank you for your feedback. I suspect this is caused by a lack of statistical power (as determined by DESeq2) when fewer than 2 replicates are used for each condition.


CholponZ commented 4 years ago

I am having the same 'names' attribute length error in mapTF. Did running the R commands manually resolve it for you?

Best regards