estepi / ASpli

Analysis of alternative splicing using RNAseq
error '`[.data.frame`(aDataframe, , match(row.names(targets), colnames(aDataframe)))': undefined columns selected #5

Open KateK opened 5 years ago

KateK commented 5 years ago

Hi!

I want to use your tool in my analysis, and everything went perfectly until calculating differential usage of genes and the later steps. The error I got is as follows:

error in the command '[.data.frame(aDataframe, , match(row.names(targets), colnames(aDataframe)))': undefined columns selected

Thank you for help in advance!

Regards, Kate

KateK commented 5 years ago

Also, I wonder whether, with plotGenomicRegions, I get the mean coverage (or something similar) after merging the BAM files, or whether I only see the summed coverage of the merged BAMs?

CriticalPeriod commented 5 years ago

Hi, same error here !

estepi commented 5 years ago

Hi all! Thanks for using ASpli. Can you paste the output of rownames(targets)? Thanks.

estepi commented 5 years ago

I recommend not using special characters in sample names or in gene names (those in the GTF file), such as "-", "_", ".", "/", "+", etc. They are problematic in many steps of ASpli. I'm sorry for this.

CriticalPeriod commented 5 years ago

Hi again. Still the same problem. I renamed all the BAM files and the GFF.

bamFiles <- c(
  "OTX2P30/30het1.bam",
  "OTX2P30/30het2.bam",
  "OTX2P30/30hom1.bam",
  "OTX2P30/30hom2.bam"
)
targets <- data.frame(
  row.names = c("30het1", "30het2", "30hom1", "30hom2"),
  bam = bamFiles,
  genotype = c("HET", "HET", "HOM", "HOM"),
  stringsAsFactors = FALSE
)

targets
                      bam genotype
30het1 OTX2P30/30het1.bam      HET
30het2 OTX2P30/30het2.bam      HET
30hom1 OTX2P30/30hom1.bam      HOM
30hom2 OTX2P30/30hom2.bam      HOM

Note that this does not work, since ASpli then fails like this:

rownames(targets)
[1] "30het1" "30het2" "30hom1" "30hom2"
as <- AsDiscover(counts, targets, features, bam, readLength = 51L, threshold = 5)
Error in data.frame(matrix(unlist(strsplit(jnames, "[.]")), byrow = TRUE, : row names supplied are of the wrong length
In addition: Warning message:
In matrix(unlist(strsplit(jnames, "[.]")), byrow = TRUE, ncol = 3) : data length [580660] is not a sub-multiple or multiple of the number of rows [193554]

So I renamed them, so that each name has the same number of letters as the genotype plus one:

targets <- data.frame(
  row.names = c("het1", "het2", "hom1", "hom2"),
  bam = bamFiles,
  genotype = c("HET", "HET", "HOM", "HOM"),
  stringsAsFactors = FALSE
)

targets
                    bam genotype
het1 OTX2P30/30het1.bam      HET
het2 OTX2P30/30het2.bam      HET
hom1 OTX2P30/30hom1.bam      HOM
hom2 OTX2P30/30hom2.bam      HOM

This goes back to the original problem

rownames(targets)
[1] "het1" "het2" "hom1" "hom2"
as <- AsDiscover(counts, targets, features, bam, readLength = 51L, threshold = 5)
Error in `[.data.frame`(aDataframe, , match(row.names(targets), colnames(aDataframe))) : undefined columns selected

Also, when writing the results I get this error:

writeRds(counts=counts, output.dir = "reads")
Error in file(file, ifelse(append, "a", "w")) : cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) : cannot open file 'reads/genes/gene.rd.tab': No such file or directory
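
The "No such file or directory" warning names 'reads/genes/gene.rd.tab', which suggests that the 'reads/genes' folder did not exist when writeRds() tried to write into it. A minimal sketch of a possible workaround, assuming that writeRds() in this ASpli version does not create missing folders itself (not confirmed in this thread): create the path with base R first. Only the 'genes' subfolder is known from the message; any other subfolders ASpli may expect are not shown.

```r
# Output location and subfolder taken from the error message above.
output_dir <- "reads"
genes_dir <- file.path(output_dir, "genes")

# recursive = TRUE also creates "reads" if it is missing;
# showWarnings = FALSE keeps the call quiet when the folder already exists.
dir.create(genes_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm the folder is there before calling
# writeRds(counts = counts, output.dir = output_dir).
dir.exists(genes_dir)
```

If the error persists after the folder exists, the cause is something else, for example the working directory or write permissions.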

I don't know if it is related. Also, here is my sessionInfo():

R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] matrixStats_0.54.0 GenomicFeatures_1.34.8 AnnotationDbi_1.44.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 [7] IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0 ASpli_1.8.1 edgeR_3.24.3 limma_3.38.3

loaded via a namespace (and not attached): [1] ProtGenerics_1.14.0 bitops_1.0-6 bit64_0.9-7 RColorBrewer_1.1-2 progress_1.2.0 [6] httr_1.4.0 tools_3.5.3 backports_1.1.4 R6_2.4.0 rpart_4.1-15 [11] Hmisc_4.2-0 DBI_1.0.0 lazyeval_0.2.2 Gviz_1.26.5 colorspace_1.4-1 [16] nnet_7.3-12 gridExtra_2.3 prettyunits_1.0.2 curl_3.3 bit_1.1-14 [21] compiler_3.5.3 htmlTable_1.13.1 DelayedArray_0.8.0 rtracklayer_1.42.2 scales_1.0.0 [26] checkmate_1.9.1 stringr_1.4.0 digest_0.6.18 Rsamtools_1.34.1 foreign_0.8-71 [31] rmarkdown_1.12 XVector_0.22.0 base64enc_0.1-3 dichromat_2.0-0 pkgconfig_2.0.2 [36] htmltools_0.3.6 ensembldb_2.6.8 BSgenome_1.50.0 htmlwidgets_1.3 rlang_0.3.4 [41] rstudioapi_0.10 RSQLite_2.1.1 BiocParallel_1.16.6 acepack_1.4.1 VariantAnnotation_1.28.13 [46] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.0 Formula_1.2-3 Matrix_1.2-17 [51] Rcpp_1.0.1 munsell_0.5.0 stringi_1.4.3 yaml_2.2.0 SummarizedExperiment_1.12.0 [56] zlibbioc_1.28.0 plyr_1.8.4 grid_3.5.3 blob_1.1.1 crayon_1.3.4 [61] lattice_0.20-38 Biostrings_2.50.2 splines_3.5.3 hms_0.4.2 locfit_1.5-9.1 [66] knitr_1.22 pillar_1.3.1 biomaRt_2.38.0 XML_3.98-1.19 evaluate_0.13 [71] biovizBase_1.30.1 latticeExtra_0.6-28 data.table_1.12.2 BiocManager_1.30.4 gtable_0.3.0 [76] assertthat_0.2.1 ggplot2_3.1.1 xfun_0.6 AnnotationFilter_1.6.0 survival_2.44-1.1 [81] tibble_2.1.1 GenomicAlignments_1.18.1 memoise_1.1.0 cluster_2.0.8 BiocStyle_2.10.0

Besides that, the "counts" part seems to work fine, since I get differential analysis results that match the ones I already have. One question: is normalization included in ASpli? I hope you can find something useful, David

estepi commented 5 years ago

Hi. Sorry for this. Can you please paste the head of the features or genome objects? I mean, is there anything unusual (such as special characters) in the gene names?

CriticalPeriod commented 5 years ago

head(features, n=5)
Error in x[seq_len(n)] : object of type 'S4' is not subsettable
str(features)
Formal class 'ASpliFeatures' [package "ASpli"] with 3 slots
..@ genes :Formal class 'CompressedGRangesList' [package "GenomicRanges"] with 5 slots
.. .. ..@ unlistData :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
.. .. .. .. ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. .. .. ..@ values : Factor w/ 45 levels "1","2","3","4",..: 11 16 6 4 5 3 7 12 2 20 ...
.. .. .. .. .. .. ..@ lengths : int [1:23959] 69 1 13 37 11 4 20 19 5 3 ...
.. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. ..@ metadata : list()
.. .. .. .. ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
.. .. .. .. .. .. ..@ start : int [1:408550] 51685386 51688548 51688690 23564961 23566119 23573251 23573788 23573903 23576630 23576731 ...
.. .. .. .. .. .. ..@ width : int [1:408550] 707 106 185 176 77 537 115 1615 101 92 ...
.. .. .. .. .. .. ..@ NAMES : NULL
.. .. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. ..@ metadata : list()
.. .. .. .. ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. .. .. ..@ values : Factor w/ 3 levels "+","-","*": 2 1 2 1 2 1 2 1 2 1 ...
.. .. .. .. .. .. ..@ lengths : int [1:16545] 70 104 5 3 45 15 24 1 98 14 ...
.. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. ..@ metadata : list()
.. .. .. .. ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
.. .. .. .. .. .. ..@ seqnames : chr [1:45] "1" "2" "3" "4" ...
.. .. .. .. .. .. ..@ seqlengths : int [1:45] NA NA NA NA NA NA NA NA NA NA ...
.. .. .. .. .. .. ..@ is_circular: logi [1:45] NA NA NA NA NA NA ...
.. .. .. .. .. .. ..@ genome : chr [1:45] NA NA NA NA ...
.. .. .. .. ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
.. .. .. .. .. .. ..@ rownames : NULL
.. .. .. .. .. .. ..@ nrows : int 408550
.. .. .. .. .. .. ..@ listData : Named list()
.. .. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. ..@ metadata : list()
.. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. ..@ metadata : list()
.. .. ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
.. .. .. .. ..@ rownames : NULL
.. .. .. .. ..@ nrows : int 35215
.. .. .. .. ..@ listData :List of 3
.. .. .. .. .. ..$ gene_coordinates: chr [1:35215] "11:51685386-51688874" "11:23564961-23633639" "11:70235206-70237914" "16:31947050-31948494" ...
.. .. .. .. .. ..$ locus_overlap : chr [1:35215] "-" "-" "Gm21988" "-" ...
.. .. .. .. .. ..$ symbol : Factor w/ 35207 levels "This is symbol of gene: 1:100006021-100006584:+",..: NA NA NA NA NA NA NA NA NA NA ...
.. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ elementType : chr "GRanges"
.. .. ..@ metadata :List of 1
.. .. .. ..$ genomeInfo:List of 15
.. .. .. .. ..$ Db type : chr "TxDb"
.. .. .. .. ..$ Supporting package : chr "GenomicFeatures"
.. .. .. .. ..$ Data source : chr "Musmusculusmm10.gff3"
.. .. .. .. ..$ Organism : chr NA
.. .. .. .. ..$ Taxonomy ID : chr NA
.. .. .. .. ..$ miRBase build ID : chr NA
.. .. .. .. ..$ Genome : chr NA
.. .. .. .. ..$ transcript_nrow : chr "134979"
.. .. .. .. ..$ exon_nrow : chr "512760"
.. .. .. .. ..$ cds_nrow : chr "513936"
.. .. .. .. ..$ Db created by : chr "GenomicFeatures package from Bioconductor"
.. .. .. .. ..$ Creation time : chr "2019-04-23 13:57:28 +0200 (Tue, 23 Apr 2019)"
.. .. .. .. ..$ GenomicFeatures version at creation time: chr "1.34.8"
.. .. .. .. ..$ RSQLite version at creation time : chr "2.1.1"
.. .. .. .. ..$ DBSCHEMAVERSION : chr "1.2"
.. .. ..@ partitioning :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
.. .. .. .. ..@ end : int [1:35215] 3 40 69 70 83 120 131 135 155 174 ...
.. .. .. .. ..@ NAMES : chr [1:35215] "0610009B22Rik" "0610010F05Rik" "0610010K14Rik" "0610012G03Rik" ...
.. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
..@ bins :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
.. .. ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. ..@ values : Factor w/ 45 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
.. .. .. .. ..@ lengths : int [1:63] 41403 53648 30377 38870 42794 33465 53520 31390 38762 30790 ...
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
.. .. .. .. ..@ start : int [1:691866] 3466688 4497655 4498212 4522905 4523604 4524446 4807788 4807823 4807830 4807892 ...
.. .. .. .. ..@ width : int [1:691866] 46717 352 1258 699 842 2292 35 7 66 91 ...
.. .. .. .. ..@ NAMES : chr [1:691866] "Gm7357:I001" "Lypla1:I001" "Lypla1:I002" "Gm732:E001" ...
.. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. ..@ values : Factor w/ 3 levels "+","-","*": 1 2 1 2 1 2 1 2 1 2 ...
.. .. .. .. ..@ lengths : int [1:107] 20495 20908 26535 27113 15830 14547 19736 19134 21885 20909 ...
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
.. .. .. .. ..@ seqnames : chr [1:45] "1" "2" "3" "4" ...
.. .. .. .. ..@ seqlengths : int [1:45] NA NA NA NA NA NA NA NA NA NA ...
.. .. .. .. ..@ is_circular: logi [1:45] NA NA NA NA NA NA ...
.. .. .. .. ..@ genome : chr [1:45] NA NA NA NA ...
.. .. ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
.. .. .. .. ..@ rownames : NULL
.. .. .. .. ..@ nrows : int 691866
.. .. .. .. ..@ listData :List of 8
.. .. .. .. .. ..$ locus : chr [1:691866] "Gm7357" "Lypla1" "Lypla1" "Gm732" ...
.. .. .. .. .. ..$ bin : chr [1:691866] "I001" "I001" "I002" "E001" ...
.. .. .. .. .. ..$ feature : chr [1:691866] "I" "I" "I" "E" ...
.. .. .. .. .. ..$ symbol : Factor w/ 35207 levels "This is symbol of gene: 1:100006021-100006584:+",..: NA NA NA NA NA NA NA NA NA NA ...
.. .. .. .. .. ..$ locus_overlap: chr [1:691866] "-" "Gm37988" "Gm37988" "-" ...
.. .. .. .. .. ..$ class : chr [1:691866] "-" "-" "-" "external" ...
.. .. .. .. .. ..$ event : chr [1:691866] "-" "-" "-" "external" ...
.. .. .. .. .. ..$ eventJ : chr [1:691866] "-" "-" "-" "external" ...
.. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ elementType : chr "ANY"
.. .. ..@ metadata : list()
..@ junctions:Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
.. .. ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. ..@ values : Factor w/ 45 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
.. .. .. .. ..@ lengths : int [1:38] 17948 22829 12829 16707 18087 14145 22160 13642 16619 13355 ...
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
.. .. .. .. ..@ start : int [1:281210] 3466687 4497654 4498211 4523603 4807982 4807982 4808365 4808486 4828649 4828649 ...
.. .. .. .. ..@ width : int [1:281210] 46719 354 1260 844 474 20603 91 20099 1620 3663 ...
.. .. .. .. ..@ NAMES : chr [1:281210] "Gm7357:J001" "Lypla1:J001" "Lypla1:J002" "Lypla1:J003" ...
.. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. ..@ values : Factor w/ 3 levels "+","-","*": 1 2 1 2 1 2 1 2 1 2 ...
.. .. .. .. ..@ lengths : int [1:62] 8924 9024 11258 11571 6641 6188 8498 8209 9275 8812 ...
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
.. .. .. .. ..@ seqnames : chr [1:45] "1" "2" "3" "4" ...
.. .. .. .. ..@ seqlengths : int [1:45] NA NA NA NA NA NA NA NA NA NA ...
.. .. .. .. ..@ is_circular: logi [1:45] NA NA NA NA NA NA ...
.. .. .. .. ..@ genome : chr [1:45] NA NA NA NA ...
.. .. ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
.. .. .. .. ..@ rownames : NULL
.. .. .. .. ..@ nrows : int 281210
.. .. .. .. ..@ listData :List of 2
.. .. .. .. .. ..$ locus : chr [1:281210] "Gm7357" "Lypla1" "Lypla1" "Lypla1" ...
.. .. .. .. .. ..$ locus_overlap: chr [1:281210] "-" "Gm37988" "Gm37988" "Gm37988" ...
.. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. ..@ metadata : list()
.. .. ..@ elementType : chr "ANY"
.. .. ..@ metadata : list()

KateK commented 5 years ago

Hi!

rownames(targets)
[1] "1_Col0" "2_Col0" "3_Col0" "4_Col0" "1_N661863" "2_N661863" "3_N661863"
[8] "4_N661863"

I have "_" here. Did you mean that I shouldn't use "_" here? In your example I saw "_" and everything worked...

CriticalPeriod commented 5 years ago

I aligned the reads using STAR with a GFF3 file. Do you think that could be a problem?

KateK commented 5 years ago

The top of my *.gtf file looks like this:

#!genome-build TAIR10
#!genome-version TAIR10
#!genome-date 2010-09
#!genome-build-accession GCA_000001735.1
#!genebuild-last-updated 2010-09
1   araport11   gene    3631    5899    .   +   .   gene_id "AT1G01010"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding";
1   araport11   transcript  3631    5899    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding";
1   araport11   exon    3631    3913    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon1";
1   araport11   five_prime_utr  3631    3759    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding";
1   araport11   CDS 3760    3913    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1";

estepi commented 5 years ago

Thanks to all of you for your helpful comments. Sometimes chromosome names or gene names are weird, such as Scaffold.1.1.1 or ChrX.Scaffold.X. I cannot see this type of problem in your headers. In any case, you can check these names very quickly using samtools -H your.bam. I'll keep working on this issue, sorry for the inconvenience :cry:
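
The earlier error message in this thread shows the call strsplit(jnames, "[.]") being reshaped with ncol = 3. A small sketch of why a dot inside a name would matter there, assuming the junction names follow a "chromosome.start.end" layout (that layout is an assumption, and the names below are made up for illustration):

```r
# Hypothetical junction names in an assumed "chromosome.start.end" layout.
clean_names <- c("1.100.200", "2.300.400")
dotted_names <- c("Scaffold.1.100.200", "2.300.400")

# Number of fields each name yields when split on a literal dot.
count_fields <- function(jnames) lengths(strsplit(jnames, "[.]"))

count_fields(clean_names)   # every name splits into three fields
count_fields(dotted_names)  # the first name splits into four fields

# Names that would not fit a three-column layout.
bad <- dotted_names[count_fields(dotted_names) != 3]
bad
```

Any name that does not split into exactly three fields shifts the rest of the matrix, which is consistent with the "wrong length" error reported above.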

KateK commented 5 years ago

As far as I know, samtools doesn't have a -H option... Do you mean samtools on Ubuntu or the Rsamtools package?

KateK commented 5 years ago

OK, I figured out that you meant samtools view -H. Here is what I got:

@HD VN:1.0  SO:coordinate
@SQ SN:1    LN:30427671
@SQ SN:2    LN:19698289
@SQ SN:3    LN:23459830
@SQ SN:4    LN:18585056
@SQ SN:5    LN:26975502
@SQ SN:Mt   LN:366924
@SQ SN:Pt   LN:154478
@PG ID:hisat2   PN:hisat2   VN:2.1.0    CL:"/opt/exp_soft/local/generic/hisat2/2.1.0/hisat2-align-s --wrapper basic-0 -p 8 ../index_HISAT2/Arabidopsis_thaliana.TAIR10.35 -S 1_Col-0_TLDR.sam -1 ../file_1_sorted.fastq -2 ../file_2_sorted.fastq"
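
A header like the one above can also be screened in base R once it is saved as text. This is only a sketch: the @SQ lines are copied from the output above, and the rule "anything other than letters and digits is suspicious" is an assumption based on the advice in this thread, not an ASpli requirement.

```r
# @SQ lines copied from the header above (fields separated by whitespace).
header <- c(
  "@SQ SN:1 LN:30427671",
  "@SQ SN:2 LN:19698289",
  "@SQ SN:3 LN:23459830",
  "@SQ SN:4 LN:18585056",
  "@SQ SN:5 LN:26975502",
  "@SQ SN:Mt LN:366924",
  "@SQ SN:Pt LN:154478"
)

# Keep the sequence lines and pull out the value of the SN: field.
sq <- grep("^@SQ", header, value = TRUE)
seq_names <- sub("^@SQ[[:space:]]+SN:([^[:space:]]+).*$", "\\1", sq)
seq_names

# Flag names that contain anything other than letters and digits.
suspicious <- seq_names[grepl("[^A-Za-z0-9]", seq_names)]
suspicious
```

For this header the list of flagged names is empty.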

estepi commented 5 years ago

I see there is nothing weird here. Thanks! I'll keep working on this, sorry.

KateK commented 5 years ago

Ok, please let me know how it goes

estepi commented 5 years ago

Hi all! We think your problem is that the sample names start with a number. In the data frame, column names that start with a number are transformed so that they start with an X. Can you change your sample names (and also rownames(targets)) so that they do not start with a number? Thanks, and sorry for the inconvenience! Let me know if you try it. We are preparing a really good new release, expected with the next Bioconductor release.
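
The renaming described here is standard base R behaviour and can be reproduced without ASpli. A minimal sketch, using sample names from earlier in the thread and a made-up count matrix; whether ASpli builds its tables exactly this way is an assumption:

```r
sample_names <- c("30het1", "30het2", "30hom1", "30hom2")

# A small made-up matrix whose columns are labelled with the sample names.
m <- matrix(1:8, nrow = 2, dimnames = list(NULL, sample_names))

# data.frame() passes column names through make.names() by default
# (check.names = TRUE), so names starting with a digit get an "X" prefix.
df <- data.frame(m)
colnames(df)

# Looking up the original names in the renamed columns finds nothing.
idx <- match(sample_names, colnames(df))
idx

# Selecting columns with NA indices is what raises the error from the title.
res <- tryCatch(df[, idx], error = function(e) conditionMessage(e))
res
```

Passing check.names = FALSE to data.frame() keeps the original names, but that is a change inside the package; renaming the samples is what a user can do.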

qicaibiology commented 5 years ago

My file names begin with a letter, but I still have the same problem. Does anyone have a solution for this error? I have tried different approaches but have not gotten it to work yet.

estepi commented 5 years ago

Hi @qicaibiology, sorry for the inconvenience. Do your sample names and conditions begin with a letter? Do you have any weird names in your chromosomes? Thanks!

qicaibiology commented 5 years ago

This is how I name my files:

bamFiles <- c(
  "DIV-8_Rep1Aligned.sortedByCoord.out.bam",
  "DIV-8_Rep2Aligned.sortedByCoord.out.bam",
  "DIV-8_Rep3Aligned.sortedByCoord.out.bam",
  "DIV-8_Rep4Aligned.sortedByCoord.out.bam",
  "DIV0_Rep1Aligned.sortedByCoord.out.bam",
  "DIV0_Rep2Aligned.sortedByCoord.out.bam",
  "DIV0_Rep3Aligned.sortedByCoord.out.bam",
  "DIV7_Rep1Aligned.sortedByCoord.out.bam",
  "DIV7_Rep2Aligned.sortedByCoord.out.bam",
  "DIV7_Rep3Aligned.sortedByCoord.out.bam",
  "DIV7_Rep4Aligned.sortedByCoord.out.bam",
  "DIV7_Rep5Aligned.sortedByCoord.out.bam"
)

targets <- data.frame(
  row.names = c("DIV-8Rep1", "DIV-8Rep2", "DIV-8Rep3", "DIV-8Rep4", "DIV0Rep1", "DIV0Rep2", "DIV0Rep3", "DIV7Rep1", "DIV7Rep2", "DIV7Rep3", "DIV7Rep4", "DIV7Rep5"),
  bam = bamFiles,
  treat = c("DIV-8", "DIV-8", "DIV-8", "DIV-8", "DIV0", "DIV0", "DIV0", "DIV7", "DIV7", "DIV7", "DIV7", "DIV7"),
  stringsAsFactors = FALSE
)

Previously I used a similar command and it went through. I guessed it might be because the number of samples differs between conditions, so I even reduced DIV7 to 3 samples, but the problem is still there.

Thanks,

estepi commented 5 years ago

Thanks. For the sake of simplicity, I would avoid "-" and "." in sample and file names. For targets, you can simply do:

targets$treat <- gsub("-", "", targets$treat)
rownames(targets) <- gsub("-", "", rownames(targets))

For your files, you can use the rename command (you can make symbolic links to them with ln -s first if you want to keep the original names). Let me know if there is any (good) news. Can you also paste your chromosome names? Just send me the output of samtools view -H on any of your BAM files. Thanks
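
The two gsub() calls can be tried on a reduced copy of the targets table from the previous comment. The is_safe() helper below is hypothetical and not part of ASpli; it only checks that base R's make.names() would leave the names unchanged, which is one plausible test for names that survive being used as data frame column names.

```r
# One sample per condition, taken from the targets table posted above.
targets <- data.frame(
  row.names = c("DIV-8Rep1", "DIV0Rep1", "DIV7Rep1"),
  bam = c(
    "DIV-8_Rep1Aligned.sortedByCoord.out.bam",
    "DIV0_Rep1Aligned.sortedByCoord.out.bam",
    "DIV7_Rep1Aligned.sortedByCoord.out.bam"
  ),
  treat = c("DIV-8", "DIV0", "DIV7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when make.names() would not alter the names.
is_safe <- function(x) identical(make.names(x, unique = TRUE), x)

# Before the clean-up, the "-" makes the first row name unsafe.
safe_before <- is_safe(rownames(targets))

# The clean-up suggested above.
targets$treat <- gsub("-", "", targets$treat)
rownames(targets) <- gsub("-", "", rownames(targets))

safe_after <- is_safe(rownames(targets)) && is_safe(unique(targets$treat))
c(before = safe_before, after = safe_after)
```

Removing a character can make two names collide, so it is worth checking that the cleaned row names are still unique; here they are.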

qicaibiology commented 5 years ago

Thanks for the information Estepi.

I changed the "-" and "_" to letters and it went through.

lucaskbobadilla commented 4 years ago

Anything about this error? I am stuck with that too.

qicaibiology commented 4 years ago

See my solution above your question.

lucaskbobadilla commented 4 years ago

@qicaibiology what exactly did you change? the bam file names, the rownames or both?

qicaibiology commented 4 years ago

I changed the filename