Closed gsk7757 closed 1 year ago
Well, you need a design file, of course. Follow the tutorials on either the RNA-Seq or the Biostar Workflow pages.
The design file lists the groups and the data; see the section "What is the experimental design file?":
https://www.biostarhandbook.com/books/rnaseq/understand-the-reads.html
I followed the Biostar Workflow instructions, downloaded the scripts, and ran them on the simulated data, which worked. However, when I changed the design file to list the 4 samples from my own counts file, deseq2 failed with the error below:
RScript code/deseq2.R
Error in `$<-.data.frame`(`*tmp*`, condition, value = integer(0)) :
  replacement has 0 rows, data has 10
Calls: $<- -> $<-.data.frame
Execution halted
(stats)
You have to show the design file and the first few lines of the counts file; the two have to match for this to work correctly.
Could you please direct me to the workflow that shows how to edit the R script to make changes to the design file and count file?
You don't need to edit the count file. The design file is a simple text file that you can edit with any editor.
The point I was making is that the design file has to match the count file: if the sample names in the design differ from the column names in the count file, the script cannot match the design to the counts.
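A minimal sketch of that matching check, written in Python as a standalone illustration and not part of the handbook's R scripts. The inline strings stand in for `design.csv` and `counts.csv`, and the helper name `compare_design_to_counts` is hypothetical. It assumes the design has a `sample` column and that the first column of the counts file holds the gene ids.

```python
import csv
import io

# Stand-ins for design.csv and counts.csv (assumed comma-separated).
DESIGN_CSV = """sample,condition
C1dedup,Control
C2dedup,Control
E1dedup,Experimental
E2dedup,Experimental
"""

COUNTS_CSV = """Geneid,C1dedup,C2dedup,E1dedup,E2dedup
CAF26817-1,551,545,323,208
CAF26818-1,219,212,103,68
"""


def compare_design_to_counts(design_text, counts_text, sample_column="sample"):
    """Return (missing, extra).

    missing: sample names in the design that are not count columns.
    extra:   count columns (after the first id column) not in the design.
    """
    design_rows = list(csv.DictReader(io.StringIO(design_text)))
    samples = [row[sample_column].strip() for row in design_rows]

    header = next(csv.reader(io.StringIO(counts_text)))
    count_columns = [name.strip() for name in header[1:]]

    missing = [s for s in samples if s not in count_columns]
    extra = [c for c in count_columns if c not in samples]
    return missing, extra


missing, extra = compare_design_to_counts(DESIGN_CSV, COUNTS_CSV)
print("in design but not in counts:", missing)
print("in counts but not in design:", extra)
```

Two empty lists mean the names line up. Whether extra count columns are a problem depends on the script that consumes the files, so treat the second list as a hint only.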
Sorry, my bad. I have named the columns to match the count file, but I still see the same error. Below is my design:
sample | condition |
---|---|
C1dedup | Control |
C2dedup | Control |
E1dedup | Experimental |
E2dedup | Experimental |
And this is what my counts file looks like:
Geneid | C1dedup | C2dedup | E1dedup | E2dedup |
---|---|---|---|---|
CAF26817-1 | 551 | 545 | 323 | 208 |
CAF26818-1 | 219 | 212 | 103 | 68 |
CAF26819-1 | 547 | 397 | 106 | 81 |
Please paste the actual files (the complete design and the first few lines of the counts), not a markdown-formatted version of them.
Since you are formatting the file by hand, it cannot possibly be the actual file, right?
This is my actual counts file
# Program:featureCounts v2.0.6; Command:"featurecounts" "-a" "Bartonella_henselae_str_houston_1_gca_000046705.ASM4670v1.57.gff3" "-g" "Name" "-o" "counts.txt" "C1dedup.bam" "C2dedup.bam" "E1dedup.bam" "E2dedup.bam"
Geneid Chr Start End Strand Length C1dedup C2dedup E1dedup E2dedup
CAF26817-1 Chromosome 1 825 + 825 551 545 323 208
CAF26818-1 Chromosome 837 1433 + 597 219 212 103 68
CAF26819-1 Chromosome 1426 2307 + 882 547 397 106 81
CAF26820-1 Chromosome 2304 2891 + 588 375 354 458 278
CAF26821-1 Chromosome 2896 3603 + 708 143 126 146 119
CAF26822-1 Chromosome 3932 6838 + 2907 1442 1444 600 373
CAF26823-1 Chromosome 6854 7735 - 882 917 861 113 68
CAF26824-1 Chromosome 7827 8600 - 774 22982 26927 3981 2544
CAF26825-1 Chromosome 9215 11056 - 1842 741 598 371 216
CAF26826-1 Chromosome 11591 12082 - 492 2674 2637 664 355
CAF26827-1 Chromosome 12079 12942 - 864 388 314 197 132
CAF26828-1 Chromosome 12930 13862 - 933 483 341 149 114
CAF26829-1 Chromosome 14030 14326 - 297 387 442 282 167
CAF26830-1 Chromosome 14335 14613 - 279 868 974 434 286
CAF26831-1 Chromosome 14805 15455 + 651 4480 4431 776 480
CAF26832-1 Chromosome 15567 16145 + 579 2314 2312 376 228
From this, I made another file, counts.csv, by taking just columns 1 and 7-10, which looks like this:
Geneid | C1dedup | C2dedup | E1dedup | E2dedup |
---|---|---|---|---|
CAF26817-1 | 551 | 545 | 323 | 208 |
CAF26818-1 | 219 | 212 | 103 | 68 |
CAF26819-1 | 547 | 397 | 106 | 81 |
CAF26820-1 | 375 | 354 | 458 | 278 |
CAF26821-1 | 143 | 126 | 146 | 119 |
CAF26822-1 | 1442 | 1444 | 600 | 373 |
CAF26823-1 | 917 | 861 | 113 | 68 |
CAF26824-1 | 22982 | 26927 | 3981 | 2544 |
CAF26825-1 | 741 | 598 | 371 | 216 |
CAF26826-1 | 2674 | 2637 | 664 | 355 |
CAF26827-1 | 388 | 314 | 197 | 132 |
CAF26828-1 | 483 | 341 | 149 | 114 |
CAF26829-1 | 387 | 442 | 282 | 167 |
CAF26830-1 | 868 | 974 | 434 | 286 |
CAF26831-1 | 4480 | 4431 | 776 | 480 |
CAF26832-1 | 2314 | 2312 | 376 | 228 |
I also made a design file that looks like this:
sample | condition |
---|---|
C1dedup | Control |
C2dedup | Control |
E1dedup | Experimental |
E2dedup | Experimental |
I closed the terminal and reopened it and ran deseq2 again, which gave me a new error.
RScript code/deseq2.R
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'design.csv'
Error in DESeqDataSetFromMatrix(countData = countData, colData = colData, :
  ncol(countData) == nrow(colData) is not TRUE
Calls: DESeqDataSetFromMatrix -> stopifnot
Execution halted
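The `ncol(countData) == nrow(colData)` failure says that the number of count columns did not equal the number of design rows. As a rough illustration of what "columns 1 and 7-10" means for featureCounts output, here is a Python sketch. It is not the handbook's `parse_featurecounts.r`; the function name `featurecounts_to_csv` is hypothetical, and the inline text is a shortened, tab-separated stand-in for `counts.txt`.

```python
import csv
import io

# Shortened stand-in for counts.txt: one comment line, then tab-separated rows.
FEATURECOUNTS_TXT = (
    "# Program:featureCounts v2.0.6\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tC1dedup\tC2dedup\tE1dedup\tE2dedup\n"
    "CAF26817-1\tChromosome\t1\t825\t+\t825\t551\t545\t323\t208\n"
    "CAF26818-1\tChromosome\t837\t1433\t+\t597\t219\t212\t103\t68\n"
)


def featurecounts_to_csv(text, first_sample_col=6):
    """Keep the gene id (column 1) and the sample columns (column 7 onward)."""
    # Skip the leading '#' comment line and any blank lines.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines, delimiter="\t"):
        writer.writerow([row[0]] + row[first_sample_col:])
    return out.getvalue()


csv_text = featurecounts_to_csv(FEATURECOUNTS_TXT)
sample_columns = csv_text.splitlines()[0].split(",")[1:]
print(csv_text)
print("number of sample columns:", len(sample_columns))
```

The number of sample columns printed at the end is the value that has to equal the number of sample rows in the design file.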
Again, the files that you include as CSV files do not look like comma-separated files. The file should look like this:
a,b,c
1,2,3
Instead, the content you show looks like this:
a | b | c |
---|---|---|
1 | 2 | 3 |
Do you see the difference? The design files you show do not look like comma-separated files.
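One quick way to tell the two apart without opening a spreadsheet is to inspect the raw text. The Python sketch below is only an illustration with hypothetical helper names. It uses a naive comma count that ignores quoted fields, and it also checks for a trailing newline, since a missing one is what triggers R's "incomplete final line" warning.

```python
def looks_comma_separated(text):
    """True if every non-blank line has the same, non-zero number of commas."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    comma_counts = {ln.count(",") for ln in lines}
    return len(comma_counts) == 1 and comma_counts.pop() > 0


def ends_with_newline(text):
    """True if the final line is terminated by a newline character."""
    return text.endswith("\n")


real_csv = "a,b,c\n1,2,3\n"
markdown_table = "a | b | c |\n---|---|---|\n1 | 2 | 3 |\n"

print(looks_comma_separated(real_csv))        # True
print(looks_comma_separated(markdown_table))  # False
print(ends_with_newline("a,b,c\n1,2,3"))      # False
```

To check a real file, read it with `open("design.csv").read()` and pass the result to both functions.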
Sorry, I was opening them in an Excel sheet and copy-pasting them. When I open the files in the terminal, this is how they look:
more design.csv
Finally, it worked. I had to rerun the scripts:
RScript code/parse_featurecounts.r
RScript code/deseq2.r
When I ran the scripts the first time, the counts.csv output somehow came out weird, but it worked the second time.
Thank you so much for the help.
Hello, I have downloaded the scripts following the computer setup described in RNA-Seq by Example (sub-book). I created a new environment, "stats", in a new terminal and downloaded the scripts below.
Obtain the biostar handbook rnaseq scripts.
curl -O http://data.biostarhandbook.com/books/rnaseq/code.tar.gz
Unpack the code.
tar -xzvf code.tar.gz
I also installed the deseq2 and edger packages using mamba:
mamba install bioconductor-tximport bioconductor-biomart bioconductor-edger bioconductor-deseq2 r-gplots
Followed by:
RScript code/parse_featurecounts.r
Running this gave me the following error
Rscript code/combine_transcripts.r
[1] "# Tool: Combine transcripts"
[1] "# Sample: design.csv"
[1] "# Data dir: salmon"
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'design.csv': No such file or directory
Execution halted
Same with deseq2:
RScript code/deseq2.R
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'design.csv': No such file or directory
Execution halted
(stats)
Please help. Thanks.