biostars / biostar-handbook

Issue tracker for the Biostar Handbook

error setting up computer for RNA-seq #289

Closed gsk7757 closed 1 year ago

gsk7757 commented 1 year ago

Hello, I downloaded the scripts following the computer setup described in RNA-Seq by Example (the sub-book). I created a new environment, "stats", in a new terminal and downloaded the scripts below.

Obtain the biostar handbook rnaseq scripts.

curl -O http://data.biostarhandbook.com/books/rnaseq/code.tar.gz

Unpack the code.

tar -xzvf code.tar.gz

I also installed the DESeq2 and edgeR packages using mamba:

mamba install bioconductor-tximport bioconductor-biomart bioconductor-edger bioconductor-deseq2 r-gplots

Followed by:

RScript code/parse_featurecounts.r

Running this gave me the following error:

Rscript code/combine_transcripts.r
[1] "# Tool: Combine transcripts"
[1] "# Sample: design.csv"
[1] "# Data dir: salmon"
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") : cannot open file 'design.csv': No such file or directory
Execution halted

The same happens with deseq2:

RScript code/deseq2.R
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") : cannot open file 'design.csv': No such file or directory
Execution halted
(stats)

Please help. Thanks.

ialbert commented 1 year ago

Well, you need a design file, of course. Follow the tutorials on either the RNA-Seq or the Biostar Workflow pages.

The design lists the groups and the data; see the section "What is the experimental design file?"

https://www.biostarhandbook.com/books/rnaseq/understand-the-reads.html

gsk7757 commented 1 year ago

I followed the Biostar Workflow instructions for downloading the scripts, ran the simulated data, and it worked. However, when I changed the design file to have 4 samples based on the counts file I have, deseq2 fails with the error below:

RScript code/deseq2.R
Error in $<-.data.frame(*tmp*, condition, value = integer(0)) :
  replacement has 0 rows, data has 10
Calls: $<- -> $<-.data.frame
Execution halted
(stats)

ialbert commented 1 year ago

You have to show the design and the first few lines of the counts.

The two have to match for this to work correctly.

gsk7757 commented 1 year ago

could you please direct me to the workflow that shows how to edit the Rscript to make changes to the design file and count file?

ialbert commented 1 year ago

You don't need to edit the count file. The design file is a simple text file; you can edit it with any editor.

The point I was making is that the design file has to match the count file. If the samples in the design are named differently from the columns in the count file, the script cannot match the design to the counts.
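
The matching requirement described above can be checked before running the R scripts. The following is only a minimal sketch, not part of the handbook code: `missing_samples` is a hypothetical helper, and it assumes both files are comma-separated with the sample names in the first column of the design file.

```shell
# Hypothetical helper (not part of the handbook scripts): print every sample
# named in the design file that is not a column name in the counts file.
# Assumes comma-separated files and sample names in the first design column.
missing_samples() {
    design="$1"
    counts="$2"
    # Column names of the counts file, one per line, carriage returns stripped.
    header=$(head -n 1 "$counts" | tr -d '\r' | tr ',' '\n')
    # First column of the design file, skipping its header line.
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    tail -n +2 "$design" | tr -d '\r' | cut -d ',' -f 1 |
    while read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue
        printf '%s\n' "$header" | grep -qxF -- "$sample" || printf '%s\n' "$sample"
    done
}

# Usage (prints nothing when every design sample has a matching column):
#   missing_samples design.csv counts.csv
```

If the design file is separated by spaces rather than commas, each whole line is treated as a sample name and reported as missing, which also flags a wrong delimiter.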

gsk7757 commented 1 year ago

Sorry, my bad. I have named the columns to match the count file, but I still see the same error. Below is my design:

sample condition
C1dedup Control
C2dedup Control
E1dedup Experimental
E2dedup Experimental

And this is what my counts file looks like:

Geneid C1dedup C2dedup E1dedup E2dedup
CAF26817-1 551 545 323 208
CAF26818-1 219 212 103 68
CAF26819-1 547 397 106 81

ialbert commented 1 year ago

Please paste the actual file (the complete design and the first few lines of the counts), not a markdown-formatted version of it.

Since you are formatting the file by hand, it cannot possibly be the actual file, right?

gsk7757 commented 1 year ago

This is my actual counts file

# Program:featureCounts v2.0.6; Command:"featurecounts" "-a" "Bartonella_henselae_str_houston_1_gca_000046705.ASM4670v1.57.gff3" "-g" "Name" "-o" "counts.txt" "C1dedup.bam" "C2dedup.bam" "E1dedup.bam" "E2dedup.bam"                                  
Geneid  Chr Start   End Strand  Length  C1dedup C2dedup E1dedup E2dedup
CAF26817-1  Chromosome  1   825 +   825 551 545 323 208
CAF26818-1  Chromosome  837 1433    +   597 219 212 103 68
CAF26819-1  Chromosome  1426    2307    +   882 547 397 106 81
CAF26820-1  Chromosome  2304    2891    +   588 375 354 458 278
CAF26821-1  Chromosome  2896    3603    +   708 143 126 146 119
CAF26822-1  Chromosome  3932    6838    +   2907    1442    1444    600 373
CAF26823-1  Chromosome  6854    7735    -   882 917 861 113 68
CAF26824-1  Chromosome  7827    8600    -   774 22982   26927   3981    2544
CAF26825-1  Chromosome  9215    11056   -   1842    741 598 371 216
CAF26826-1  Chromosome  11591   12082   -   492 2674    2637    664 355
CAF26827-1  Chromosome  12079   12942   -   864 388 314 197 132
CAF26828-1  Chromosome  12930   13862   -   933 483 341 149 114
CAF26829-1  Chromosome  14030   14326   -   297 387 442 282 167
CAF26830-1  Chromosome  14335   14613   -   279 868 974 434 286
CAF26831-1  Chromosome  14805   15455   +   651 4480    4431    776 480
CAF26832-1  Chromosome  15567   16145   +   579 2314    2312    376 228

From this, I made another counts.csv file by taking just columns 1 and 7-10, which looks like this:

Geneid C1dedup C2dedup E1dedup E2dedup
CAF26817-1 551 545 323 208
CAF26818-1 219 212 103 68
CAF26819-1 547 397 106 81
CAF26820-1 375 354 458 278
CAF26821-1 143 126 146 119
CAF26822-1 1442 1444 600 373
CAF26823-1 917 861 113 68
CAF26824-1 22982 26927 3981 2544
CAF26825-1 741 598 371 216
CAF26826-1 2674 2637 664 355
CAF26827-1 388 314 197 132
CAF26828-1 483 341 149 114
CAF26829-1 387 442 282 167
CAF26830-1 868 974 434 286
CAF26831-1 4480 4431 776 480
CAF26832-1 2314 2312 376 228
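
The column selection described above (column 1 plus the count columns from 7 onward) could be sketched in the shell as follows. This is an illustration under stated assumptions, not what `parse_featurecounts.r` does internally: `featurecounts_to_csv` is a hypothetical helper, and it assumes the input is tab-separated with a leading `#` comment line, as featureCounts writes it.

```shell
# Hypothetical helper (not part of the handbook scripts): keep the gene ID
# (column 1) and the per-sample counts (column 7 onward) and emit CSV.
# Assumes a tab-separated featureCounts table whose first line starts with '#'.
featurecounts_to_csv() {
    # Drop the comment line, select the columns, turn tabs into commas.
    grep -v '^#' "$1" | cut -f 1,7- | tr '\t' ','
}

# Usage:
#   featurecounts_to_csv counts.txt > counts.csv
```

Writing the file this way keeps the delimiter consistent, which avoids the copy-paste formatting differences that a spreadsheet can introduce.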

and I made a design file that looks like this:

sample condition
C1dedup Control
C2dedup Control
E1dedup Experimental
E2dedup Experimental

I closed the terminal, reopened it, and ran deseq2 again, which gave me a new error:

RScript code/deseq2.R
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'design.csv'
Error in DESeqDataSetFromMatrix(countData = countData, colData = colData, :
  ncol(countData) == nrow(colData) is not TRUE
Calls: DESeqDataSetFromMatrix -> stopifnot
Execution halted
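
The `ncol(countData) == nrow(colData)` check in that message means the number of count columns must equal the number of samples in the design. A rough way to compare the two from the shell is sketched below; `count_dimensions` is a hypothetical helper, and it assumes comma-separated files with one header line each and the gene ID in the first counts column.

```shell
# Hypothetical helper (not part of the handbook scripts): print
# "<sample columns in counts> <sample rows in design>" for comparison.
# Assumes comma-separated files, one header line each, gene ID in column 1.
count_dimensions() {
    counts="$1"
    design="$2"
    # Sample columns: fields in the counts header minus the gene ID column.
    ncols=$(head -n 1 "$counts" | awk -F',' '{ print NF - 1 }')
    # Sample rows: non-empty design lines after the header.
    nrows=$(awk 'NR > 1 && NF' "$design" | wc -l | tr -d ' ')
    printf '%s %s\n' "$ncols" "$nrows"
}

# Usage (the two numbers should be equal):
#   count_dimensions counts.csv design.csv
```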

ialbert commented 1 year ago

Again, the files that you present as CSV files do not look like comma-separated files. A CSV file should look like this:

a,b,c
1,2,3

instead, the content you show looks like this

a b c
1 2 3

do you see the difference? The design files you show do not look like comma-separated files.
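
One quick way to see the difference on the actual file, rather than on a pasted copy, is to count the comma-separated fields on each line. The sketch below is only an illustration; `field_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: print the distinct numbers of comma-separated fields
# found on the non-empty lines of a file.
field_counts() {
    awk -F',' 'NF { print NF }' "$1" | sort -u
}

# Usage:
#   field_counts design.csv
```

For the comma-separated example above (`a,b,c`) this prints 3, while for the space-separated one (`a b c`) it prints 1, because each whole line counts as a single field.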

gsk7757 commented 1 year ago

Sorry, I was opening them in an Excel sheet and copy-pasting them. When I open the files in the terminal, this is how they look.

more design.csv

sample,condition
C1dedup,Control
C2dedup,Control
E1dedup,Experimental
E2dedup,Experimental
(stats)

and counts

more counts.csv

Geneid,C1dedup,C2dedup,E1dedup,E2dedup
CAF26817-1,551,545,323,208
CAF26818-1,219,212,103,68
CAF26819-1,547,397,106,81
CAF26820-1,375,354,458,278
CAF26821-1,143,126,146,119
CAF26822-1,1442,1444,600,373
CAF26823-1,917,861,113,68
CAF26824-1,22982,26927,3981,2544
CAF26825-1,741,598,371,216
CAF26826-1,2674,2637,664,355

gsk7757 commented 1 year ago

Finally, it worked. I had to rerun the scripts:

RScript code/parse_featurecounts.r
RScript code/deseq2.r

When I ran them the first time, the counts.csv output was somehow weird, but it worked the second time.

Thank you so much for the help.