bioinfo-biols / CIRIquant

circular RNA quantification tools
https://sourceforge.net/projects/ciri/files/CIRIquant
MIT License
27 stars 18 forks source link

DE with Biological Replicates Step 3 Error #12

Closed mlvanhorn closed 4 years ago

mlvanhorn commented 4 years ago

Hi,

I've been working on a set of samples using the differential expression analysis for biological replicates. All steps work fine until I get to using CIRI_DE_replicate, where I get the following error:

[Wed 2020-05-20 17:01:42] [INFO ] Library information: /pylon5/mc5frap/mvanhorn/CD11b_library_info.csv
[Wed 2020-05-20 17:01:42] [INFO ] circRNA expression matrix: /pylon5/mc5frap/mvanhorn/CD11b_circRNA_bsj.csv
[Wed 2020-05-20 17:01:42] [INFO ] gene expression matrix: /pylon5/mc5frap/mvanhorn/gene_count_matrix.csv
[Wed 2020-05-20 17:01:42] [INFO ] Output DE results: /pylon5/mc5frap/mvanhorn/CD11b_circRNA_de.tsv
Error in `[.data.frame`(gene_mtx, , rownames(lib_mtx)) : 
  undefined columns selected
Calls: [ -> [.data.frame
Execution halted
[Wed 2020-05-20 17:01:44] [INFO ] Finished!

There is no .tsv file produced after this runs. I have checked the .csv input files and there don't seem to be any issues with them. I am running the pipeline on a supercomputer, not my local machine, but have made sure that the R environment module is loaded and includes edgeR. Any suggestions that you have on what might be causing this error and how to fix it would be greatly appreciated!

Kevinzjy commented 4 years ago

Hi mlvanhorn, could you check the first column of input sample list for prepDE.py and prep_CIRIquant, they should be the same set of sample names, otherwise CIRI_DE_replicate won't run properly.

mlvanhorn commented 4 years ago

Hi kevinzjy,

The first column of the input sample list for prepDE.py and prep_CIRIquant are the same sample names.

[mvanhorn@r002 mvanhorn]$ cat PRJNA541279_CD11b_DE.txt
419 ./PRJNA541279/SRR9017419/419.gtf C
420 ./PRJNA541279/SRR9017420/420.gtf C
421 ./PRJNA541279/SRR9017421/421.gtf T
422 ./PRJNA541279/SRR9017422/422.gtf T
423 ./PRJNA541279/SRR9017423/423.gtf T
424 ./PRJNA541279/SRR9017424/424.gtf T
[mvanhorn@r002 mvanhorn]$ cat PRJNA541279_CD11b_gene.txt
419 ./PRJNA541279/SRR9017419/gene/419_out.gtf 
420 ./PRJNA541279/SRR9017420/gene/420_out.gtf 
421 ./PRJNA541279/SRR9017421/gene/421_out.gtf
422 ./PRJNA541279/SRR9017422/gene/422_out.gtf
423 ./PRJNA541279/SRR9017423/gene/423_out.gtf 
424 ./PRJNA541279/SRR9017424/gene/424_out.gtf 
Kevinzjy commented 4 years ago

It's weird, could you also show me the first few lines in /pylon5/mc5frap/mvanhorn/CD11b_library_info.csv and /pylon5/mc5frap/mvanhorn/gene_count_matrix.csv?

mlvanhorn commented 4 years ago

Of course, here's CD11b_library_info.csv:

[mvanhorn@r002 mvanhorn]$ more CD11b_library_info.csv
Sample,Total,Mapped,Circular,Group
419,12529872,130220,4,C
420,15126504,116130,6,C
421,10742310,78616,12,T
422,13931360,87664,6,T
423,24737676,281410,8,T
424,23947304,212768,4,T

And here's the beginning of gene_count_matrix.csv:

[mvanhorn@r002 mvanhorn]$ more gene_count_matrix.csv
gene_id,419,420,421,422,423,424
LOC100506869,0,0,0,0,0,0
LOC102724604,0,0,0,0,0,0
MTVR2,0,1,0,3,1,1
CHMP1B,339,924,236,302,541,507
LINC01634,0,0,0,0,0,0
LINC01637,0,0,0,0,5,0
LOC441204,0,0,0,0,0,0
TCOF1,23,89,42,37,75,71
Kevinzjy commented 4 years ago

It seems that you're using the wrong type of sequencing data, according to https://www.ncbi.nlm.nih.gov/sra/?term=SRR9017419, these libraries were constructed using small RNA primer to detect miRNAs.

In CD11b_library_info.csv, you can see that all samples have too few reads mapped to genome ('Mapped') and recognized as circular RNAs ('Circular'). It's possible that no circRNAs/genes passed the expression filter, which caused this error.

mlvanhorn commented 4 years ago

That makes sense to me. I double-checked the sequencing process for this data and it looks like they were sequencing for total RNA content using these small primers, but I was getting very small counts for circRNAs when I was running CIRIquant for circRNA quantification. So, I probably wasn't getting enough to pass the expression filter. I'm working with another set of data and getting a lot more circular reads per sample, so I think I will be able to avoid this error when I perform the DE analysis with the new data.

Thank you so much for your help with this issue!

mlvanhorn commented 4 years ago

Hello @Kevinzjy,

I'm sorry to open this issue again, but I was getting the same "Error in [.data.frame(gene_mtx, , rownames(lib_mtx)) : undefined columns selected" error with the other data set (NCBI PRJNA283498) that I was running for differential expression with biological replicates. To troubleshoot and see if the issue was with the data that I had chosen, I took both replicates of the control RNase R RNAseq (CRR060057, CRR060058) and the MBNL1 RNase R RNAseq (CRR060060, CRR060061) samples from your CIRIquant paper to run. I did not encounter the previous error, but I did get a different error:

(ciriq-env) [mvanhorn@r002 mvanhorn]$ CIRI_DE_replicate --lib de_test_library_info.csv \
> --bsj de_test_circRNA_bsj.csv \
> --gene gene_count_matrix.csv \
> --out de_test_circRNA_de.tsv
[Fri 2020-06-12 17:59:34] [INFO ] Library information: /pylon5/mc5frap/mvanhorn/de_test_library_info.csv
[Fri 2020-06-12 17:59:34] [INFO ] circRNA expression matrix: /pylon5/mc5frap/mvanhorn/de_test_circRNA_bsj.csv
[Fri 2020-06-12 17:59:34] [INFO ] gene expression matrix: /pylon5/mc5frap/mvanhorn/gene_count_matrix.csv
[Fri 2020-06-12 17:59:34] [INFO ] Output DE results: /pylon5/mc5frap/mvanhorn/de_test_circRNA_de.tsv
Error in `[.TestResults`(xj, i, , drop = FALSE) : 
  unused argument (drop = FALSE)
Calls: [ -> [.data.frame -> [
Execution halted
[Fri 2020-06-12 17:59:37] [INFO ] Finished!

I tried looking in the code for CIRI_DE.R, but couldn't find any lines in the code that this error is referencing. Do you know what might be causing this? Thank you.

Kevinzjy commented 4 years ago

Hi @mlvanhorn, could you show me the results of this codes in your R environment? I'm using edgeR_3.26.8 and limma_3.40.6.

library(edgeR)
sessionInfo()
mlvanhorn commented 4 years ago

Of course, here it is:

[mvanhorn@r001 mvanhorn]$ R

R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(edgeR)
Loading required package: limma
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.30.0 limma_3.44.1

loaded via a namespace (and not attached):
[1] compiler_4.0.0  Rcpp_1.0.4.6    grid_4.0.0      locfit_1.5-9.4 
[5] lattice_0.20-41

Is it possible that the difference in versions for R, edgeR, or limma is causing the issue?

Edit: I just tried again using R_3.6.0 with edgeR_3.26.1 and limma_3.40.0 and was able to get CIRI_DE_replicate to complete with your data. I also managed to get the data set I was originally trying to run to work as well. The issue was apparently in the names of the sample IDs for the sample.lst and sample_gene.lst for steps 1&2. Previously, they were all numeric (ex. 714), but I added an alphabetical character to the IDs (ex. a714) and they worked no problem. Thank you so much for all of your help!

Kevinzjy commented 4 years ago

Thanks for the update, I will look into the code to see if it's possible to avoid this problem in the future.

Kevinzjy commented 4 years ago

It seems that the recent version of limma changed behaviour of decideTestsDGE, c089daca4b1cadceb1683b0bc747cf73d8d398c8 should work.