bioinfo-biols / CIRIquant

circular RNA quantification tools
https://sourceforge.net/projects/ciri/files/CIRIquant
MIT License

duplicate rows R error in CIRI_DE_replicate #25

Closed eltonjrv closed 3 years ago

eltonjrv commented 3 years ago

Hi Jyniang, The program is now working smoothly under python 2.7. Thanks again! I'm just encountering an R error (duplicated rows) in the very final step (CIRI_DE_replicate):

###################################################
[Fri 2021-03-12 17:26:31] [INFO ] Library information: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-library_info.csv
[Fri 2021-03-12 17:26:31] [INFO ] circRNA expression matrix: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-circRNA_bsj.csv
[Fri 2021-03-12 17:26:31] [INFO ] gene expression matrix: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/gene_count_matrix.csv
[Fri 2021-03-12 17:26:31] [INFO ] Output DE results: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-circRNA_de.tsv
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  duplicate 'row.names' are not allowed
Calls: read.csv -> read.table
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  EOF within quoted string
Execution halted
[Fri 2021-03-12 17:26:31] [INFO ] Finished!
###################################################

The weird thing is that I couldn't find any duplicated rows in either the circRNA_bsj.csv or the gene_count_matrix.csv file. Any clue on how to solve this?

Thanks very much, Best, Elton

Kevinzjy commented 3 years ago

Could you check the first columns in NvsC-library_info.csv as well? I guess that there are some duplicate sample names.
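One quick way to run that check is sketched below. It is not part of CIRIquant: the helper name `duplicate_first_column` and the sample rows are made up for illustration, and it assumes the file is a plain comma-separated table with a single header row.

```python
import csv
import io
from collections import Counter


def duplicate_first_column(handle):
    """Return first-column values that occur more than once (header skipped)."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    counts = Counter(row[0] for row in reader if row)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical data; for a real run, pass open("NvsC-library_info.csv", newline="").
sample = io.StringIO(
    "Sample,Total,Mapped,Circular,Group\n"
    "CASE1,100,90,10,T\n"
    "CASE1,120,95,12,T\n"
    "CONTROL1,110,92,11,C\n"
)
print(duplicate_first_column(sample))
```

An empty list means the first column has no repeated names, so the `duplicate 'row.names'` error would have to come from something else in the file.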

eltonjrv commented 3 years ago

Thanks for your prompt attention, Kevin. OK, there were no duplicates in my library_info.csv either, BUT there was an incomplete circRNA description row, the first one right after the sample descriptions (see below):

################################
$ head NvsC-library_info.csv
Sample,Total,Mapped,Circular,Group
CASE1,73927460,71797170,37734,T
CASE2,83042976,80349502,26676,T
CASE3,78334774,75989168,30354,T
CONTROL1,82382804,80406804,63676,C
CONTROL2,86148002,83484844,18798,C
CONTROL3,84563972,80765346,14990,C
NSMUSG00000019966","Kitl","protein_coding"
"10:100457048|100478022","exon","ENSMUSG00000036676","Tmtc3","protein_coding"
"10:100466021|100478022","exon","ENSMUSG00000036676","Tmtc3","protein_coding"
################################

The same thing has also happened in a different pairwise comparison I'm running (see below). I wonder why this first circRNA description row is being generated like this.

################################
$ head ../NvsR/NvsR-library_info.csv
Sample,Total,Mapped,Circular,Group
CASE1,64677878,62825972,54918,T
CASE2,77171814,74682858,38254,T
CASE3,66121954,64099836,14564,T
CONTROL1,82382804,80406804,63676,C
CONTROL2,86148002,83484844,18798,C
CONTROL3,84563972,80765346,14990,C
NSMUSG00000019971","Cep290","protein_coding"
"10:100518795|100528660","exon","ENSMUSG00000019971","Cep290","protein_coding"
"10:100528549|100536917","exon","ENSMUSG00000019971","Cep290","protein_coding"
################################

In an attempt to move on, I deleted this problematic row, but now I'm getting a new R error message (see below):

################################
[Sat 2021-03-13 10:52:16] [INFO ] Library information: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-library_info.csv
[Sat 2021-03-13 10:52:16] [INFO ] circRNA expression matrix: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-circRNA_bsj.csv
[Sat 2021-03-13 10:52:16] [INFO ] gene expression matrix: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/gene_count_matrix.csv
[Sat 2021-03-13 10:52:16] [INFO ] Output DE results: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-circRNA_de.tsv
Error in [.data.frame(gene_mtx, , rownames(lib_mtx)) :
  undefined columns selected
Calls: [ -> [.data.frame
Execution halted
[Sat 2021-03-13 10:52:20] [INFO ] Finished!
################################

Could you please shed some light on these issues? Thanks very much, Best, Elton
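The second error reads as if R is selecting columns of the gene matrix by the row names of the library table, so any first-column entry in library_info.csv that is not a column of gene_count_matrix.csv would trigger it. A minimal sketch of that comparison follows. It is not CIRIquant code, and it assumes the first column of the gene matrix holds gene IDs and the remaining header fields are sample names; `missing_samples` and the sample data are hypothetical.

```python
import csv
import io


def missing_samples(lib_handle, gene_handle):
    """List library-table row names that are not column names of the gene matrix."""
    lib_rows = list(csv.reader(lib_handle))
    # First column of every non-header row is what R would use as a row name.
    row_names = [row[0] for row in lib_rows[1:] if row]
    gene_header = next(csv.reader(gene_handle))
    sample_columns = set(gene_header[1:])  # assumed: column 0 is the gene ID
    return [name for name in row_names if name not in sample_columns]


# Hypothetical miniature versions of the two files.
lib = io.StringIO(
    "Sample,Total,Mapped,Circular,Group\n"
    "CASE1,100,90,10,T\n"
    "CONTROL1,110,92,11,C\n"
    '"10:100457048|100478022","exon","ENSMUSG00000036676","Tmtc3","protein_coding"\n'
)
genes = io.StringIO("gene_id,CASE1,CONTROL1\nGeneA,5,7\n")
print(missing_samples(lib, genes))
```

Any name the function returns is a row that the column selection would fail on, which includes leftover circRNA rows as well as misspelled sample names.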

Kevinzjy commented 3 years ago

Hi @eltonjrv, it seems that lib_info and circ_info are somehow mixed in the same file. Are you using the latest version of CIRIquant (v1.1.2)? If not, I would recommend you do a clean install of CIRIquant under virtualenv, and re-run CIRIquant to see if the problem persists.

Besides, it would help if you could also provide the commands you used to run CIRIquant and prep_CIRIquant.
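To confirm whether rows from another table have ended up in the library file, a structural check along these lines may help. It is only a sketch: it assumes the five-column layout shown in the `head` output above (Sample,Total,Mapped,Circular,Group) with integer counts in columns 2 to 4, and `malformed_rows` is a hypothetical helper, not a CIRIquant function.

```python
import csv
import io


def malformed_rows(handle):
    """Return (line_number, reason) for rows that do not fit the header layout."""
    reader = csv.reader(handle)
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            problems.append((reader.line_num, "expected %d fields, got %d"
                             % (len(header), len(row))))
            continue
        try:
            # Assumed layout: Total, Mapped and Circular are integer counts.
            [int(value) for value in row[1:4]]
        except ValueError:
            problems.append((reader.line_num, "non-integer count column"))
    return problems


# Hypothetical file content mixing sample rows with circRNA annotation rows.
mixed = io.StringIO("\n".join([
    "Sample,Total,Mapped,Circular,Group",
    "CASE1,73927460,71797170,37734,T",
    'NSMUSG00000019966","Kitl","protein_coding"',
    '"10:100457048|100478022","exon","ENSMUSG00000036676","Tmtc3","protein_coding"',
]) + "\n")
for line_number, reason in malformed_rows(mixed):
    print(line_number, reason)
```

Counting fields alone is not enough here, because a full circRNA annotation row also has five fields. The integer check on the count columns is what separates it from a sample row.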

eltonjrv commented 3 years ago

Sorry, my bad! There was a typo in my prep_CIRIquant command, which assigned the same output name, "library_info.csv", to both the --lib and --circ options. CIRI_DE_replicate has now worked just fine! Thanks very much again for all your prompt attention and support. Best, Elton
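For anyone wrapping prep_CIRIquant in a script, this kind of typo can be caught before the run with a small guard. The sketch below is not part of CIRIquant: `find_shared_outputs` is a hypothetical helper, and only the --lib and --circ option names are taken from this thread.

```python
import os


def find_shared_outputs(outputs):
    """Return {resolved_path: [options]} for paths used by more than one option."""
    by_path = {}
    for option, path in outputs.items():
        # Resolve relative spellings such as "./x.csv" and "x.csv" to one path.
        by_path.setdefault(os.path.abspath(path), []).append(option)
    return {path: opts for path, opts in by_path.items() if len(opts) > 1}


# Hypothetical arguments reproducing the typo described above.
collisions = find_shared_outputs({
    "--lib": "NvsC-library_info.csv",
    "--circ": "./NvsC-library_info.csv",
})
for path, options in collisions.items():
    print("same output file for %s: %s" % (" and ".join(options), path))
```

Paths are compared after resolving them to absolute form, so two different spellings of the same file name are still reported as a collision.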

Kevinzjy commented 3 years ago

No worries! :)