RitchieLabIGH / IRFinder

MIT License

Unknown value '-nan' in IRratio column #29

Open aman-akash opened 1 year ago

aman-akash commented 1 year ago

Hey, I am using the IRFinder v2.0.0 Singularity image to quantify intron retention in some long-read datasets. I am running it in Long mode, and I am getting many '-nan' values in the IRratio column of the results.

Command: IRFinder Long -t 72 -r ../REF/ reads.fastq

The pipeline ran successfully without any errors. The stdout is:

IRFinder version: 2.0.0
IRFinder start: Mon Jun 26 16:33:39 CEST 2023
IRFinder runmode: Long
IRFinder user@host: s391913 @ wbbi206
IRFinder working dir: /storage/users/s391913/assembly/irfinder/
IRFinder reference: ../REF/
IRFinder file 1: reads.fastq

[ Mon Jun 26 16:33:39 CEST 2023 ] Minimap2 is starting with 72 threads

[ Mon Jun 26 17:09:41 CEST 2023 ] Minimap2 mapping completed

[ Mon Jun 26 17:09:42 CEST 2023 ] Processing the BAM file with IRFinder

IRFinder run with options:

Preparing the reference:

Processing the BAM
Total reads processed: 38430285
Total nucleotides: 26089288816
Total singles processed: 38430286
Total pairs processed: 0
Short pairs: 0
Intersect pairs: 0
Long pairs: 0
Skipped reads: 31003570

[ Mon Jun 26 17:18:37 CEST 2023 ] IRFinder BAM analysis completed

[ Mon Jun 26 17:18:38 CEST 2023 ] Sorting the bam file

[ Mon Jun 26 17:21:20 CEST 2023 ] Indexing the sorted bam file

[ Mon Jun 26 17:22:07 CEST 2023 ] IRFinder Long completed.

Can you please help me understand this error and maybe why this is happening?

Thanks and Regards, Aman

CloXD commented 1 year ago

Hello Aman, First of all, thanks for using IRFinder and for your feedback. Could you provide a couple of lines with the -nan values? Cheers, Claudio

aman-akash commented 1 year ago

Yes, of course. For example:

1 1321093 1321907 INTS11/ENSG00000127054/clean 0 - 13 0.424469 0 0 0 1 0 1 0 1 87 0 0 -nan LowCover
1 1321093 1323146 INTS11/ENSG00000127054/clean 0 - 355 0.728504 0 0 1 2 0 11 0 11 87 0 0 -nan LowCover
1 1321096 1324580 INTS11/ENSG00000127054/anti-near 0 - 503 0.845354 0 1 4 67 0 67 0 67 87 88 0 -nan LowCover
11 46508370 46541944 AMBRA1/ENSG00000110497/clean 0 - 142 0.940656 0 3 5 12 2 22 2 26 0 3 0 -nan LowCover

Hopefully this helps.

I believe this might be due to the 0 values in intron depth.

Regards, Aman
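One way to test this guess is to count how many '-nan' rows also have zero depth and zero exact-splice counts. The sketch below is only illustrative. It assumes whitespace-separated rows with 21 fields, with the intron depth at 0-based position 8, the exact splice count at position 18, and the IRratio at position 19. These positions are inferred from the sample rows in this thread and are not confirmed by the maintainers. The function name is hypothetical.

```python
# Sample rows copied from the comment above (whitespace-separated).
ROWS = """\
1 1321093 1321907 INTS11/ENSG00000127054/clean 0 - 13 0.424469 0 0 0 1 0 1 0 1 87 0 0 -nan LowCover
1 1321093 1323146 INTS11/ENSG00000127054/clean 0 - 355 0.728504 0 0 1 2 0 11 0 11 87 0 0 -nan LowCover
1 1321096 1324580 INTS11/ENSG00000127054/anti-near 0 - 503 0.845354 0 1 4 67 0 67 0 67 87 88 0 -nan LowCover
11 46508370 46541944 AMBRA1/ENSG00000110497/clean 0 - 142 0.940656 0 3 5 12 2 22 2 26 0 3 0 -nan LowCover
"""

# ASSUMPTION: 0-based column positions inferred from the rows above.
INTRON_DEPTH, SPLICE_EXACT, IR_RATIO = 8, 18, 19


def nan_rows_with_zero_evidence(lines):
    """Return the intron names of '-nan' rows whose depth and exact splice count are both 0."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) <= IR_RATIO:
            continue  # blank or truncated line
        if fields[IR_RATIO].lower() not in ("nan", "-nan"):
            continue  # ratio is a regular number (or a header label)
        if float(fields[INTRON_DEPTH]) == 0 and float(fields[SPLICE_EXACT]) == 0:
            hits.append(fields[3])
    return hits


if __name__ == "__main__":
    names = nan_rows_with_zero_evidence(ROWS.splitlines())
    print(f"{len(names)} '-nan' rows have zero intron depth and zero exact splice count")
```

If every '-nan' row in the full result file is reported by this check, the values are consistent with a 0/0 division.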

CloXD commented 1 year ago

Dear Aman, yes, this happens when both the intron depth and the splice exact columns are 0. In long read mode the IR ratio is computed over the exact splice count, so when that count and the intron depth are both 0, the result is -nan instead of 0 because of a missing check. I'll fix the issue and include it in the next release. In the meantime, you can safely replace these values with 0. Thank you again for the feedback. Cheers, Claudio
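Until a fixed release is available, the suggested replacement could be scripted roughly as follows. This is a minimal sketch, not part of IRFinder. It assumes the result file is tab-separated and that the IRratio is the 20th column (0-based index 19), as inferred from the sample rows in this thread. The function names and the input and output paths are hypothetical and supplied by the user.

```python
import sys

# ASSUMPTION: IRratio is the 20th field (0-based index 19), inferred from
# the sample rows in this thread; adjust if your file differs.
IR_RATIO = 19


def fix_line(line, sep="\t"):
    """Return the line with a 'nan' or '-nan' IRratio replaced by '0'."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) > IR_RATIO and fields[IR_RATIO].lower() in ("nan", "-nan"):
        fields[IR_RATIO] = "0"
    return sep.join(fields)


def fix_file(src, dst):
    """Copy src to dst, patching the IRratio column line by line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(fix_line(line) + "\n")


if __name__ == "__main__":
    # Usage: python fix_nan.py <input result file> <output file>
    if len(sys.argv) == 3:
        fix_file(sys.argv[1], sys.argv[2])
```

Only the IRratio field is touched; header lines and the Warnings column pass through unchanged, so the LowCover flag on these introns is preserved.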