RitchieLabIGH / IRFinder

MIT License

Force IRFinder to generate IRFinder-IR-dir.txt #39

Open ZJURenyi opened 9 months ago

ZJURenyi commented 9 months ago

Hi, thank you for the great tool! I have a pair of N/T samples that were sequenced at the same time. When I run IRFinder (Singularity, 2.0.1), T has both non-dir and dir outputs, but N has only non-dir. I can confirm both are stranded RNA-seq. Is it possible to force IRFinder to generate dir output for the N sample?

Here is the irfinder.stdout

 --------------------
|  IRFinder v. 2.0.1 |
 --------------------

[  Thu Jan 18 18:53:45 +08 2024  ] STAR is starting with 24 threads
---
[  Thu Jan 18 19:10:30 +08 2024  ] STAR mapping completed
[  Thu Jan 18 19:10:31 +08 2024  ] Processing the BAM file with IRFinder
---
IRFinder run with options:
 - Output Dir:                  IRFinder/CV023_T
 - Main intron ref.:            /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-cover.bed
 - Splice junction ref.:        /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-sj.ref
 - Read spans ref.:             /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-read-continues.ref
 - Optional ROI ref.:           /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-ROI.bed
 - Read type:                   SR
 - AI levels:                   1:1:0.05
 - Input BAM:                   IRFinder/CV023_T/Unsorted.bam

Preparing the reference:
 - Junction count...done.
 - Span points...done.
 - Coverage blocks...done.
 - ROI...done

Processing the BAM
Total reads processed: 90525922
Total nucleotides: 12113689612
Total singles processed: 25445
Total pairs processed: 45250239
Short pairs: 362470
Intersect pairs: 28060598
Long pairs: 16827171
Skipped reads: 0
Error reads: 0
Directionality: Dir evidence:   151566
Directionality: Nondir evidence:        1956
Directionality: Dir evidence known junctions:   135182
Directionality: Nondir evidence known junctions:        1352
Directionality: Dir matches ref:        1
Directionality: Dir opposed to ref:     135181
Directionality: Dir score all (0-10000):        9872
Directionality: Dir score known junctions (0-10000):    9900
RNA-Seq directionality -1/0/+1: -1
---
[  Thu Jan 18 19:20:09 +08 2024  ] IRFinder BAM analysis completed
---
---
[  Thu Jan 18 19:20:09 +08 2024  ] Running CNN validator
---
---
[  Thu Jan 18 19:21:06 +08 2024  ] CNN validator completed
---
---
[  Thu Jan 18 19:21:06 +08 2024  ] Sorting the bam file
---
[  Thu Jan 18 19:27:15 +08 2024  ] Indexing the sorted bam file
---
[  Thu Jan 18 19:27:29 +08 2024  ] IRFinder FastQ completed.
---
 --------------------
|  IRFinder v. 2.0.1 |
 --------------------

[  Thu Jan 18 18:13:10 +08 2024  ] STAR is starting with 24 threads
---
[  Thu Jan 18 18:27:55 +08 2024  ] STAR mapping completed
[  Thu Jan 18 18:27:55 +08 2024  ] Processing the BAM file with IRFinder
---
IRFinder run with options:
 - Output Dir:                  IRFinder/CV023_N
 - Main intron ref.:            /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-cover.bed
 - Splice junction ref.:        /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-sj.ref
 - Read spans ref.:             /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-read-continues.ref
 - Optional ROI ref.:           /hpctmp/renyi04/reference/IRFinder/GRCh38_110//IRFinder/ref-ROI.bed
 - Read type:                   SR
 - AI levels:                   1:1:0.05
 - Input BAM:                   IRFinder/CV023_N/Unsorted.bam

Preparing the reference:
 - Junction count...done.
 - Span points...done.
 - Coverage blocks...done.
 - ROI...done

Processing the BAM
Total reads processed: 93210207
Total nucleotides: 11650074326
Total singles processed: 28354
Total pairs processed: 46590927
Short pairs: 745249
Intersect pairs: 36578567
Long pairs: 9267111
Skipped reads: 0
Error reads: 0
Directionality: Dir evidence:   133587
Directionality: Nondir evidence:        16276
Directionality: Dir evidence known junctions:   123949
Directionality: Nondir evidence known junctions:        14088
Directionality: Dir matches ref:        16
Directionality: Dir opposed to ref:     123933
Directionality: Dir score all (0-10000):        8913
Directionality: Dir score known junctions (0-10000):    8979
RNA-Seq directionality -1/0/+1: 0
---
[  Thu Jan 18 18:35:58 +08 2024  ] IRFinder BAM analysis completed
---
---
[  Thu Jan 18 18:35:58 +08 2024  ] Running CNN validator
---
---
[  Thu Jan 18 18:36:23 +08 2024  ] CNN validator completed
---
---
[  Thu Jan 18 18:36:23 +08 2024  ] Sorting the bam file
---
[  Thu Jan 18 18:40:25 +08 2024  ] Indexing the sorted bam file
---
[  Thu Jan 18 18:40:40 +08 2024  ] IRFinder FastQ completed.
---

If there is no other choice, can I run Diff using a mix of IRFinder-IR-dir.txt and IRFinder-IR-non-dir.txt as the input? [update] OK, I copied nondir to dir. When I ran Diff, it reported the error below, so the answer is no.

ERROR! The file IRFinder/CV023_N/IRFinder-IR-dir.txt contains a different number of rows respect to the previous one.
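
Since Diff rejects a mix of dir and non-dir files, it can help to check every sample's directionality call before running it. A minimal sketch, assuming each sample has its own irfinder.stdout log containing a single run; the helper names and the log paths in the usage comment are hypothetical:

```shell
# Hypothetical helper: print the final directionality call (-1, 0 or 1)
# recorded in an irfinder.stdout log. The log path is supplied by the caller.
get_directionality() {
    awk -F': *' '/^RNA-Seq directionality/ { print $2 }' "$1"
}

# Hypothetical helper: succeed only if every log reports a stranded call,
# i.e. every sample is expected to have an IRFinder-IR-dir.txt.
all_stranded() {
    for log in "$@"; do
        case "$(get_directionality "$log")" in
            1|+1|-1) ;;          # stranded in either orientation
            *) return 1 ;;       # 0, missing, or unexpected value
        esac
    done
}

# Usage (paths are assumptions, adjust to where your logs live):
# all_stranded CV023_T/irfinder.stdout CV023_N/irfinder.stdout \
#     || echo "At least one sample was called non-directional"
```

For the two logs above, the T sample would pass this check and the N sample would not.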

Thank you!

RY

CloXD commented 9 months ago

Hello,

Sorry for the late answer. Unfortunately, those parameters have been hard-coded since the first version and we never needed to modify them until now. Your normal sample doesn't reach the score of 9000 (hard-coded in ReadBlockProcessor.cpp). A quick solution might be to change this threshold (lines 181 and 183) to 8000 and then either recompile the code locally, or build the Docker image locally and run it with Singularity. I apologize for the inconvenience.

Cheers, Claudio
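
To make the threshold concrete, here is a minimal shell sketch of the directionality call. The `dir_call` helper is hypothetical. The score formula (known-junction directional evidence divided by all known-junction evidence, scaled to 0-10000 and truncated) is inferred from the numbers in the logs above, not copied from the source, and the two conditions follow the lines of ReadBlockProcessor.cpp quoted later in this thread. The mapping of the two branches to +1 and -1 is an assumption based on the T sample's log.

```shell
# Hypothetical helper: reproduce the directionality call from the counts
# printed in irfinder.stdout. Formula inferred from the logs, not from the source.
# Args: known_dir known_nondir dir_same dir_diff [threshold, default 9000]
dir_call() {
    threshold=${5:-9000}
    total=$(( $1 + $2 ))
    if [ "$total" -eq 0 ]; then
        echo "0 0"               # no junction evidence at all
        return 0
    fi
    # Score on a 0-10000 scale, truncated to an integer
    score=$(( $1 * 10000 / total ))
    if [ "$3" -gt $(( $4 * 100 )) ] && [ "$score" -ge "$threshold" ]; then
        echo "$score 1"          # reads agree with the reference strand
    elif [ "$4" -gt $(( $3 * 100 )) ] && [ "$score" -ge "$threshold" ]; then
        echo "$score -1"         # reads oppose the reference strand
    else
        echo "$score 0"          # treated as non-directional
    fi
}

dir_call 135182 1352 1 135181          # CV023_T counts
dir_call 123949 14088 16 123933        # CV023_N counts, default threshold
dir_call 123949 14088 16 123933 8000   # CV023_N counts, lowered threshold
```

With the counts from the two logs, this prints `9900 -1` for T, `8979 0` for N at the default threshold, and `8979 -1` for N once the threshold is lowered to 8000.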

ZJURenyi commented 9 months ago

Hi Claudio,

Thank you for the update. I also found this solution in the version 1 GitHub repository: https://github.com/williamritchie/IRFinder/issues/154

Here is what I did:

# converts container to sandbox
singularity build --sandbox IRFinder_sandbox/ IRFinder

# change the threshold from 9000 to 8000
sed -i 's/9000/8000/g' IRFinder_sandbox/IRFinder/src/irfinder/src/ReadBlock/ReadBlockProcessor.cpp

# converts sandbox to container
singularity build IRFinder_8000 IRFinder_sandbox/

# re-run analysis using new image and the BAM generated previously
singularity exec -e IRFinder_8000 \
    IRFinder BAM \
    -r /hpctmp/renyi04/reference/IRFinder/GRCh38_110/ \
    -d CV023_N_8000 \
    CV023_N/Sorted.bam        # this is the bam generated when I run FASTQ using the original image

However, it does not seem to work: the numbers in irfinder.stdout are unchanged. I checked the file using

singularity shell IRFinder_8000
Singularity> grep 8000 /IRFinder/src/irfinder/src/ReadBlock/ReadBlockProcessor.cpp

The file was successfully modified.

        if ((dir_same > dir_diff * 100) && (dir_score_known >= 8000)) {
        }else if ((dir_diff > dir_same * 100) && (dir_score_known >= 8000)) {

Do you have any idea if I did something wrong? I'm not familiar with singularity, sorry.

CloXD commented 9 months ago

Hello, You need to recompile the binaries; otherwise the changes exist only in the source files. Rather than using the sandbox, you can clone the repository, change the source file, and re-run the Dockerfile build. Afterwards, you can generate the Singularity image from the locally built Docker image. Something like this:

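# clone the repository and lower the threshold in the source
git clone https://github.com/RitchieLabIGH/IRFinder.git
cd ./IRFinder
sed -i 's/9000/8000/g' ./src/irfinder/src/ReadBlock/ReadBlockProcessor.cpp
# rebuild the image so the binaries are recompiled (Docker image names must be lowercase)
docker build -t zjurenyi/irfinder8k .
# convert the locally built Docker image to a Singularity image (tag given explicitly)
singularity build IRFinder8k.sif docker-daemon://zjurenyi/irfinder8k:latest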

Let me know if it helps Cheers, Claudio
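
Note that `s/9000/8000/g` rewrites every occurrence of 9000 in the file. Whether the file contains other occurrences is not shown in this thread, so a more targeted substitution may be safer. A minimal sketch, assuming GNU sed and that the comparisons are spelled `dir_score_known >= 9000`, as the grep output earlier in the thread suggests; `patch_dir_threshold` is a hypothetical helper:

```shell
# Hypothetical helper: change only the directionality threshold comparisons,
# leaving any other occurrence of the old number in the file untouched.
# Args: source_file old_threshold new_threshold
patch_dir_threshold() {
    sed -i "s/dir_score_known >= $2/dir_score_known >= $3/g" "$1"
    # Print how many lines now carry the new threshold
    # (the grep output in this thread shows two such lines)
    grep -c "dir_score_known >= $3" "$1"
}

# Usage, from the repository root and before running docker build:
# patch_dir_threshold ./src/irfinder/src/ReadBlock/ReadBlockProcessor.cpp 9000 8000
```

A printed count of 0 would mean the pattern did not match and the source was left unchanged.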

ZJURenyi commented 9 months ago

Hi Claudio,

Thank you for your help. I got this error when I ran docker build:

docker build -t irfinder8k .

[+] Building 2.0s (17/17) FINISHED                                                  docker:default
 => [internal] load .dockerignore                                                             0.0s
 => => transferring context: 2B                                                               0.0s
 => [internal] load build definition from Dockerfile                                          0.0s
 => => transferring dockerfile: 2.22kB                                                        0.0s
 => [internal] load metadata for docker.io/rocker/r-ver:4.1.2                                 0.9s
 => [ 1/12] FROM docker.io/rocker/r-ver:4.1.2@sha256:6ae57bef96a03da4ce94c16d30458166213ec84  0.0s
 => https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h                                0.3s
 => [internal] load build context                                                             0.0s
 => => transferring context: 6.11kB                                                           0.0s
 => CACHED [ 2/12] RUN apt-get update &&     apt-get -y upgrade &&     export DEBIAN_FRONTEN  0.0s
 => CACHED [ 3/12] RUN PIP3 install -U --no-cache-dir numpy pandas      scikit-learn scipy    0.0s
 => CACHED [ 4/12] RUN RSCRIPT -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) { i  0.0s
 => CACHED [ 5/12] RUN mkdir -p /Utils/bin/ &&     cd /Utils/ &&     git clone https://githu  0.0s
 => CACHED [ 6/12] RUN cd /Utils/ && git clone https://github.com/lh3/minimap2 &&  cd minima  0.0s
 => [ 7/12] ADD https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h skipcache          0.1s
 => [ 8/12] COPY ./bin /IRFinder/bin                                                          0.1s
 => [ 9/12] COPY ./REF /IRFinder/REF                                                          0.1s
 => [10/12] COPY ./src /IRFinder/src                                                          0.1s
 => [11/12] COPY ./install.sh /IRFinder/                                                      0.1s
 => ERROR [12/12] RUN    cd /IRFinder/ &&  ./install.sh                                       0.3s
------
 > [12/12] RUN    cd /IRFinder/ &&      ./install.sh:
0.291 Checking dependencies...
0.292 Dependency make not found.
0.292 Dependency bedtools not found.
0.293 Dependency samtools not found.
0.293 Dependency gzip not found.
0.294 Dependency gawk not found.
0.294 Dependency libboost-iostreams-dev not found.
0.295 Dependency zlib1g not found.
------
Dockerfile:60
--------------------
  59 |     COPY ./install.sh /IRFinder/
  60 | >>> RUN    cd /IRFinder/ && \
  61 | >>>      ./install.sh
  62 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd /IRFinder/ && \t./install.sh" did not complete successfully: exit code: 1

Do I have to install all dependencies to build the image?
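
One observation on the log above: gzip is normally part of a Debian or Ubuntu base image, so seeing it reported as missing may mean that the dependency check itself is failing rather than that the packages are absent. This is not confirmed anywhere in the thread. A minimal diagnostic sketch, to be run in a shell inside the build environment; `missing_cmds` is a hypothetical helper and is not the check that install.sh performs:

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on PATH. Prints nothing when all of them are available.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Usage, with the command names taken from the error log:
# missing_cmds make bedtools samtools gzip gawk
```

libboost-iostreams-dev and zlib1g are package names rather than commands, so they cannot be checked this way; on a Debian-based image, `dpkg -s <package>` reports whether a package is installed.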

Thank you!