ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)

IsoQuant seems stuck #197

Open francicco opened 1 month ago

francicco commented 1 month ago

Hi,

I'm running IsoQuant for two species with relatively small genomes (~400 Mb) and around 20M reads (35 GB of FASTQ). After about 1 hour, the analysis stops progressing, and IsoQuant has not finished since I started this job.

You can see this in the log:

2024-06-02 19:06:58,393 - INFO - Running IsoQuant version 3.4.1
2024-06-02 19:06:58,397 - WARNING - --count_exons option has no effect without gene annotation
2024-06-02 19:06:58,397 - WARNING - --sqanti_output option has no effect without gene annotation
2024-06-02 19:06:58,525 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-06-02 19:06:58,526 - INFO -  === IsoQuant pipeline started === 
2024-06-02 19:06:58,526 - INFO - gffutils version: 0.10.1
2024-06-02 19:06:58,526 - INFO - pysam version: 0.22.1
2024-06-02 19:06:58,526 - INFO - pyfaidx version: 0.7.0
...
2024-06-02 19:27:55,109 - INFO - Finished processing chromosome Diul2900
2024-06-02 19:33:09,720 - INFO - Finished processing chromosome Diul1000
2024-06-02 19:34:09,856 - INFO - Finished processing chromosome Diul2000
2024-06-02 19:50:42,215 - INFO - Finished processing chromosome Diul3100
2024-06-02 20:01:51,089 - INFO - Finished processing chromosome Diul1Z00

I have stopped it for now and will try resuming it to see whether that makes any difference. Is there anything I can do?

Cheers F

Sefi196 commented 1 month ago

I think I have the same issue.

I'm running two samples, each about 60 GB per FASTQ.

It's been like this for about 10 hours:

2024-06-07 04:41:16,060 - INFO - Finished processing chromosome chr12
2024-06-07 04:47:34,530 - INFO - Finished processing chromosome chr19
2024-06-07 05:56:09,617 - INFO - Finished processing chromosome chr6

andrewprzh commented 1 month ago

@francicco @Sefi196

It depends on how long it has been stuck; 10 hours seems like quite a lot. Could you check whether IsoQuant is actually consuming CPU?

Best Andrey

francicco commented 1 month ago

Hi,

I figured out why: a short scaffold was causing the problem. Once I removed it, everything went smoothly, so I'm guessing there was a problem with the alignments on it.

F

Sefi196 commented 3 weeks ago

Hi Andrey, I cancelled the run before I could check whether IsoQuant was using any CPU.

I am mapping to the analysis set, so I don't think I have any extra scaffolds that might be causing the issue @francicco was having.

Other samples using the same .fa file have completed successfully.

Do you have any suggestions on how I could troubleshoot this?

Best

Sefi

francicco commented 3 weeks ago

Hi @Sefi196,

It may not be the FASTA per se, but the alignments on it. In my case it was easier to exclude a very short scaffold than to look at the specific alignments. For example, with StringTie I had a similar problem where a single read would crash the analysis; in that case I had to go and check the actual read that was causing the problem.

Cheers F

Sefi196 commented 3 weeks ago

Hi @francicco, Thanks for that info.

Do you have any suggestions on how to find these problematic alignments? As I said, I am only mapping to the analysis set, so I don't have any of these problematic scaffolds to worry about.

Any tips would be great, as my most recent run has now been stuck on chr11 for almost 20 hours 😢

Thanks

francicco commented 3 weeks ago

I'd first subsample the BAM (with samtools) or exclude one scaffold/chromosome at a time to see if you can narrow down the problem.
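One way to choose which chromosome or scaffold to exclude first is to look for contigs with an unusually high number of mapped reads per kilobase, since both cases in this thread (a short scaffold and chrM) involved a small contig. The sketch below is a hypothetical helper, not part of IsoQuant or samtools: it parses the tab-separated output of `samtools idxstats` (reference name, sequence length, mapped reads, unmapped reads) and ranks contigs by read density. The example counts are made up for illustration.

```python
from typing import List, Tuple


def rank_by_read_density(idxstats_text: str) -> List[Tuple[str, int, float]]:
    """Rank contigs by mapped reads per kilobase.

    Expects the tab-separated output of `samtools idxstats`:
    reference name, sequence length, mapped reads, unmapped reads.
    Returns (name, mapped_reads, reads_per_kb), densest first.
    """
    rows = []
    for line in idxstats_text.strip().splitlines():
        name, length_str, mapped_str, _unmapped = line.split("\t")
        length, mapped = int(length_str), int(mapped_str)
        # The final "*" line holds unplaced unmapped reads and has length 0.
        if name == "*" or length == 0:
            continue
        rows.append((name, mapped, mapped / (length / 1000)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical counts, for illustration only.
example = (
    "chr1\t248956422\t1200000\t0\n"
    "chrM\t16569\t900000\t0\n"
    "*\t0\t0\t5000\n"
)
for name, mapped, density in rank_by_read_density(example):
    print(f"{name}\t{mapped}\t{density:.1f} reads/kb")
```

The input can be produced with `samtools idxstats aligned.bam` on a coordinate-sorted, indexed BAM (the file name is a placeholder). Contigs near the top of the list are reasonable first candidates to exclude one at a time.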

I'm sure @andrewprzh can point you in a better direction.

F

Sefi196 commented 3 weeks ago

It seems I fixed the issue by removing chrM.

It would be good to understand what the issue is here, and to still be able to quantify genes on chrM.

Any suggestions would be helpful.

andrewprzh commented 3 weeks ago

@francicco @Sefi196

Thanks for sharing that! I've seen this before, precisely with chrM containing too many reads in the same region. However, this issue was supposed to be fixed in 3.4.0. So, if anyone can share subsampled BAM files with the problematic reads, that would be wonderful.
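To check whether a chromosome has "too many reads in the same region", one rough diagnostic is the peak number of reads overlapping a single position. The following is a minimal, hypothetical sweep-line sketch over read intervals; it is not how IsoQuant groups reads internally. It assumes 0-based, half-open `(start, end)` intervals, which matches the convention of pysam's `AlignedSegment.reference_start` and `reference_end`.

```python
from typing import Iterable, List, Optional, Tuple


def peak_overlap(intervals: Iterable[Tuple[int, int]]) -> Tuple[int, Optional[int]]:
    """Return (peak depth, first position reaching it) for half-open [start, end) intervals."""
    events: List[Tuple[int, int]] = []
    for start, end in intervals:
        events.append((start, 1))   # a read begins covering `start`
        events.append((end, -1))    # and stops covering `end` (exclusive)
    # Tuples sort by position, then delta: -1 before +1, so a read ending at
    # position p is removed before a read starting at p is added.
    events.sort()
    depth = best = 0
    best_pos: Optional[int] = None
    for pos, delta in events:
        depth += delta
        if depth > best:
            best, best_pos = depth, pos
    return best, best_pos


# Hypothetical reads, for illustration only.
reads = [(0, 100), (50, 150), (60, 70), (100, 200)]
print(peak_overlap(reads))  # (3, 60)
```

With pysam, the intervals for one chromosome could be collected from `pysam.AlignmentFile(path, "rb").fetch("chrM")`, skipping reads where `is_unmapped` is true; comparing the peak on chrM with the peaks on other chromosomes would show whether reads pile up unusually there.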

@Sefi196 are you using the latest version, by the way?

Best Andrey

Sefi196 commented 3 days ago

Hi Andrey, I'm using the latest version, but I still have the same issue I described earlier.

It seems to complete chrM, but afterwards IsoQuant just sits there with no progress for many hours until the job times out.

2024-07-04 18:00:36,370 - INFO - Processing chromosome chrM
2024-07-04 18:01:58,571 - INFO - Loaded data for chr17
2024-07-04 18:03:00,141 - INFO - Loaded data for chr1
2024-07-04 18:03:11,744 - INFO - Loaded data for chr19
2024-07-04 18:18:44,004 - INFO - Finished processing chromosome chr21
2024-07-04 23:17:13,329 - INFO - Finished processing chromosome chrM
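Since the question throughout this thread is how long IsoQuant has been silent, a small helper that measures the gaps between consecutive log lines can make this easier to see. This is a hypothetical sketch that assumes the `YYYY-MM-DD HH:MM:SS,mmm - LEVEL - message` layout shown in the logs above. Applied to the lines just quoted, the largest gap (just under 5 hours) is the one ending at `Finished processing chromosome chrM`.

```python
from datetime import datetime
from typing import List, Tuple

# Timestamp layout used by the IsoQuant log lines quoted in this issue.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def largest_gaps(log_text: str, top: int = 3) -> List[Tuple[float, str]]:
    """Return up to `top` (gap_in_hours, line) pairs, largest first,
    where `line` is the log line that ended the silent period."""
    entries = []
    for line in log_text.strip().splitlines():
        stamp = line.split(" - ", 1)[0].strip()
        try:
            entries.append((datetime.strptime(stamp, TIMESTAMP_FORMAT), line.strip()))
        except ValueError:
            continue  # skip lines without a leading timestamp, such as "..."
    gaps = [
        ((later - earlier).total_seconds() / 3600, line)
        for (earlier, _), (later, line) in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)[:top]


log_excerpt = """2024-07-04 18:00:36,370 - INFO - Processing chromosome chrM
2024-07-04 18:01:58,571 - INFO - Loaded data for chr17
2024-07-04 18:03:00,141 - INFO - Loaded data for chr1
2024-07-04 18:03:11,744 - INFO - Loaded data for chr19
2024-07-04 18:18:44,004 - INFO - Finished processing chromosome chr21
2024-07-04 23:17:13,329 - INFO - Finished processing chromosome chrM"""

for hours, line in largest_gaps(log_excerpt):
    print(f"{hours:.2f} h before: {line}")
```

This only measures gaps between lines that were actually written. For a run that is still silent, compare the time of the last line with the current time, and check CPU usage as Andrey suggested, because a long gap alone does not distinguish a slow step from a hang.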

Is this a memory issue, or perhaps due to the very large read group file I am providing? It would be great to get this working, so any thoughts would be useful.

Thanks again

Sefi