ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)
https://ablab.github.io/IsoQuant/

Repeating error message: [W::hts_idx_load3] The index file is older than the data file: #244

Open zli-lilly opened 1 month ago

zli-lilly commented 1 month ago

Thank you for developing such a great tool. I ran into an error message that I hope you can help with. The line below is printed repeatedly, and the run does not seem to progress to the next step. Could you recommend the best way to handle it? Thank you.

[W::hts_idx_load3] The index file is older than the data file:

andrewprzh commented 1 month ago

@zli-lilly

Thanks for the feedback!

This message itself is not a problem. It simply means your .bai index has an older modification date than the BAM file itself. It can happen if the files were copied from another location, and the index file was copied first. If you want to get rid of this message, simply rebuild the index files with samtools index.

Best,
Andrey
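To see which files would trigger this warning before starting a long run, the modification times can be compared directly. The following is a minimal sketch, not part of IsoQuant or samtools; the helper name `stale_indexes` is hypothetical, and it assumes each index sits next to its BAM file as `<name>.bam.bai`.

```python
import os
import sys


def stale_indexes(bam_paths):
    """Return BAM paths whose .bai index is missing or older than the BAM."""
    stale = []
    for bam in bam_paths:
        # Assumption: the index is stored alongside the BAM as "<name>.bam.bai".
        index = bam + ".bai"
        if not os.path.exists(index):
            stale.append(bam)
        elif os.path.getmtime(index) < os.path.getmtime(bam):
            stale.append(bam)
    return stale


if __name__ == "__main__":
    # Usage: python check_index.py sample1.bam sample2.bam
    for bam in stale_indexes(sys.argv[1:]):
        print(f"index missing or older than data file: {bam}")
```

Any file it reports can then be re-indexed with samtools index, as suggested above.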

zli-lilly commented 1 month ago

Thank you, Andrey. Much appreciated.

zli-lilly commented 1 month ago

Hey Andrey. I ignored the warning messages as you suggested, but the pipeline terminated abruptly after several hours. The log file is attached. Could you please take a look? Your help is greatly appreciated.

isoquant.log

andrewprzh commented 1 month ago

@zli-lilly

The cause of the error is unknown. It looks like one of the threads was killed, possibly due to RAM consumption or CPU quotas on the server. There is no failure in IsoQuant itself.

On another topic, you have another warning about your annotation:

Gene LOC102142360 has no exons / transcripts, check your input annotation

This suggests something is wrong with your GTF. Could you send me a few examples, e.g. for this particular gene?

zli-lilly commented 1 month ago

Hey Andrey, below are the CPU core and memory settings from the batch header of the run. Would you suggest other settings?

#SBATCH -c 4
#SBATCH --mem 100G # Total size of memory

I also attached the LOC102142360 example, which is a pseudogene from a Cyno monkey assembly. Any input would be super helpful.

LOC102142360.txt

andrewprzh commented 1 month ago

Typically 100G should be enough. The error does not tell us anything meaningful. Could you re-run IsoQuant to see if it reproduces?

Regarding the GTF file, are there any transcripts/exons belonging to this gene?

zli-lilly commented 1 month ago

I resubmitted the job yesterday, before commenting on the error, and the new log shows the same error message. About LOC102142360, there are no transcripts/exons available for this gene.

andrewprzh commented 1 month ago

Could you show me how you submit the job? Could you also send the second log?

> About LOC102142360, there are no transcripts/exons available for this gene.

That's a bit odd, IsoQuant expects genes to have transcripts and exons, otherwise it cannot process them.
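To find out how many genes in an annotation are affected, the GTF can be scanned for gene records that have no exon records. This is only a rough sketch and not IsoQuant's own check; the function name `genes_without_exons` is hypothetical, and it assumes a standard 9-column, tab-separated GTF with a `gene_id "..."` attribute on every record.

```python
import re

# Assumption: attributes follow the usual GTF form, e.g. gene_id "LOC102142360";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def genes_without_exons(gtf_lines):
    """Return gene_ids that have a 'gene' record but no 'exon' record."""
    genes = {}          # dict keeps first-seen order of gene records
    with_exons = set()  # gene_ids seen on at least one exon record
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = GENE_ID.search(fields[8])
        if not match:
            continue
        gene_id = match.group(1)
        if fields[2] == "gene":
            genes.setdefault(gene_id, None)
        elif fields[2] == "exon":
            with_exons.add(gene_id)
    return [g for g in genes if g not in with_exons]
```

It can be run on an open file, e.g. `with open("annotation.gtf") as fh: print(genes_without_exons(fh))`, where `annotation.gtf` is a placeholder path.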

zli-lilly commented 1 month ago

I've attached the sh script (as txt to meet the upload requirement) and the log file here. I simply submit the sh job to the HPC with "sbatch".

Nanopore_isoquant.txt isoquant.log

The assembly is a Cyno species with lots of pseudogenes. Would IsoQuant simply ignore those, or should I manually remove them from the GTF?

andrewprzh commented 1 month ago

I see that the error message now occurred at a different point, so something is killing one of the IsoQuant processes. The problem doesn't seem to be on the IsoQuant side. You may try requesting more RAM or contacting your system administrator; the system logs might have some information.
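If the system logs are accessible, one thing to look for is whether the kernel's out-of-memory killer ended the process. Below is a small sketch for scanning log text; the function name `find_oom_kills` is hypothetical, and the assumed line format ("Killed process <pid> (<name>)") varies between kernel versions, so a clean result does not rule out a memory problem.

```python
import re

# Assumption: OOM-killer lines contain "Killed process <pid> (<name>)",
# e.g. "Out of memory: Killed process 12345 (python3) ...".
OOM_LINE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return (pid, process_name) pairs for OOM-killer lines in a system log."""
    kills = []
    for line in log_lines:
        match = OOM_LINE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The input could be the saved output of `dmesg` from the compute node, which on a shared cluster usually has to come from the administrator.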

> The assembly is a Cyno species with lots of pseudogenes. Would IsoQuant simply ignore those, or should I manually remove them from the GTF?

Not a problem, they will simply be ignored.