dmarron / virdetect


Fatal Error in Reads Input #15

Open ghost opened 2 years ago

ghost commented 2 years ago

Hi Dr. Marron,

Thank you very much for providing this program. I hope that it is not too late to ask for support.

After running the workflow from the command line on human RNA-seq data, I receive an error stating that the program is "EXITING because of FATAL ERROR in reads input: short read sequence line: 0". I have attached the STAR_virus_Log.out file for reference.

STAR_virus_Log.out.txt

I would really appreciate any guidance that you could provide. Thank you very much for your time.

Regards,

Mike Clarke (University of Alberta)

dmarron commented 2 years ago

Hi Dr. Clarke,

It looks like this error may be caused by a bad read in your fastq file. Could you run grep "J00113:205:HG3HNBBXX:1:2228:21623:8963" on the input fastq file that you pass to the STAR command that returns that error? Based on the log, I suspect that the sequence for that read may not be in the correct format.
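If you want to see the whole record rather than only the header line that grep matches, here is a minimal Python sketch that returns the full four-line FASTQ record for a given read ID. It assumes an uncompressed FASTQ with exactly four lines per record and compares the ID against the first whitespace-delimited token of the header. The function name find_record is hypothetical and is not part of virdetect.

```python
from typing import List, Optional


def find_record(path: str, read_id: str) -> Optional[List[str]]:
    """Return the 4-line FASTQ record whose ID matches read_id, or None.

    Assumes an uncompressed FASTQ with exactly four lines per record.
    """
    with open(path) as fh:
        while True:
            header = fh.readline()
            if not header:
                return None  # end of file reached without a match
            # Read the remaining three lines of this record.
            rest = [fh.readline() for _ in range(3)]
            record = [line.rstrip("\r\n") for line in [header] + rest]
            fields = record[0].split()
            # The ID is the first token; anything after whitespace is a comment.
            if fields and fields[0] == "@" + read_id:
                return record
```

For example, rec = find_record("unaligned_1.fastq", "J00113:205:HG3HNBBXX:1:2228:21623:8963") followed by print(len(rec[1]), len(rec[3])) shows the sequence and quality lengths side by side.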

Best, David

ghost commented 1 year ago

Thank you David,

I did as you suggested, and looking at the fastq files, they seem to be formatted correctly.

The read in question always appears to be the last read in the file, and I believe the issue lies somewhere in the conversion step that produces the unaligned_1.fastq file. Based on the error message, the quality scores appear to be truncated, as if the command does not fully finish before moving on to the next one.

For instance, I get the error:

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@J00113:198:HFM5NBBXX:7:2227:24211:32191
+

SOLUTION: fix your fastq file

Then, when I check the unaligned_1.fastq file, the final read appears as:

@J00113:198:HFM5NBBXX:7:2227:24211:32191
CAGAAACCAGGTAACCACGCCAGCCAGCACACCGATAGGCAGGTTGATGTAGAAGATCCATGGCCAGGTGAAATTATCAGTAATCCAGCCGCCCAGGATCGGGCCGAGCACGGGGGCGATCACCACGGTCATCGCCCATAACGCCAGCG
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJ7AFFJFJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJFJJJJJFJJFJJJJJJJJJJJJFJJJFJJJJJJJJJ7AFAF<AJJJJ<<F<FJFJJ<

There are the same number of characters in the sequence and the quality string. If I delete the last read entirely, there are no longer any issues with alignment.
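Since the sequence and quality lengths match for that record, a check that also looks at the end of the file may help narrow things down. The Python sketch below reports records whose header does not start with '@', whose third line does not start with '+', or whose sequence and quality lengths differ. It also flags an incomplete final record and a file that does not end with a newline. That an end-of-file problem like a missing final newline is what tripped STAR here is an assumption that the thread does not confirm, and check_fastq is a hypothetical helper name.

```python
import os
from typing import List


def check_fastq(path: str) -> List[str]:
    """Return human-readable problems found in a 4-line-per-record FASTQ file."""
    problems: List[str] = []
    with open(path, "rb") as fh:
        # Inspect the last byte: a file that does not end in '\n' leaves the
        # final quality line unterminated (a possible, unconfirmed cause here).
        fh.seek(0, os.SEEK_END)
        if fh.tell() > 0:
            fh.seek(-1, os.SEEK_END)
            if fh.read(1) != b"\n":
                problems.append("file does not end with a newline")
        fh.seek(0)

        record: List[bytes] = []
        n = 0
        for raw in fh:
            # Strip both LF and CR so Windows line endings do not skew lengths.
            record.append(raw.rstrip(b"\r\n"))
            if len(record) < 4:
                continue
            n += 1
            header, seq, plus, qual = record
            if not header.startswith(b"@"):
                problems.append(f"record {n}: header does not start with '@'")
            if not plus.startswith(b"+"):
                problems.append(f"record {n}: third line does not start with '+'")
            if len(seq) != len(qual):
                problems.append(
                    f"record {n}: sequence length {len(seq)} != quality length {len(qual)}"
                )
            record = []
        if record:
            problems.append(f"incomplete final record: {len(record)} of 4 lines")
    return problems
```

Running print(check_fastq("unaligned_1.fastq")) before and after deleting the last read would show whether the final record or the end of the file is what differs.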

I am running the sh script commands one after another (on separate command lines), and I have also tried chaining the two commands together with '&&'.

Any idea what might be going on?

Thanks again for all your help with this.

Regards,

Mike

dmarron commented 1 year ago

Hi Mike, I'm stumped on this one: the sequence for the final read that is causing the problem does appear to be the same length as its quality string, so I don't know why it would cause an alignment error. Perhaps you could run the unaligned fastq file through FastQC and see if it works? You could also just delete the read and run the rest through the pipeline without it (at worst, your counts would be off by 1).
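Deleting the final read can be scripted rather than done by hand. This is a minimal Python sketch, assuming four lines per record: it streams the input and writes every record except the last one (a trailing partial record is dropped as well). The function name drop_last_record and the output file name in the usage line are hypothetical. For paired-end input, run it on both mate files so the two files keep the same number and order of reads.

```python
def drop_last_record(src: str, dst: str) -> int:
    """Copy FASTQ records from src to dst, omitting the final record.

    Assumes four lines per record. Returns the number of records written.
    """
    written = 0
    # Lines of the most recent record, held back until we know it is not the last.
    pending = []
    with open(src) as fin, open(dst, "w") as fout:
        for i, line in enumerate(fin):
            # A new record is starting, so the one held in `pending` was not the last.
            if i % 4 == 0 and len(pending) == 4:
                fout.writelines(pending)
                written += 1
                pending = []
            pending.append(line)
    # Whatever remains in `pending` (the final record, complete or partial) is discarded.
    return written
```

Usage: drop_last_record("unaligned_1.fastq", "unaligned_1.trimmed.fastq"), then pass the trimmed file to STAR in place of the original.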

David

ghost commented 1 year ago

Hi David,

Great, thanks for all your help. I did end up deleting the last read from each file, and everything works now.

Mike