Xinglab / rmats-turbo


Uninformative error reading GTF #414

Closed rLannes closed 3 months ago

rLannes commented 3 months ago

Hi,

I am trying to run rmats-turbo with a GTF that I heavily modified because I work on a non-reference species. The same file works in other software (such as IGV and featureCounts), but with rmats-turbo I get the following error:

unable to parse the gtf: dmau_merged.gtf
please check that the --gtf argument is a valid .gtf file that is not compressed
Traceback (most recent call last):
  File "rmats_turbo_v4_3_0/rmats.py", line 979, in <module>
    main()
  File "rmats_turbo_v4_3_0/rmats.py", line 945, in main
    run_pipe(args)
  File "rmatspipeline/rmatspipeline.pyx", line 3979, in rmats.rmatspipeline.run_pipe
  File "rmatspipeline/rmatspipeline.pyx", line 3974, in rmats.rmatspipeline.run_pipe
  File "rmatspipeline/rmatspipeline.pyx", line 124, in rmats.rmatspipeline.parse_gtf
IndexError: string index out of range

This error is unhelpful: I do not know which line of my GTF rmats-turbo has an issue with, so I cannot fix it. If rmats-turbo has additional requirements for the GTF format, I did not find them documented anywhere.

Is there a way to validate a GTF for rmats-turbo, or to fix this error?

Best regards, Romain

EricKutschera commented 3 months ago

Here's the line for that error: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L124

That code looks at the first character of each line in the GTF to check whether it is a comment line. It calls line.strip() first, which removes leading and trailing whitespace, so a line that is blank or contains only whitespace becomes an empty string with no first character. Since the error was IndexError: string index out of range, I think your file has at least one such line. You could filter those lines out with grep -E -v '^[[:space:]]*$' file.gtf > filtered_file.gtf
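
To see which lines are affected before filtering them out, a minimal Python sketch along these lines could report their line numbers. Note that find_blank_lines is a hypothetical helper name and not part of rmats-turbo, and the sketch assumes that blank or whitespace-only lines are the only problem in the file.

```python
def find_blank_lines(path):
    """Return the 1-based numbers of lines that are empty after strip().

    Hypothetical helper, not part of rmats-turbo. It flags any line for
    which line.strip() is the empty string, since taking the first
    character of an empty string raises IndexError.
    """
    blank_line_numbers = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            # strip() removes leading and trailing whitespace, including
            # the newline, so blank and whitespace-only lines become "".
            if not line.strip():
                blank_line_numbers.append(number)
    return blank_line_numbers
```

Calling find_blank_lines("file.gtf") returns a list of line numbers, which may be empty. If the list is empty and the error persists, the cause is probably something other than blank lines.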

rLannes commented 3 months ago

Thank you so much! It is working now.