Xinglab / rmats-turbo

ValueError when run rmats.py #289

Open wangzlgithub opened 1 year ago

wangzlgithub commented 1 year ago

Hi! Thank you for developing this useful tool! I ran into an error. I have two samples from two mice for a gene knockout experiment (WT and KO), with no replicates. I ran the following command to compare alternative splicing between them:

rmats.py \
 --b1 /home2/wangzl/workspace/snorna/WT.txt \
 --b2 /home2/wangzl/workspace/snorna/KO.txt \
 --gtf $gtf_dir/mm10_kg.gtf \
 --od $output_dir \
 -t paired \
 --readLength 50 \
 --tmp $tmp_dir \
 --nthread 4

In WT.txt, I wrote the path to the WT BAM file, like this:

/Parastor300s_G30S/wangzl/snorna/batch3/bam/WT.sort.bam

KO.txt is similar.

But I got the following error:

Traceback (most recent call last):
  File "/home2/wangzl/miniconda3/envs/rmats/bin/rmats.py", line 536, in <module>
    main()
  File "/home2/wangzl/miniconda3/envs/rmats/bin/rmats.py", line 507, in main
    run_pipe(args)
  File "rmatspipeline/rmatspipeline.pyx", line 3803, in rmats.rmatspipeline.run_pipe
  File "rmatspipeline/rmatspipeline.pyx", line 3666, in rmats.rmatspipeline.split_sg_files_by_bam
  File "rmatspipeline/rmatspipeline.pyx", line 3674, in rmats.rmatspipeline.split_sg_files_by_bam
ValueError: invalid literal for int() with base 10: '\x01'

Could you explain the possible cause? Should I run the alternative splicing analysis on each sample separately, or is there another feasible approach?

EricKutschera commented 1 year ago

This is the line for that error message: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3674

It's trying to read the .rmats files in the --tmp directory. The first line of a .rmats file should have the BAM file name(s), and the second line should be the read length. The error is saying that it found '\x01' on the second line of some .rmats file, where int() expected a number.
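To find which file is affected, something like the following could help. It is only a rough sketch, not part of rMATS: the function name `find_bad_rmats_files` is made up for illustration, and it assumes the files end in `.rmats` and sit directly in the --tmp directory. It relies only on the layout described above (BAM names on the first line, read length on the second).

```python
from pathlib import Path


def find_bad_rmats_files(tmp_dir):
    """Return (path, second_line) pairs whose second line is not an integer.

    Hypothetical helper: assumes *.rmats files sit directly in tmp_dir.
    """
    bad = []
    for path in sorted(Path(tmp_dir).glob("*.rmats")):
        # Read as bytes so that stray control characters cannot break decoding.
        with open(path, "rb") as handle:
            handle.readline()  # first line: BAM file name(s)
            second = handle.readline().rstrip(b"\r\n")
        text = second.decode("utf-8", errors="replace")
        try:
            int(text)  # second line should be the read length
        except ValueError:
            bad.append((path, text))
    return bad
```

Calling `find_bad_rmats_files` with the path given to --tmp and printing each result with `repr()` should make a hidden character such as '\x01' visible.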

My guess is that your --b1 or --b2 file somehow included that '\x01' character, and it ended up getting written to a .rmats file as part of a BAM name. You could try rewriting the --b1 and --b2 files to ensure there are no extra characters. Then, if you run the rmats command again with the new files and a new --tmp directory, I think it should work.
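One way to check the --b1 and --b2 files for hidden characters is a small script like this. It is a sketch under assumptions: the helper names are hypothetical, and it treats every control byte other than a newline as suspicious, which also flags carriage returns from Windows-style line endings.

```python
def is_control_byte(value):
    """True for ASCII control bytes other than newline (0x0A)."""
    return value != 0x0A and (value < 0x20 or value == 0x7F)


def find_suspicious_bytes(path):
    """Return (offset, byte_value) pairs for control bytes found in the file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [(i, b) for i, b in enumerate(data) if is_control_byte(b)]


def write_clean_copy(src, dst):
    """Write a copy of src to dst with the control bytes removed."""
    with open(src, "rb") as handle:
        data = handle.read()
    cleaned = bytes(b for b in data if not is_control_byte(b))
    with open(dst, "wb") as handle:
        handle.write(cleaned)
```

If `find_suspicious_bytes` returns anything for WT.txt or KO.txt, `write_clean_copy` can produce new files to pass to --b1 and --b2, together with a fresh --tmp directory.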