BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Collapse with range option does not give isoform.fa and isoform.gtf #226

Closed JYLeeBioinfo closed 1 year ago

JYLeeBioinfo commented 1 year ago

What I was doing: I ran collapse with the range option, following "run_flair_collapse_ranges.sh".

It seemed that the outputs should be fa, gtf, and bed files for the individual ranges.

image

What went wrong: No matter how many times I reran the code, the gtf and fa files were not generated.

I reduced the size of the split by adjusting the options in bedPartition, but that didn't work.

Possible cause: I found a possibly problematic code segment within the collapse function in flair.py.

As in the capture below, when args.ranges is set, isoforms.gtf and isoforms.fa are not generated.

image

Question

  1. Is this behavior what you intended? (Suppressing isoforms.gtf and isoforms.fa output when a range is set)
  2. If so, what is the next step to get isoforms.fa after collapse with range and before quantify? The "quantify" step requires isoforms.fa, but collapse with the range option does not produce it.

Jeltje commented 1 year ago

This is the full statement from your screenshot above. Note the --range at the very end.

# run flair collapse for each independent region
parallel -a $CORRECTED_BED.ranges.bed python ../flair.py collapse -q $CORRECTED_BED.sorted.bed.gz -g $GENOME_FA -r $READS_BAM -f $ANNOTATION_GTF --temp_dir $TEMP_DIR -o $TEMP_DIR/temp --quiet --range

I think that --range should no longer be there, because it refers to an older way of running the code. Could you try removing it from your version of run_flair_collapse_ranges.sh and see what happens?

JYLeeBioinfo commented 1 year ago

Hi @Jeltje

Running without --range gave an error indicating that positional arguments are not expected.

Jeltje commented 1 year ago

Ah right, my bad, I forgot how parallel works.
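
For context: when no `{}` replacement string is given, GNU parallel appends each line of the `-a` input file to the end of the command, so the trailing `--range` receives one range per job. A rough plain-shell sketch of that expansion follows. The function name and the `RANGE_A` / `RANGE_B` values are placeholders, not real FLAIR input, the other collapse options are elided as `...`, and the command is only echoed, never run:

```shell
# Sketch only: emulate how `parallel -a FILE cmd ... --range` expands.
# RANGE_A / RANGE_B are placeholder values, not a real range format.
expand_ranges() {
    # $1: file with one range per line
    while IFS= read -r range; do
        # Each input line becomes the value of the trailing --range flag.
        echo "python ../flair.py collapse ... --range $range"
    done < "$1"
}

ranges_file=$(mktemp)
printf 'RANGE_A\nRANGE_B\n' > "$ranges_file"
expand_ranges "$ranges_file"
rm -f "$ranges_file"
```

Removing `--range` leaves each appended line as a bare positional argument, which is consistent with the error reported above.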

You are correct that --range doesn't output isoform.fa and isoform.gtf files. I am still trying to find out why the code was written like that. Meanwhile, you can generate them yourself with something like this if you have a conda install:

python ~/.conda/envs/flair/lib/python3.10/site-packages/flair/psl_to_sequence.py $OUTFILE_NAME.isoforms.bed $GENOME_FA $OUTFILE_NAME.isoforms.fa
python ~/.conda/envs/flair/lib/python3.10/site-packages/flair/psl_to_gtf.py $OUTFILE_NAME.isoforms.bed > $OUTFILE_NAME.isoforms.gtf

or, if you cloned the repo:

flair/src/flair/psl_to_sequence.py $OUTFILE_NAME.isoforms.bed $GENOME_FA $OUTFILE_NAME.isoforms.fa
flair/src/flair/psl_to_gtf.py $OUTFILE_NAME.isoforms.bed > $OUTFILE_NAME.isoforms.gtf

I know it says psl but it does also take bed as input.
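
The two conversion steps could be wrapped in a small shell function, sketched below under a few assumptions: `make_isoform_files`, `FLAIR_SRC`, and `DRY_RUN` are hypothetical names introduced here, not part of FLAIR, and the script arguments simply mirror the commands above. With `DRY_RUN=1` the function only prints what it would run:

```shell
# Sketch only: rebuild isoforms.fa and isoforms.gtf from isoforms.bed.
# make_isoform_files, FLAIR_SRC and DRY_RUN are hypothetical helpers.
make_isoform_files() {
    prefix=$1      # output prefix, i.e. the part before .isoforms.bed
    genome_fa=$2   # genome FASTA that was used for collapse
    src=${FLAIR_SRC:-flair/src/flair}   # directory holding the two scripts

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the commands instead of running them.
        echo "python $src/psl_to_sequence.py $prefix.isoforms.bed $genome_fa $prefix.isoforms.fa"
        echo "python $src/psl_to_gtf.py $prefix.isoforms.bed > $prefix.isoforms.gtf"
        return 0
    fi

    # Write the FASTA first; only build the GTF if that step succeeds.
    python "$src/psl_to_sequence.py" "$prefix.isoforms.bed" "$genome_fa" "$prefix.isoforms.fa" &&
        python "$src/psl_to_gtf.py" "$prefix.isoforms.bed" > "$prefix.isoforms.gtf"
}

DRY_RUN=1 make_isoform_files sample genome.fa
```

For a conda install, `FLAIR_SRC` would point at the site-packages/flair directory shown in the first pair of commands.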

I'm leaving this ticket open until we figure out why it works this way.

JYLeeBioinfo commented 1 year ago

Thank you for the suggestion!

It worked, and I got the final gtf and fa files.

Thank you!