BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Crash running flair collapse #322

Open mbkodos opened 9 months ago

mbkodos commented 9 months ago

The exact command you tried to run (feel free to leave any original paths, we don't have access to your system):

srun -N 1 -n 1 bash -c "flair collapse \
-g /home/mbauer/genome/gencode.43.GRCh38.primary_assembly.genome.fa \
--gtf /home/mbauer/genome/gencode.v43.annotation.sorted.gtf \
-q /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/splitdir/chr22.bed \
-r /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/ARP1_OENEK2_combined.fq \
--stringent --check_splice --generate_map --annotation_reliant generate \
--output /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22 \
--threads 6 \
--temp_dir /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/temp/" &

How did you install Flair? (We'd prefer it if you used one of the top two because they are the least likely to have package compatibility problems.)

  1. bioconda (e.g. conda create -n flair -c conda-forge -c bioconda flair)

What happened?

Making transcript fasta using annotated gtf and genome sequence
Aligning reads to reference transcripts
Counting supporting reads for annotated transcripts
Setting up unassigned reads for flair-collapse novel isoform detection
Renaming isoforms using gtf
Aligning reads to first-pass isoform reference
Aligning reads to firstpass transcripts
Counting supporting reads for firstpass transcripts
Filtering isoforms by read coverage
Traceback (most recent call last):
  File "/home/mbauer/software/flair/bin/flair", line 86, in <module>
    main()
  File "/home/mbauer/software/flair/bin/flair", line 51, in main
    [isoforms, isoform_sequences] = collapse()
                                    ^^^^^^^^^^
  File "/home/mbauer/software/flair/src/flair/flair_collapse.py", line 423, in collapse
    filter_collapsed_isoforms_from_annotation(support=min_reads,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbauer/software/flair/src/flair/filter_collapsed_isoforms_from_annotation.py", line 229, in filter_collapsed_isoforms_from_annotation
    if isoforms[chrom][n]['jname'][1:-1] in isoforms[chrom][n_]['jname'] and \
                                            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'jname'
srun: error: xbt049: task 0: Exited with exit code 1

What else do we need to know?

I am running 6 nanopore direct RNA sequencing samples, which I combined into one fa file. Because the corrected bed files are large, I split them by chromosome, as was suggested in another issue report.
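For readers following along, splitting a BED file by chromosome can be sketched as below. This is a minimal illustration, not the method used in this run or anything shipped with flair; the function names `group_bed_by_chrom` and `write_per_chrom` are hypothetical, and the input is assumed to be tab-separated.

```python
from collections import defaultdict
from pathlib import Path


def group_bed_by_chrom(lines):
    """Group BED lines by the value in their first column (the chromosome)."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t", 1)[0]
        by_chrom[chrom].append(line)
    return dict(by_chrom)


def write_per_chrom(by_chrom, outdir):
    """Write one <chrom>.bed file per chromosome into outdir (hypothetical layout)."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for chrom, rows in by_chrom.items():
        path = outdir / f"{chrom}.bed"
        path.write_text("\n".join(rows) + "\n")
        paths.append(path)
    return paths
```

Each output file keeps the original line order for its chromosome, so a per-chromosome file such as chr22.bed can be passed to -q as in the command above.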

Jeltje commented 9 months ago

This is very odd! I can't seem to reproduce it so I hope you can help by doing one of the following using this script (unzip first): filter_collapsed_isoforms_from_annotation.py.gz. This is the script that fails in your run but it should now tell you which transcript is tripping it up.

If you still have the output directory from the failed run:

filter_collapsed_isoforms_from_annotation.py \
-i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.isoforms.bed \
-a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.annotated_transcripts.supported.bed \
--map_a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.annotated_transcripts.isoform.read.map.txt \
--map_i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.isoform.read.map.txt \
-o testout.bed --new_map testmap.txt

If not, simply replace the script flair uses in your conda env (for instance, mine is at /home/jeltje/miniconda3/envs/flair/lib/python3.10/site-packages/flair/filter_collapsed_isoforms_from_annotation.py) and run the flair collapse command again.

Once you know which transcript is problematic please find it in your input bed and post it here.
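As an illustration of the kind of check involved, the sketch below scans a nested dictionary for records that lack a 'jname' key. The layout `isoforms[chrom][name]` is inferred only from the traceback above; the function name `find_missing_jname` and the sample records are hypothetical, and this is not the code in the attached script.

```python
def find_missing_jname(isoforms):
    """Return (chrom, name) pairs whose record has no 'jname' key.

    Assumes isoforms maps chromosome -> {isoform name -> record dict},
    which is inferred from the traceback, not from flair's source.
    """
    missing = []
    for chrom, entries in isoforms.items():
        for name, record in entries.items():
            if "jname" not in record:
                missing.append((chrom, name))
    return missing


# Hypothetical example: the second record was stored without 'jname'.
example = {
    "chr1": {
        "isoA": {"jname": ",100,200,"},
        "isoB": {"sup": 3},
    }
}
print(find_missing_jname(example))
```

Listing the offending names up front avoids the KeyError and points directly at the entries to look up in the input bed.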

mbkodos commented 9 months ago

I ran the script and it fails almost immediately with the same error without any output:

Traceback (most recent call last):
  File "filter_collapsed_isoforms_from_annotation.py", line 311, in <module>
    main()
  File "filter_collapsed_isoforms_from_annotation.py", line 35, in main
    new_map=args.new_map, isbed=isbed)
  File "filter_collapsed_isoforms_from_annotation.py", line 228, in filter_collapsed_isoforms_from_annotation
    if isoforms[chrom][n] and not isoforms[chrom][n_]['jname']:
KeyError: 'jname'

Jeltje commented 9 months ago

I made a few changes, try again? filter_collapsed_isoforms_from_annotation.py.gz

mbkodos commented 9 months ago

Command:

python3 filter_collapsed_isoforms_from_annotation.py \
-i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.isoforms.bed \
-a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.annotated_transcripts.supported.bed \
--map_a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.annotated_transcripts.isoform.read.map.txt \
--map_i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.isoform.read.map.txt \
-o testout.bed --new_map testmap.txt

Output:

comparing bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14 and ENST00000374550.8_ENSG00000142676.14
it seems ENST00000374550.8_ENSG00000142676.14 has no annotated introns even though it's similar to bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14

Problematic transcript from mytest.isoforms.bed:

chr1 23692709 23696418 bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14 60 + 23692709 23696418 27,158,119 5 50,107,132,111,75, 0,1097,1950,3088,3634,
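To make the "no annotated introns" message easier to check by hand, here is a small sketch that derives exon and intron coordinates from a BED12 line using the standard blockSizes and blockStarts columns. The function name `bed12_introns` is hypothetical; it is not part of flair.

```python
def bed12_introns(line):
    """Return (chrom, exons, introns) for one BED12 line.

    Exons and introns are lists of (start, end) pairs in BED coordinates
    (0-based start, end exclusive).
    """
    fields = line.split()
    chrom = fields[0]
    chrom_start = int(fields[1])
    # Columns 11 and 12 are comma-separated lists, often with a trailing comma.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    exons = [(chrom_start + s, chrom_start + s + size)
             for s, size in zip(starts, sizes)]
    # An intron spans from the end of one exon to the start of the next.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    return chrom, exons, introns


line = ("chr1 23692709 23696418 "
        "bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14 "
        "60 + 23692709 23696418 27,158,119 5 "
        "50,107,132,111,75, 0,1097,1950,3088,3634,")
chrom, exons, introns = bed12_introns(line)
for start, end in introns:
    print(chrom, start, end)
```

For the line above this yields five exons and four introns, so the read-derived isoform itself is spliced; the entry reported as having no introns is the annotated transcript it was compared against.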

Note: I reran the Flair pipeline starting with align; previously I had converted my already-aligned reads using bam2Bed12. It still crashed in the same way.

Jeltje commented 9 months ago

Very odd! You can clearly tell it's wrong, but I don't understand why it does that. I would like to try this myself. Would you be able to share your files with me? You can email me at jeltje@soe.ucsc.edu

Jeltje commented 9 months ago

Is the input to your collapse command a concatenation of flair correct files? In other words, do you run the workflow separately and then combine outputs to run collapse?

I'm asking because many transcripts occur more than once in annotated_transcripts.supported.bed, for example ENST00000706939.1_ENSG00000137076.22 occurs 57 times, with slightly different coordinates. This makes the bed file invalid.
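A quick way to see how many names are repeated is sketched below. It counts column 4 (the name field) of a tab-separated BED file; the function name `duplicated_names` and the sample rows are made up for illustration and are not taken from the actual file.

```python
from collections import Counter


def duplicated_names(lines):
    """Return {name: count} for BED names (column 4) seen more than once."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical rows: the same transcript name at slightly different coordinates.
rows = [
    "chr9\t100\t900\ttxA_geneA\t0\t+",
    "chr9\t102\t898\ttxA_geneA\t0\t+",
    "chr9\t500\t700\ttxB_geneB\t0\t-",
]
print(duplicated_names(rows))
```

To run it on a real file, pass an open file handle instead of the list, for example `duplicated_names(open(path))`.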

mbkodos commented 8 months ago

For this test, I reduced it to just one sample. I took the output from flair correct:

flair correct --nvrna \
-q /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair.bed \
-f /home/mbauer/genome/gencode.v43.annotation.sorted.gtf \
-g /home/mbauer/genome/gencode.43.GRCh38.primary_assembly.genome.fa \
--output /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair

And used the output ARP1_OENEK2_RNA_20231030_flair_all_corrected.bed in flair collapse:

flair collapse \
-g /home/mbauer/genome/gencode.43.GRCh38.primary_assembly.genome.fa \
--gtf /home/mbauer/genome/gencode.v43.annotation.sorted.gtf \
-q /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair_all_corrected.bed \
-r /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_combined.fastq.gz \
--stringent --check_splice --generate_map --annotation_reliant generate \
--output /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest \
--threads 6 \
--temp_dir /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/temp

Jeltje commented 8 months ago

Would you mind sending me ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair.bed ?