Open mbkodos opened 9 months ago
This is very odd! I can't seem to reproduce it so I hope you can help by doing one of the following using this script (unzip first): filter_collapsed_isoforms_from_annotation.py.gz. This is the script that fails in your run but it should now tell you which transcript is tripping it up.
If you still have the output directory from the failed run:
filter_collapsed_isoforms_from_annotation.py \
-i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.isoforms.bed \
-a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.annotated_transcripts.supported.bed \
--map_a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.annotated_transcripts.isoform.read.map.txt \
--map_i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22.isoform.read.map.txt \
-o testout.bed --new_map testmap.txt
If not, simply replace the script flair uses in your conda env. For instance, mine is at
/home/jeltje/miniconda3/envs/flair/lib/python3.10/site-packages/flair/filter_collapsed_isoforms_from_annotation.py
and run the flair collapse command again.
Once you know which transcript is problematic please find it in your input bed and post it here.
I ran the script and it fails almost immediately with the same error without any output:
Traceback (most recent call last):
File "filter_collapsed_isoforms_from_annotation.py", line 311, in
I made a few changes, try again? filter_collapsed_isoforms_from_annotation.py.gz
Command:
python3 filter_collapsed_isoforms_from_annotation.py \
-i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.isoforms.bed \
-a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.annotated_transcripts.supported.bed \
--map_a /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.annotated_transcripts.isoform.read.map.txt \
--map_i /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest.isoform.read.map.txt \
-o testout.bed --new_map testmap.txt
Output:
comparing bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14 and ENST00000374550.8_ENSG00000142676.14 it seems ENST00000374550.8_ENSG00000142676.14 has no annotated introns even though it's similar to bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14
Problematic transcript from mytest.isoforms.bed:
chr1 23692709 23696418 bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14 60 + 23692709 23696418 27,158,119 5 50,107,132,111,75, 0,1097,1950,3088,3634,
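For anyone following along, here is a minimal sketch of how the exon and intron coordinates can be derived from that BED12 line, which makes it easy to confirm that the isoform itself does have introns. The helper name `bed12_introns` is made up for this illustration and is not part of flair. It assumes standard BED12 columns (column 2 is chromStart, column 11 is blockSizes, column 12 is blockStarts).

```python
def bed12_introns(line):
    """Return (exons, introns) as lists of 0-based, half-open (start, end) pairs.

    Hypothetical helper for illustration only; assumes a standard BED12 line.
    """
    fields = line.split()
    chrom_start = int(fields[1])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    exons = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    # An intron is the gap between the end of one exon and the start of the next.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    return exons, introns


line = (
    "chr1 23692709 23696418 "
    "bc889af3-5458-4e1d-bed4-7d755c69d0f4-0_ENSG00000142676.14 60 + "
    "23692709 23696418 27,158,119 5 50,107,132,111,75, 0,1097,1950,3088,3634,"
)
exons, introns = bed12_introns(line)
print(len(exons), "exons,", len(introns), "introns")
for start, end in introns:
    print("intron", start, end)
```

For the line above this gives 5 exons and 4 introns, the first intron spanning 23692759 to 23693806.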
Note: I reran the Flair pipeline starting from align; previously I had converted my already-aligned reads using bam2Bed12. It still crashed in the same way.
Very odd! You can clearly tell it's wrong, but I don't understand why it does that. I would like to try this myself, would you be able to share your files with me? You can email me at jeltje@soe.ucsc.edu
Is the input to your collapse command a concatenation of flair correct files? In other words, do you run the workflow separately and then combine outputs to run collapse?
I'm asking because many transcripts occur more than once in annotated_transcripts.supported.bed; for example, ENST00000706939.1_ENSG00000137076.22 occurs 57 times, with slightly different coordinates. This makes the bed file invalid.
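A quick way to check a bed file for repeated transcript names is to count the values in the name column (column 4). This is only a sketch: `duplicated_names` is a hypothetical helper, not a flair function, and the three demo lines are made-up toy data, not lines from the real file.

```python
from collections import Counter


def duplicated_names(bed_lines):
    """Return {name: count} for every BED name (column 4) seen more than once.

    Hypothetical helper for illustration; assumes tab-separated BED lines.
    """
    counts = Counter()
    for line in bed_lines:
        # Skip header-style lines that some BED files carry.
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        counts[fields[3]] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Toy data (made up): tx_A appears twice with slightly different coordinates.
demo = [
    "chr1\t0\t10\ttx_A\n",
    "chr1\t5\t15\ttx_A\n",
    "chr1\t20\t30\ttx_B\n",
]
print(duplicated_names(demo))
```

To run it on a real file, pass an open file handle instead of the list, for example `duplicated_names(open("mytest.annotated_transcripts.supported.bed"))`.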
For this test, I reduced it to just one sample. I took the output from flair correct:
flair correct --nvrna \
-q /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair.bed \
-f /home/mbauer/genome/gencode.v43.annotation.sorted.gtf \
-g /home/mbauer/genome/gencode.43.GRCh38.primary_assembly.genome.fa \
--output /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair
And used the output ARP1_OENEK2_RNA_20231030_flair_all_corrected.bed in flair collapse:
flair collapse \
-g /home/mbauer/genome/gencode.43.GRCh38.primary_assembly.genome.fa \
--gtf /home/mbauer/genome/gencode.v43.annotation.sorted.gtf \
-q /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair_all_corrected.bed \
-r /storage/mbauer/nanopore/RNA/cell_lines/ARP1_OENEK2/ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_combined.fastq.gz \
--stringent --check_splice --generate_map --annotation_reliant generate \
--output /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/mytest \
--threads 6 \
--temp_dir /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/temp
Would you mind sending me ARP1_OENEK2_RNA_20231030/ARP1_OENEK2_RNA_20231030_flair.bed?
The exact command you tried to run
Feel free to leave any original paths, we don't have access to your system
srun -N 1 -n 1 bash -c "flair collapse \
-g /home/mbauer/genome/gencode.43.GRCh38.primary_assembly.genome.fa \
--gtf /home/mbauer/genome/gencode.v43.annotation.sorted.gtf \
-q /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/splitdir/chr22.bed \
-r /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/ARP1_OENEK2_combined.fq \
--stringent --check_splice --generate_map --annotation_reliant generate \
--output /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/chr22 \
--threads 6 \
--temp_dir /storage/mbauer/nanopore/RNA/cell_lines/analysis/flair/temp/" &
How did you install Flair? (We'd prefer it if you used one of the top two because they are the least likely to have package compatibility problems.)
conda create -n flair -c conda-forge -c bioconda flair
What happened?
What else do we need to know?
I am running 6 nanopore direct RNA sequencing samples. I combined them into one fa file. I split the corrected bed files by chromosome due to large file size, as was suggested in another issue report.