I would like to give the `fusion-bloom` tools a try for detecting fusions in human RNA-seq data in a clinical setting.
I encountered an issue in the last step (`pavfinder fusion`):
```
pavfinder 1.6
Traceback (most recent call last):
  File "/home/anthony/sw/miniconda3/envs/fusion-bloom-env/bin/find_sv_transcriptome.py", line 260, in <module>
    main()
  File "/home/anthony/sw/miniconda3/envs/fusion-bloom-env/bin/find_sv_transcriptome.py", line 208, in main
    only_fusions=args.only_fusions
  File "/home/anthony/sw/miniconda3/envs/fusion-bloom-env/lib/python2.7/site-packages/pavfinder/transcriptome/sv_finder.py", line 267, in find_events
    block_matches = self.exon_mapper.map_align(aligns[0])
  File "/home/anthony/sw/miniconda3/envs/fusion-bloom-env/lib/python2.7/site-packages/pavfinder/transcriptome/exon_mapper.py", line 256, in map_align
    if not self.transcripts_dict.has_key(record.transcript_id):
  File "pysam/libctabixproxies.pyx", line 635, in pysam.libctabixproxies.GTFProxy.__getattr__
KeyError: 'transcript_id'
```
I am using the GTF annotation file from GENCODE, and indeed some lines (the `gene` records) have no `transcript_id` attribute.
According to the GTF format description, however, the `transcript_id` field must be present in every GTF record.
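To check which records are affected, one could count the GTF records that lack a `transcript_id` attribute, grouped by feature type (column 3). The sketch below is plain Python with no pavfinder or pysam dependency. `parse_attributes` and `count_missing_transcript_id` are hypothetical helper names, and the attribute regex assumes double-quoted values, as in GENCODE files.

```python
import re
from collections import Counter

# Matches key "value" pairs in GTF column 9, e.g. gene_id "ENSG00000223972.5";
# assumption: values are double-quoted, as in GENCODE GTF files.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return the attribute column (column 9) of a GTF record as a dict."""
    return dict(ATTR_RE.findall(field))


def count_missing_transcript_id(lines):
    """Count, per feature type, the records that have no transcript_id attribute."""
    missing = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column GTF record
        if "transcript_id" not in parse_attributes(cols[8]):
            missing[cols[2]] += 1  # column 3 holds the feature type
    return missing
```

Passing an open file handle (for example, a hypothetical `gencode.gtf`) to `count_missing_transcript_id` returns a `Counter` keyed by feature type. On a GENCODE file, the `gene` records mentioned above would be expected to appear there.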
After reading about what a proper GTF file should contain, and looking at the code in `pavfinder/transcriptome/transcript.py`, where only `exon` and `CDS` features are loaded into the object, I filtered the GTF file accordingly, and it now works fine.
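One way to reproduce the filtering described above is to keep comment lines and `exon`/`CDS` records only. This is a sketch, not the exact command used here. `filter_gtf` is a hypothetical name, and the set of features to keep follows the observation that pavfinder's `transcript.py` only loads those two feature types.

```python
# Feature types to keep (assumption based on what transcript.py loads);
# gene, transcript, UTR and other records are dropped.
KEEP_FEATURES = {"exon", "CDS"}


def filter_gtf(lines, keep=KEEP_FEATURES):
    """Yield comment lines and records whose feature type (column 3) is in `keep`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # preserve header/comment lines unchanged
            continue
        cols = line.split("\t")
        if len(cols) >= 9 and cols[2] in keep:
            yield line
```

The output can be written with `sys.stdout.writelines(filter_gtf(src))` for an open input file `src`. Filtering preserves the original line order. If the pipeline reads a bgzip-compressed, tabix-indexed copy of the annotation, the filtered file would have to be recompressed and re-indexed before use. The pysam tabix frames in the traceback hint at this, but it is an assumption.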
What would you suggest?

- `transcript_id` in GTF file
- `transcript_id` field in all GTF lines

Thanks,
Anthony