MatthiasLienhard / isotools

IsoTools is a python module for Long Read Transcriptome Sequencing (LRTS) analysis.
https://isotools.readthedocs.io/en/latest/
MIT License

gff file incompatibility? #9

Open Ruth-hals opened 7 months ago

Ruth-hals commented 7 months ago

Hi Matthias, thank you for making such a nice tool! I would like to use it, but I cannot seem to format my GFF file so that it is compatible. Which version of tabix should I be using?

I'm getting the following error:

`annotation_fn=f'sorted_fixed_input.gff3.gz'
#create the IsoTools transcriptome object from the reference annotation
isoseq=Transcriptome.from_reference(annotation_fn)`

  0%|                                                                                       | 0.00/15.1M [00:00<?, ?B/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 3
      1 annotation_fn=f'sorted_fixed_input.gff3.gz'
      2 #create the IsoTools transcriptome object from the reference annotation
----> 3 isoseq=Transcriptome.from_reference(annotation_fn)

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/isotools/transcriptome.py:55, in Transcriptome.from_reference(cls, reference_file, file_format, **kwargs)
     53 tr = cls()
     54 tr.chimeric = {}
---> 55 tr.data = import_ref_transcripts(reference_file, tr, file_format,  **kwargs)
     56 tr.infos = {'reference_file': reference_file, 'isotools_version': __version__}
     57 tr.filter = {'gene': DEFAULT_GENE_FILTER.copy(),
     58              'transcript': DEFAULT_TRANSCRIPT_FILTER.copy(),
     59              'reference': DEFAULT_REF_TRANSCRIPT_FILTER.copy()}

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/isotools/_transcriptome_io.py:1064, in import_ref_transcripts(fn, transcriptome, file_format, chromosomes, gene_categories, short_exon_th, **kwargs)
   1062     exons, transcripts, gene_infos, cds_start, cds_stop, skipped = _read_gtf_file(fn, chromosomes, **kwargs)
   1063 else:  # gff/gff3
-> 1064     exons, transcripts, gene_infos, cds_start, cds_stop, skipped = _read_gff_file(fn, chromosomes, **kwargs)
   1066 if skipped:
   1067     logger.info('skipped the following categories: %s', skipped)

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/isotools/_transcriptome_io.py:1012, in _read_gff_file(file_name, chromosomes, progress_bar)
   1010 with tqdm(total=path.getsize(file_name), unit_scale=True, unit='B', unit_divisor=1024, disable=not progress_bar) as pbar, TabixFile(file_name) as gff:
   1011     chrom_ids = get_gff_chrom_dict(gff, chromosomes)
-> 1012     for line in gff.fetch():
   1013         file_pos = gff.tell() >> 16  # the lower 16 bit are the position within the zipped block
   1014         if pbar.n < file_pos:

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/pysam/libctabix.pyx:499, in pysam.libctabix.TabixFile.fetch()

ValueError: could not create iterator, possible tabix version mismatch

Thank you very much for your help, Best, Ruth
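
Since tabix expects records to be grouped by sequence and sorted by start position, the sort order of the file is one thing that can be ruled out separately from any version question. Below is a minimal sketch of such a check using only the Python standard library. The function name `first_unsorted_record` is hypothetical and not part of IsoTools or pysam, and the check only looks at column layout and ordering, so a file that passes may still fail for other reasons.

```python
import gzip


def first_unsorted_record(path):
    """Return (line_number, reason) for the first record that breaks
    tabix-style ordering, or None if no problem is found.

    Hypothetical helper for illustration; not part of IsoTools or pysam.
    """
    finished = set()   # sequence IDs whose block of records has ended
    current = None     # sequence ID of the block being read
    last_start = 0     # start coordinate of the previous record
    # bgzip output is gzip-compatible, so the gzip module can read it.
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("##FASTA"):
                break  # an embedded sequence section follows; stop here
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                return line_number, "fewer than 9 tab-separated columns"
            try:
                seqid, start = fields[0], int(fields[3])
            except ValueError:
                return line_number, "start coordinate is not an integer"
            if seqid != current:
                if seqid in finished:
                    return line_number, f"records for {seqid} are not contiguous"
                if current is not None:
                    finished.add(current)
                current, last_start = seqid, start
            elif start < last_start:
                return line_number, f"start {start} comes after {last_start}"
            else:
                last_start = start
    return None
```

Calling `first_unsorted_record('sorted_fixed_input.gff3.gz')` returns `None` when the file passes both checks, and otherwise the line number and a short reason for the first offending record.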

Ruth-hals commented 7 months ago

Hi, A more recent htslib version (HTSlib/1.17-GCC-12.2.0) solved my issue.

Thanks, Best, Ruth
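
For anyone hitting the same error, it may help to note which versions are present in the environment before and after switching HTSlib. The following is a rough sketch using only the standard library. It assumes the Python distributions are named `isotools` and `pysam`, and that a `tabix` executable may or may not be on the `PATH`. Note that it reports the Python package versions and the command-line tabix version, not necessarily the HTSlib that pysam itself was built against. The function names are hypothetical.

```python
import shutil
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a Python distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def tabix_cli_version():
    """Return the first line printed by `tabix --version`, or None if tabix is not on PATH."""
    exe = shutil.which("tabix")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else None


def report_versions(packages=("isotools", "pysam")):
    """Collect package versions and the command-line tabix version in one dict."""
    report = {name: package_version(name) for name in packages}
    report["tabix (command line)"] = tabix_cli_version()
    return report


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not found'}")
```

Including this output in a report makes it easier to see whether the environment changed between a failing and a working run.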

MatthiasLienhard commented 7 months ago

Hi, thank you for reporting. I will leave this open until I have fixed the dependency versions.