PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

Problem with pigeon prepare #620

Closed JesusMHU closed 9 months ago

JesusMHU commented 9 months ago

Operating system Ubuntu 22.04.2 LTS (GNU/Linux 5.19.0-43-generic x86_64)

Package name pigeon 1.2.0 (commit -v1.2.0)

Conda environment

# packages in environment at /home/administrador/anaconda3/envs/pigeon:
#
# Name           Version   Build         Channel
_libgcc_mutex    0.1       conda_forge   conda-forge
_openmp_mutex    4.5       2_gnu         conda-forge
isoseq           4.0.0     h9ee0642_0    bioconda
isoseq3          4.0.0     h9ee0642_0    bioconda
libgcc-ng        13.2.0    h807b86a_2    conda-forge
libgomp          13.2.0    h807b86a_2    conda-forge
libstdcxx-ng     13.2.0    h7e041cc_2    conda-forge
pbccs            6.4.0     h9ee0642_0    bioconda
pbpigeon         1.2.0     h4ac6f70_0    bioconda

Describe the bug I'm trying to use pigeon prepare to process the Arabidopsis thaliana annotation file downloaded from arabidopsis.org (the Araport11 annotation). It reports that the file doesn't have the required format, and I have no idea how to modify it. Could you please help me with this? I have Iso-Seq data from my experiment, and I would have thought there was a GENCODE version for Arabidopsis, given that it is a widely used model organism.

Error message

pigeon prepare Araport11_GTF_genes_transposons.Oct2023.gtf TAIR10_chr_all.fasta
| 20231116 20:18:24.829 | FATAL | pigeon prepare ERROR: GFF/GTF file error, improperly formatted record
reason : missing gene_name attribute
record : Chr1 Araport11 gene 3631 5899 . + . transcript_id "AT1G01010"; gene_id "AT1G01010";
See format documentation at https://isoseq.how

To Reproduce Link to the Arabidopsis annotation: https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes%2FAraport11_genome_release

Expected behavior To be able to run pigeon prepare, and then sort and classify the isoforms from my experiment.

jmattick commented 9 months ago

Hi @JesusMHU, we have documentation on what needs to be changed to make a compatible GTF on this page.
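
For readers hitting the same "missing gene_name attribute" error, here is a minimal Python sketch of one possible workaround: append a gene_name attribute to each record that lacks one, reusing the value of gene_id. This is not taken from the linked documentation. It assumes that a gene_name equal to gene_id is acceptable to pigeon prepare and that the file has no other formatting problems, so please check the format documentation before relying on it. The function name add_gene_name is made up for illustration.

```python
import re

# Matches the gene_id attribute in a GTF attribute column, e.g. gene_id "AT1G01010";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def add_gene_name(line: str) -> str:
    """Return a GTF line with gene_name appended when it is missing.

    Assumption (not confirmed by the thread): reusing gene_id as
    gene_name is good enough for the downstream tool.
    """
    # Leave comments and blank lines untouched.
    if line.startswith("#") or not line.strip():
        return line

    stripped = line.rstrip("\n")
    newline = line[len(stripped):]  # preserve the original line ending
    fields = stripped.split("\t")

    # A GTF record has exactly 9 tab-separated columns; skip anything else.
    if len(fields) != 9:
        return line

    attrs = fields[8].rstrip()
    if "gene_name " in attrs:
        return line  # already present, nothing to do

    match = GENE_ID_RE.search(attrs)
    if match is None:
        return line  # no gene_id to copy from

    if not attrs.endswith(";"):
        attrs += ";"
    fields[8] = f'{attrs} gene_name "{match.group(1)}";'
    return "\t".join(fields) + newline


# The record from the error message above, with tab-separated columns.
record = (
    "Chr1\tAraport11\tgene\t3631\t5899\t.\t+\t.\t"
    'transcript_id "AT1G01010"; gene_id "AT1G01010";\n'
)
print(add_gene_name(record), end="")
```

To convert a whole annotation, the same function could be applied to every line of the input GTF and the results written to a new file, which is then passed to pigeon prepare in place of the original.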