idarolti closed 1 year ago
Hi @idarolti ,
Pigeon is pretty particular about GTF formats, so in addition to the kind response from @greensii above, you might also want to check the following constraints.
The pigeon GTF format requirements are below:
A tab-delimited 9-column file per GFF/GTF File Format:
Column 1 must be the chromosome.
Column 2 is ignored.
Column 3 will only be processed if it is gene, transcript, or exon. All other types are ignored?
Columns 4 & 5 are start/end.
Columns 6 & 8 are ignored?
Column 7 is the strand, which must be + or - (does it give an error if it is neither +/-)?
Column 9 is the attribute field, AKA a free-text string, but to be properly processed it must contain a minimum of the following, separated by semicolons. Ex: gene_id "ENSG0001"; transcript_id "ENST000A"; gene_name "TP53";
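The constraints above can be checked before running pigeon with a short script. The following is a minimal sketch, not pigeon's own validation logic: the function name `check_gtf_line` is hypothetical, the start <= end check is an extra sanity check from the general GTF convention, and it assumes that gene rows only need gene_id and gene_name while transcript and exon rows need all three attributes.

```python
import re

# Feature types that are reportedly processed; everything else is skipped.
PROCESSED_FEATURES = {"gene", "transcript", "exon"}
# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def check_gtf_line(line):
    """Return a list of problems found in one GTF line (empty if none)."""
    if not line.strip() or line.startswith("#"):
        return []  # blank lines and comments are not checked
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-delimited columns, found {len(cols)}"]
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = cols
    if feature not in PROCESSED_FEATURES:
        return []  # other feature types are reportedly ignored
    problems = []
    if not chrom:
        problems.append("column 1 (chromosome) is empty")
    if not (start.isdigit() and end.isdigit()):
        problems.append("columns 4 and 5 must be integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    if strand not in {"+", "-"}:
        problems.append(f"column 7 (strand) is {strand!r}, expected + or -")
    found = dict(ATTR_RE.findall(attrs))
    # Assumption: gene rows carry no transcript_id, so it is not required there.
    if feature == "gene":
        required = ("gene_id", "gene_name")
    else:
        required = ("gene_id", "transcript_id", "gene_name")
    for key in required:
        if key not in found:
            problems.append(f"column 9 is missing {key}")
    return problems


def check_gtf_file(path):
    """Print every problem in the file together with its line number."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_gtf_line(line):
                print(f"line {number}: {problem}")
```

Calling `check_gtf_file("annotations.gtf")` on the converted annotation would list any rows that break one of these rules. Unquoted attribute values are reported as missing here, which may or may not match how pigeon treats them.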
Let us know how this goes. Thanks! -Liz
Operating system Linux
Package name pigeon --classify command
pigeon 1.0.0 (commit -v1.0.0)
Using:
pbbam : 2.2.0 (commit v2.2.0-1-g8c081f6)
pbcopper : 2.1.0 (commit v2.1.0)
boost : 1.77
htslib : 1.14
zlib : 1.2.11
Conda environment For reference, for pbpigeon and isoseq3 I had to download the binaries directly because when installing with conda I kept getting segfaults (similar to #568), hence why they are not in the conda environment list below.
Describe the bug When running the command
pigeon classify <sorted.gff> <annotations.gtf> <reference.fa> --num-threads 12 --log-level INFO
no output is produced despite running for more than 24 h.
Error message No error message or warnings are produced.
To Reproduce I have followed this workflow (https://isoseq.how/classification/workflow.html) for classifying isoforms. The reference genome annotation was originally in gff format, so I used AGAT to convert it to gtf. I then sorted and indexed the reference annotation GTF file and sorted the input transcript GFF file (completed with no error messages). However, when running pigeon classify no output is produced.
To test the command on a smaller dataset, I subset the input transcript GFF and reference annotation GTF files to just one gene and ran pigeon classify, which worked without issues. But if I increase the number of genes even by a few then it gets stuck with no output or log messages being produced.
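The subsetting described above can be scripted so that the gene set is grown or halved systematically until the record that triggers the hang is isolated. This is a minimal sketch under stated assumptions: the helper names `first_gene_ids` and `subset_by_gene` are hypothetical, and it assumes both files carry a quoted gene_id in column 9.

```python
import re

# Assumes column 9 contains: gene_id "some_id";
GENE_ID_RE = re.compile(r'gene_id\s+"([^"]*)"')


def gene_id_of(line):
    """Return the gene_id of a 9-column GTF/GFF line, or None if absent."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9:
        return None
    match = GENE_ID_RE.search(cols[8])
    return match.group(1) if match else None


def first_gene_ids(lines, n):
    """Collect the first n distinct gene_ids in file order."""
    seen = []
    for line in lines:
        gene_id = gene_id_of(line)
        if gene_id is not None and gene_id not in seen:
            seen.append(gene_id)
            if len(seen) == n:
                break
    return seen


def subset_by_gene(lines, keep_gene_ids):
    """Yield comment lines plus every record whose gene_id is in keep_gene_ids."""
    keep = set(keep_gene_ids)
    for line in lines:
        if line.startswith("#") or gene_id_of(line) in keep:
            yield line
```

For example, reading the annotation into a list of lines, taking `first_gene_ids(lines, 2)`, and writing out `subset_by_gene(lines, ids)` gives a two-gene test file. If that run hangs while the one-gene run works, the second gene's records are the first place to look.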
Expected behavior I expect pigeon output files to be produced, or at least an error message to be printed.