alejandrogzi / bed2gff

cool BED-to-GFF3 converter that runs in parallel
MIT License

Questions about isoforms.txt #4

Closed: SWei2333 closed this issue 1 month ago

SWei2333 commented 1 month ago

Hi,

I got a geneAnnotation.bed file from ZOONOMIA, but I can't find a matching isoforms.txt file for the query species. So I tried to build one from orthologsClassification.tsv (cut -f 3,4). However, the geneAnnotation.bed file contains many transcripts labeled PG, PM, and L that are not present in orthologsClassification.tsv. Should I filter these transcripts out of geneAnnotation.bed, or where can I find the correct isoforms.txt file?

alejandrogzi commented 1 month ago

Hi @SWei2333

Thanks for reaching out! I strongly suggest looking at postoga, a project I developed some time ago (it includes bed2gff and bed2gtf): https://github.com/alejandrogzi/postoga

If the only thing you want to do is to obtain a .gff file, I suggest the following command:

./postoga.py base --togadir /your/toga/dir --outdir /your/out/dir -to gff --skip

If you want to set the parameters for the filtering steps, use these options:

-bc BY_CLASS, --by-class BY_CLASS
                      Filter parameter to only include certain orthology classes (I, PI, UL, M, PM,
                      L, UL)
-br BY_REL, --by-rel BY_REL
                      Filter parameter to only include certain orthology relationships (o2o, o2m,
                      m2m, m2m, o2z)
-th THRESHOLD, --threshold THRESHOLD
                      Filter parameter to preserve orthology scores greater or equal to a given
                      threshold (0.0 - 1.0)
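
If you would rather stay with the manual route from the question (cut -f 3,4 on orthologsClassification.tsv), here is a minimal Python sketch of that idea. It is not part of postoga or bed2gff. It assumes the TSV has a header row with the query gene in column 3 and the query transcript in column 4, and that the transcript name sits in column 4 of the BED file. The function names read_isoforms and split_bed are made up for illustration. The sketch only separates BED records with a gene mapping from those without one; whether the unmapped ones should be dropped depends on your analysis.

```python
import csv
import io


def read_isoforms(classification_tsv):
    """Map query transcript -> query gene from columns 3 and 4 (1-based).

    Assumption: the first row is a header and is skipped.
    """
    mapping = {}
    reader = csv.reader(classification_tsv, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) < 4:
            continue  # ignore short or empty rows
        gene, transcript = row[2], row[3]
        mapping[transcript] = gene
    return mapping


def split_bed(bed_lines, mapping):
    """Split BED lines into (kept, dropped) by whether column 4 is mapped."""
    kept, dropped = [], []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, nothing to match on
        (kept if fields[3] in mapping else dropped).append(line)
    return kept, dropped


# Tiny made-up example data, only to show the shape of the inputs.
demo_tsv = io.StringIO(
    "t_gene\tt_transcript\tq_gene\tq_transcript\torthology_class\n"
    "GENE1\tTX1\treg_1\tTX1.10\tone2one\n"
)
demo_bed = [
    "chr1\t100\t500\tTX1.10\t0\t+",
    "chr1\t900\t1500\tTX2.77\t0\t-",
]

demo_mapping = read_isoforms(demo_tsv)
demo_kept, demo_dropped = split_bed(demo_bed, demo_mapping)

# Write the isoforms file in the same column order as `cut -f 3,4`.
isoform_lines = [f"{gene}\t{tx}" for tx, gene in demo_mapping.items()]
print(len(demo_kept), "kept;", len(demo_dropped), "without a gene mapping")
```

In practice you would open the real files with open(path, newline="") instead of the in-memory demo data, then write the kept BED lines and the isoform lines to disk before running bed2gff.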

Hope that helps!

Alejandro