NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit https://nbisweden.github.io/AGAT/
GNU General Public License v3.0

How to merge two genes with overlapping positions but no overlapping CDS? #468

Open binzhengbin opened 5 months ago

binzhengbin commented 5 months ago

I tried to merge genes with overlapping regions using agat_sp_fix_overlaping_genes.pl, but it failed because their CDSs don't overlap. Is there any way to fix this? Thanks!

[Attached image: 微信图片_20240620193344]

Juke34 commented 5 months ago

This script is for a particular purpose: it prevents overlapping genes from ending at exactly the same position (a peculiarity required for submission to public archives). You should use either the merge or the complement script instead.

binzhengbin commented 5 months ago

I don't quite understand what you mean, can you point me in the right direction? Or do you know of any code that accomplishes this?

Juke34 commented 5 months ago

If you work on two files, use agat_sp_merge_annotations.pl.

If you work on a single file, use the gxf2gxf script, but you must first activate the merge option in the AGAT config file (agat config --expose).

binzhengbin commented 5 months ago

I'll try your suggestion and let you know how it goes, thanks!

binzhengbin commented 5 months ago

Following your advice, I used agat_convert_sp_gxf2gxf.pl to merge overlapping genes. It did something, but only the genes with overlapping CDS positions were merged; genes that overlap other genes only at the gene/mRNA level were not merged.

[Attached image: 24A5B0557481EB48A8BB22CB2435E6F8]

Juke34 commented 5 months ago

Which version of AGAT do you use? For genes without CDS to be merged, both genes must be of the same type (the level1 features must be similar, e.g. gene, and the level2 features must be of the same type, e.g. mRNA). Keep in mind that they also have to be on the same strand.

Juke34 commented 5 months ago

Right, if two genes both have a CDS, the overlap is evaluated at the CDS level to decide whether to merge them. Would you like to merge if they overlap at any level? This is not implemented, but if it is useful I could add it.
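
To make the rule above concrete, here is a minimal Python sketch of a CDS-level merge decision. It is not AGAT's code: the `Gene` class, the `would_merge_on_cds` function, and the coordinates are all hypothetical and only illustrate why two genes whose spans overlap can still be left unmerged.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# GFF-style coordinates: 1-based, both ends inclusive.
Interval = Tuple[int, int]


@dataclass
class Gene:
    """Hypothetical, simplified gene record (not an AGAT data structure)."""
    gene_id: str
    seqid: str
    strand: str
    start: int
    end: int
    cds: List[Interval] = field(default_factory=list)


def intervals_overlap(a: List[Interval], b: List[Interval]) -> bool:
    """True if any interval in a shares at least one base with any in b."""
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a for s2, e2 in b)


def would_merge_on_cds(g1: Gene, g2: Gene) -> bool:
    """Merge only if same sequence, same strand, and the CDSs overlap.

    Genes lacking a CDS are not modelled in this sketch.
    """
    if g1.seqid != g2.seqid or g1.strand != g2.strand:
        return False
    if not g1.cds or not g2.cds:
        return False
    return intervals_overlap(g1.cds, g2.cds)


# geneA and geneB overlap as genes (420..700) but not in their CDS.
gene_a = Gene("geneA", "chr1", "+", 50, 700, cds=[(100, 200), (300, 400)])
gene_b = Gene("geneB", "chr1", "+", 420, 800, cds=[(450, 600)])
# geneC has a CDS that overlaps the second CDS segment of geneA.
gene_c = Gene("geneC", "chr1", "+", 350, 900, cds=[(380, 500)])

print(would_merge_on_cds(gene_a, gene_b))  # False
print(would_merge_on_cds(gene_a, gene_c))  # True
```

The geneA/geneB pair mirrors the situation reported in this issue: the gene spans overlap, but a CDS-level test finds nothing in common.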

binzhengbin commented 5 months ago

Sorry, just saw the message now. It would be best if a gene level merge could be implemented.

binzhengbin commented 5 months ago

Since your agat_sp_fix_overlaping_genes.pl program only merges genes whose CDSs overlap, it would be a great feature if users could choose the level at which merging is decided (e.g. gene, mRNA, exon, or CDS).

Juke34 commented 1 week ago

agat_sp_fix_overlaping_genes.pl is not made for that purpose.

Either use agat_sp_merge_annotations.pl, or activate the merge_loci parameter with agat config --expose --merge_loci and then use any script (the most appropriate is agat_convert_sp_gxf2gxf.pl).

But for now it merges genes if both have a CDS, the CDSs overlap, and they are on the same strand.

I have not implemented merging when one gene has a CDS and the other does not, nor an option to force the overlap check at exon level even when both have a CDS. I'm not sure it is relevant. I'm afraid we would merge different loci that overlap in their CDS, or different features, e.g. a miRNA/ncRNA in the UTR of a gene...
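
The concern about a miRNA sitting in a UTR can be illustrated with a small Python sketch. This is only an illustration under assumed data, not how AGAT works internally: the dictionaries, the `overlap_at_level` helper, and all coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if any (start, end) interval in a shares a base with any in b.

    Coordinates are 1-based and inclusive, as in GFF.
    """
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a for s2, e2 in b)


def overlap_at_level(features1, features2, level):
    """Compare two loci at one feature level (e.g. "exon" or "CDS").

    A locus with no feature at that level never overlaps at that level.
    """
    return overlaps(features1.get(level, []), features2.get(level, []))


# Hypothetical protein-coding gene: one exon, CDS at 200..300,
# so 301..500 is 3' UTR on the + strand.
coding_gene = {"exon": [(100, 500)], "CDS": [(200, 300)]}
# Hypothetical miRNA located inside that UTR; it has no CDS.
mirna = {"exon": [(420, 440)]}

print(overlap_at_level(coding_gene, mirna, "CDS"))   # False
print(overlap_at_level(coding_gene, mirna, "exon"))  # True
```

With an exon-level check the two loci would be merged, although they are biologically distinct features. A CDS-level check avoids this case.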