Open chatla01 opened 3 years ago
Hi,
Just got your email, sorry for not noticing this before. This is a new error. Can you send me the following two files:
Thanks
I am surprised this issue has never been hit before, to be honest. The issue here is that the synteny classifier is expecting that the gene_id identifier of a gene is unique. This is so that it can compare synteny between the source genome and the target genome. This is breaking for your tRNAs, whose gene names are not unique. I could probably bypass this here, but I am not sure how things downstream will behave if there are shared gene identifiers like this. It is probably best to fix the source file to have gene_id values that are unique to individual loci. I am going to modify the validate_gff3 script to throw an error in these cases.
Can you send me your input GFF3 file so I can verify that my hypothesis is correct, and that the changes to the validator script work? I will also send you back a fixed version of the input GFF3.
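In the meantime, a quick way to check an input file for this problem is sketched below. This is not the validate_gff3 script, just a minimal standalone illustration. It assumes that gene records have the feature type gene in column 3 and carry a gene_id attribute in column 9; the function names are made up for this example.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 attribute column (key=value pairs separated by ';')."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def find_shared_gene_ids(gff3_lines, feature_type="gene", key="gene_id"):
    """Return {gene_id: [loci]} for gene_id values used at more than one locus.

    Assumption: gene records have `feature_type` in column 3 and carry
    the `key` attribute in column 9.
    """
    loci = defaultdict(set)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        gene_id = parse_attributes(cols[8]).get(key)
        if gene_id is not None:
            # A locus is (chromosome, start, end, strand).
            loci[gene_id].add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return {g: sorted(found) for g, found in loci.items() if len(found) > 1}
```

To run it on a file, pass the open file handle, for example find_shared_gene_ids(open("input.gff3")), where input.gff3 stands in for your own annotation. Any gene_id that comes back is shared by several loci and would need to be renamed in the source file.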
Hello,
I just made a new branch fix/disjoint_chrom in PR #252. This branch contains a requirement that genes be on the same chromosome, which will fix the crash you ran into. There is also a new fixer script to fix GFF3 files in programs/fix_chrom_disjoint_genes. This script will produce valid GFF3 files.
What this script cannot do is fix genes that are disjoint on the same chromosome. This is due to a limitation of the genePred file format. In your annotation files, I did detect a few instances of these, but I don't think they will cause a huge problem here. The updated validate_gff3 script will now warn about such genes.
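For reference, the cross-chromosome case can be spotted with a few lines of Python. This is only a rough sketch of the idea, not the code in programs/fix_chrom_disjoint_genes or validate_gff3. It assumes that every feature belonging to a gene carries that gene's gene_id attribute in column 9; the function name is made up for this example.

```python
from collections import defaultdict


def genes_on_multiple_chromosomes(gff3_lines, key="gene_id"):
    """Return {gene_id: [chromosomes]} for genes seen on more than one chromosome.

    Assumption: every feature of a gene (gene, transcript, exon, ...) carries
    the same `key` attribute in column 9.
    """
    seqids = defaultdict(set)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        for field in cols[8].split(";"):
            name, sep, value = field.strip().partition("=")
            if sep and name == key:
                seqids[value].add(cols[0])  # column 1 is the chromosome
    return {g: sorted(chroms) for g, chroms in seqids.items() if len(chroms) > 1}
```

Note that this only reports genes split across chromosomes. Genes that are disjoint on the same chromosome would need a coordinate-based check instead.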
You will need to restart your CAT run from the beginning with these new GFF3 files. You can retain the chain files in workdir/chaining to reduce compute time.
Hi Ian,
Thank you, it worked.
Hi,
I have had some successful CAT runs, but when I recently tried with new genomes I got this error.
Attached log file. dimm_log.txt
Thank you in advance. Chatla