Open 0xaf1f opened 1 year ago
Hi,
The issue is the order tag. I think in the past I had a regular expression to replace it. Let me have a look at your fix.
Best, Thomas
On 16 Feb 2023, at 21:56, Afif Elghraoui @.***> wrote:
The reference annotation https://www.ncbi.nlm.nih.gov/nuccore/NC_000962.3 contains
FT   gene            3593369..3593852
FT                   /locus_tag="Rv3216"
FT                   /pseudogene="unknown"
FT                   /db_xref="GeneID:888845"
FT   misc_feature    order(3593369..3593437,3593439..3593852)
FT                   /locus_tag="Rv3216"
FT                   /note="acetyltransferase (2.3.1.-), contains GNAT domain
FT                   (GCN5-like N-acetyltransferase. See Vetting et al. 2005),
FT                   probably pseudogene as appears frameshifted due to 1bp
FT                   insertion at position 3593438. Frameshift present in all
FT                   sequenced tubercle bacilli. Start changed since first
FT                   submission, extended by 50aa."
FT                   /pseudogene="unknown"
FT                   /db_xref="PSEUDO:CCP46032.1"
which gets transferred to the input assembly as
FT   gene            complement(116773..117256)
FT                   /locus_tag="Rv3216"
FT                   /note="*pseudogene: unknown"
FT                   /db_xref="GeneID:888845"
FT                   /gene="Rv3216"
FT   misc_feature    complement(order(116773..117256)
FT                   /locus_tag="Rv3216"
and then parsing the annotation file fails because the misc_feature coordinate has an unbalanced parenthesis.
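For anyone who wants to locate such lines before a downstream parser fails on them, here is a minimal Python sketch (not RATT code) that scans an EMBL feature table and reports feature locations with unbalanced parentheses. The names `paren_balance` and `unbalanced_locations` are made up for illustration, and the sketch assumes the standard EMBL layout (feature key starting in column 6) with each location on a single line.

```python
import re

# A feature key line: "FT", three spaces, the key in column 6, then the location.
# Qualifier and continuation lines have blanks where the key would be, so they do not match.
KEY_LINE = re.compile(r"FT {3}(\S+) +(\S+)")


def paren_balance(location: str) -> int:
    """Return opening minus closing parentheses; 0 means balanced."""
    depth = 0
    for ch in location:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return depth  # a closing parenthesis with no matching opener
    return depth


def unbalanced_locations(embl_text: str):
    """Yield (line_number, feature_key, location) for unbalanced locations."""
    for number, line in enumerate(embl_text.splitlines(), start=1):
        match = KEY_LINE.match(line)
        if match and paren_balance(match.group(2)) != 0:
            yield number, match.group(1), match.group(2)


SAMPLE = '''\
FT   gene            complement(116773..117256)
FT                   /locus_tag="Rv3216"
FT   misc_feature    complement(order(116773..117256)
FT                   /locus_tag="Rv3216"
'''

if __name__ == "__main__":
    for hit in unbalanced_locations(SAMPLE):
        print(hit)  # prints (3, 'misc_feature', 'complement(order(116773..117256)')
```

This only detects the problem; it does not try to guess what the correct transferred location should have been.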
— Reply to this email directly, view it on GitHub https://github.com/ThomasDOtto/ratt/issues/12, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEOT7ET5UZAGYUMYJEJO47LWX2PCRANCNFSM6AAAAAAU6WUT3A. You are receiving this because you are subscribed to this thread.
@0xaf1f Thomas refers to a fix. Are you aware of it?
No, I haven't gotten to it yet since I've been working on my own code. I think RATT would benefit from using Bioperl to read/write embl files (it might even take care of #10), but I haven't looked into how disruptive that would be versus updating a regex. I wouldn't suggest waiting for me when your focus is already here.
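To make the "updating a regex" option concrete, here is a hypothetical Python sketch of one thing such a regex could do: rewrite the `order(` operator to `join(` in the reference feature table before transfer. This is only a guess at the kind of replacement meant earlier in the thread, it is not RATT's actual code, and the function name `replace_order_with_join` is invented for the example. Note that it changes the meaning of the location, since `join` asserts the segments form one contiguous sequence while `order` does not.

```python
import re

# "order(" at the start of an operator, not as the tail of a longer word.
ORDER_OPERATOR = re.compile(r"\border\(")
# Feature key lines have the key in column 6; qualifier lines are blank there.
KEY_LINE_START = re.compile(r"FT {3}\S")


def replace_order_with_join(embl_text: str) -> str:
    """Rewrite order(...) to join(...) on feature key lines only."""
    out = []
    for line in embl_text.splitlines(keepends=True):
        if KEY_LINE_START.match(line):
            line = ORDER_OPERATOR.sub("join(", line)
        out.append(line)  # qualifiers such as /note are left untouched
    return "".join(out)


REFERENCE = (
    "FT   misc_feature    order(3593369..3593437,3593439..3593852)\n"
    'FT                   /locus_tag="Rv3216"\n'
)

if __name__ == "__main__":
    print(replace_order_with_join(REFERENCE), end="")
```

A location that wraps onto continuation lines would not be handled by this sketch, which is one reason a real EMBL parser might be the more robust route.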
Using Bio::SeqIO (Bioperl) would allow me to essentially replace main.ratt.pl:300-500 or so with only a few lines of code, if I have it right. Will put it on the to-do list.
But it requires installing BioPerl, which was annoying in the past…
Best, Thomas
On 17 Mar 2023, at 15:02, Will Haese-Hill @.***> wrote:
Using Bio::SeqIO (Bioperl) would allow me to essentially replace main.ratt.pl:300-500 or so with only a few lines of code, if I have it right. Will put it on the to-do list.