EdBiffin opened this issue 9 months ago.
Hi @EdBiffin!
I didn't use the `--adjustdirection` flag because, during extraction, all the sequences are put in the same direction as the sequence you used as reference. Could you upload one of those alignments here? I would like to solve the issue (or at least explain it).
Thanks
Edgardo
Hi Edgardo, thanks for your quick response. I've attached an example alignment. I'm using a custom reference file, which I've also attached. I look forward to your response. Ed captus_refs_nu_combined.fasta.txt 6164.fna.txt
Thank you Ed,
Could you also tell me the actual command you used? Or, even better, upload the extraction `.log` file. This is very strange; the sequences shouldn't be reversed...
Edgardo
captus-assembly_extract.log: please find it attached, and let me know if you need anything else.
Thanks for the patience!
Would it be possible to upload the `assembly.fasta` for 376903_Malleostemon_tuberculatus and 376896_Austrobaeckea_verrucosa (or, if they are too big, other smaller assemblies that produce locus 6164 in opposite directions)? Finally, so I can try to replicate the issue, what `captus align` command did you run?
Sorry, the link for Malleostemon got broken... (I got the other two)
By the way, while checking the reference I noticed that several sequences have identical names. Captus will only take one of them, because names have to be unique to avoid problems (in the picture the duplicates have a `2` after the name; these are just examples, there are many more).
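Duplicated names can be listed before running Captus with a short script. This is a minimal sketch that assumes the sequence name is the first whitespace-delimited token after `>`; that rule, the function name `duplicate_fasta_names`, and the toy sequences are illustrative assumptions, not how Captus itself parses headers.

```python
from collections import Counter


def duplicate_fasta_names(lines):
    """Return the sequence names that occur more than once in FASTA lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            names.append(header.split()[0] if header else "")
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical toy reference with one duplicated name
example = """>Syzygium_micranthum-6164
ATGGCT
>eucgr-6164
ATGGCC
>Syzygium_micranthum-6164
ATGGCA
""".splitlines()

print(duplicate_fasta_names(example))  # ['Syzygium_micranthum-6164']
```

On a real file, pass the open file handle directly, e.g. `with open("captus_refs_nu_combined.fasta.txt") as fh: print(duplicate_fasta_names(fh))`.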
I got it! When you provide a reference of nuclear proteins in nucleotides (CDS), Captus needs to translate it first (because Scipio performs a translated search on the assemblies).
Because I can't assume all sequences are translatable in `Frame 1`, Captus tries to guess the reading frame for each sequence: it translates it in the six reading frames and selects the frame that produces the fewest stop codons.
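The frame-guessing rule just described can be sketched as follows. This is a hypothetical reimplementation for illustration, not the code in Captus's `bioformats.py`: the function names, the toy sequence, and the frame labels (`Frame 1` to `Frame 3` as offsets 0 to 2 on the forward strand, `Reverse Frame 1` to `Reverse Frame 3` as offsets 0 to 2 on the reverse complement) are assumptions. It returns every frame tied for the fewest stop codons, which shows how a stop-poor sequence can leave the choice ambiguous.

```python
BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Translate a CDS in three forward and three reverse-complement frames."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"Frame {offset + 1}"] = translate(seq[offset:])
    for offset in range(3):
        frames[f"Reverse Frame {offset + 1}"] = translate(rev[offset:])
    return frames


def frames_with_fewest_stops(seq):
    """Naive rule: every frame whose translation has the minimum number of '*'."""
    proteins = six_frame_translations(seq)
    stops = {name: prot.count("*") for name, prot in proteins.items()}
    fewest = min(stops.values())
    return [name for name, n in stops.items() if n == fewest]


print(frames_with_fewest_stops("ATGATGATG"))
# ['Frame 1', 'Frame 3', 'Reverse Frame 1', 'Reverse Frame 2', 'Reverse Frame 3']
```

Only `Frame 2` (`**`) is excluded here; the remaining five frames tie at zero stops, so a rule that simply takes "the" minimum has to break the tie somehow, and a reverse frame can win.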
Now, I didn't anticipate that in some references, like yours, a sequence such as `Syzygium_micranthum-6164` can be perfectly translated in both `Frame 1` and `Reverse Frame 3` (and Captus chose the latter in this case), so I will modify the code to choose a positive reading frame in tied cases like this. So basically, the reversed sequences in alignment `6164` followed this "reversed" protein from `Syzygium_micranthum-6164`.
Until I post the updated code, the solution, unfortunately, would be to provide the reference in amino acids (or remove `Syzygium_micranthum-6164` and provide it in nucleotides). Have you noticed other cases with reversed sequences?
Edgardo
Actually, in the same locus, `eucgr-6164` can also be translated in `Reverse Frame 1` without stop codons, but with a final stop codon in `Frame 1`. I guess I will also need to add a rule to not count a stop codon when it is at the end.
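The two fixes discussed in this thread (ignore a single terminal stop codon, and prefer a forward frame when frames tie) can be sketched as a modified selection rule. As before, this is a hypothetical reimplementation with assumed function names and frame labels, not the actual v1.0.1 code. On the toy sequence `ATGAAATAA`, `Frame 1` translates as `MK*`; ignoring the final `*` leaves it with zero internal stops, and the forward-first tie-break selects it over the other stop-free frames.

```python
BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Forward frames first, then reverse-complement frames (assumed labels)."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"Frame {offset + 1}"] = translate(seq[offset:])
    for offset in range(3):
        frames[f"Reverse Frame {offset + 1}"] = translate(rev[offset:])
    return frames


def internal_stops(protein):
    # A single stop at the very end is the normal CDS terminator: don't count it
    if protein.endswith("*"):
        protein = protein[:-1]
    return protein.count("*")


def pick_reading_frame(seq):
    proteins = six_frame_translations(seq)
    # Key: fewest internal stops first, then forward (False) before reverse (True).
    # min() keeps the first of equal keys, so Frame 1 wins ties among forward frames.
    return min(
        proteins,
        key=lambda name: (internal_stops(proteins[name]), name.startswith("Reverse")),
    )


print(pick_reading_frame("ATGAAATAA"))  # Frame 1
print(pick_reading_frame("ATGATGATG"))  # Frame 1
```

Under the naive rule, `ATGAAATAA` would have one stop in `Frame 1` and zero in `Reverse Frame 1`, which mirrors the `eucgr-6164` situation.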
Hi again,
This fix will come with the next release (v1.0.1). For now, just decompress this attachment and replace your current `bioformats.py` (in the `captus` folder inside your Captus installation folder) with this version, which improves the reading frame prediction. In my tests, locus 6164 is now correctly translated in the reference.
bioformats.py.zip
Hi Edgardo, that all makes sense. Thanks again for your help; I look forward to the next release.
Dear Ed,
In case you didn't patch the previous version, I have made the release on Bioconda, which incorporates many other changes... Let me know if v1.0.1 works better in this respect.
Edgardo
Dear Edgardo, thanks for the heads-up. I tried the patch and I've also run some data through v1.0.1. All looks good, but I'll let you know if I find any issues. Many thanks for your help. Ed
From: Edgardo M. Ortiz. Sent: Tuesday, 5 March 2024 2:11 AM. To: edgardomortiz/Captus. Cc: Ed Biffin. Subject: Re: [edgardomortiz/Captus] MAFFT adjust direction (Issue #5)
I've noticed that MAFFT is generating alignments with sequences in both forward and reverse orientations. Is it possible to add the MAFFT `--adjustdirection` flag to the pipeline?