edgardomortiz / Captus

Assembly of Phylogenomic Datasets from High-Throughput Sequencing data
https://edgardomortiz.github.io/captus.docs/
GNU General Public License v3.0

captus paralog filter - references added in the wrong direction #12

Open EdBiffin opened 2 months ago

EdBiffin commented 2 months ago

Dear Edgardo, I've noticed that when reference sequences are added to alignments prior to informed paralog filtering, in some cases they are added in the reverse direction relative to the extracted sequences in the alignment. I'm using a custom reference file that comprises the sequences used for probe design, mostly sourced from 1KP and Phytozome; the references were generated by clustering with CD-Hit (longest sequence per cluster at a specified identity). I've attached an example alignment and the references for that gene. I'm using v1.01. Any advice would be greatly appreciated. AT1G03750.fna.txt AT1G03750.references.txt

edgardomortiz commented 1 month ago

Dear Ed,

Sorry for the really late reply; I was in the chaos of moving countries. I see that in your reference the sequences are all in different reading frames, so Captus might be having trouble translating them consistently.

Captus translates the references in all six reading frames, then selects the reading frame that produces the fewest internal stop codons. If there is a tie between two reading frames, it will prefer a positive reading frame. So maybe these are not CDS?
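The frame selection described above can be illustrated with a minimal sketch. This is not Captus's actual code: the function names are hypothetical, the standard genetic code is assumed, and stop codons are counted directly in each frame instead of producing a full translation. Ties are broken in favor of positive frames, as described, and then (as an extra assumption) in favor of the lower frame number.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def internal_stops(seq, offset):
    """Count stop codons in the frame starting at `offset`, ignoring a terminal stop."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # a stop at the very end is not "internal"
    return sum(codon in STOP_CODONS for codon in codons)


def best_frame(seq):
    """Return (frame, stops) for the frame (+1..+3, -1..-3) with the fewest internal stops.

    Ties prefer a positive frame, then the lower frame number (the second
    tie-break is an assumption made for this sketch).
    """
    rc = reverse_complement(seq)
    candidates = []
    for offset in range(3):
        # Tuple order drives the tie-break: stops, strand (0 = positive), offset.
        candidates.append((internal_stops(seq, offset), 0, offset, offset + 1))
        candidates.append((internal_stops(rc, offset), 1, offset, -(offset + 1)))
    stops, _, _, frame = min(candidates)
    return frame, stops
```

A short or partial sequence can easily have zero internal stops in several frames, including negative ones, so a reference that is not a complete CDS may end up oriented differently from the extracted sequences.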

If you don't care about obtaining the amino acid format from the alignment step, and you are sure all sequences are in the same direction, you could provide the reference to Captus as miscellaneous DNA (-d AT1G03750.references.fasta).

If the amino acid output is necessary, then I would suggest verifying that these are translatable (preferably in reading frame 1), or at least consistently for all of them.
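One way to do that check outside Captus is sketched below, assuming the standard genetic code. The helper names are hypothetical. For each FASTA record it reports whether the length is a multiple of three and how many internal stop codons appear in reading frame 1.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def frame1_report(text):
    """Map each record name to (length_is_multiple_of_3, internal_stops_in_frame_1)."""
    report = {}
    for name, seq in parse_fasta(text):
        seq = seq.upper()
        codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        internal = codons[:-1]  # the final codon may legitimately be a stop
        stops = sum(codon in STOP_CODONS for codon in internal)
        report[name] = (len(seq) % 3 == 0, stops)
    return report
```

For a file such as the attached AT1G03750.references.txt, the text could be read with `open(...).read()` and passed to `frame1_report`; any record with a non-zero stop count is a candidate for trimming or re-framing before it is used as a coding reference.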

Let me know if this helps!

Edgardo

EdBiffin commented 1 month ago

Dear Edgardo, thanks for your reply, much appreciated. These are the sequences used for probe design, in some cases only partial exons, so that largely explains the issue. Hope your move went well, and hoping that you continue to develop Captus; we're finding that it plugs a lot of gaps that are issues in other pipelines. Ed
