Closed qiuxx221 closed 5 years ago
First of all, using wgd only makes sense for CDS (coding DNA) sequences, so make sure that the data you provide consists of clean DNA strings that can be translated into proteins. Secondly, you might have some issues with your sequence IDs. In general, it's best to avoid pipe characters (|) in sequence IDs; note also that everything after the first space is ignored.
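In case it helps, here is a minimal sketch of how the headers of a FASTA file could be cleaned up along those lines before running wgd. It is not part of wgd; the function name `clean_fasta_ids` and the choice of an underscore as the replacement character are assumptions made for illustration.

```python
def clean_fasta_ids(lines):
    """Yield FASTA lines with simplified sequence IDs.

    Hypothetical helper, not part of wgd. Header lines keep only the text
    before the first whitespace, and pipe characters are replaced with
    underscores. Sequence lines are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Everything after the first space is ignored anyway, so drop it.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            # Replace pipes, which are best avoided in sequence IDs.
            seq_id = seq_id.replace("|", "_")
            yield ">" + seq_id
        else:
            yield line


if __name__ == "__main__":
    example = [">gi|123|ref some description\n", "ATGAAATAG\n"]
    for cleaned in clean_fasta_ids(example):
        print(cleaned)
```

Truncating and replacing characters can make two different headers collapse into the same ID, so it is worth checking that the cleaned IDs are still unique.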
Thanks for your reply. Regarding the coding DNA sequences: I am using a de novo transcriptome, so does that mean I should run TransDecoder first to find out which sequences encode proteins? Does it matter whether a sequence has a full ORF, or is it OK if it is only 5' partial?
Thanks!
I don't have a lot of experience with analyzing transcriptomes, so I'm afraid I can't be of much help here, but yes, you absolutely need protein-coding DNA sequences, since Ks is a distance defined at the codon level, and wgd uses codon-level alignments and codon models as implemented in codeml to compute Ks distances. I guess you can provide a 5' partial ORF, since wgd will translate codon by codon from the start of the sequence and will stop at the first stop codon or when the sequence ends (ignoring the trailing nucleotides if the sequence length is not a multiple of 3).
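To make the translation behaviour described above concrete, here is a rough sketch in plain Python. It is not wgd's actual code: the function name `translate_until_stop`, the use of the standard genetic code, and the choice to report unrecognised codons as `X` are all assumptions for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_until_stop(cds):
    """Translate codon by codon from the first nucleotide.

    Hypothetical helper, not part of wgd. Translation stops at the first
    stop codon or at the end of the sequence; trailing nucleotides that do
    not form a complete codon are ignored. Codons containing characters
    other than T, C, A, G (for example N) are reported as 'X'.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    protein = []
    for i in range(0, usable, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_until_stop("ATGAAATAGGGG"))  # complete ORF followed by extra bases
    print(translate_until_stop("AAAGGGTT"))      # 5' partial, length not a multiple of 3
```

Because translation starts at the very first nucleotide, a 5' partial sequence only gives a sensible protein if it begins in frame.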
Hi,
I am running the command
using PacBio isoform sequencing data, but for some reason the clustering step didn't produce an mcl file. Do you know what the problem is?
Part of the error msg is below,
In the end, I have the blast-blast tsv file only...