Open almosnow opened 4 months ago
Hi @almosnow
Thanks for reaching out. Could you please give us more information on how you obtained the gene markers? It would be great if you could share the mplog.log file and the command line(s) you used.
One thing to note is that the FASTA record IDs of the gene markers must match between the amino acid level (the OGXX.fa files) and the nucleotide level. This is needed when you concatenate all fna files into dna_ref.fa and provide it to read2tree with --dna_reference dna_ref.fa. Otherwise, read2tree uses the REST API to download them from the OMA web browser, assuming that the gene markers were downloaded from there.
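A quick way to check the ID matching before running read2tree is a small script like the one below. It is only a sketch in plain Python and not part of read2tree: the function names fasta_ids and missing_from_reference are made up for illustration, and it assumes the record ID is the first whitespace-delimited token after ">".

```python
def fasta_ids(text):
    """Return the record IDs (first token after '>') found in FASTA text, in order."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' line with no ID
                ids.append(fields[0])
    return ids


def missing_from_reference(protein_text, dna_text):
    """List protein record IDs that have no record with the same ID in the DNA reference."""
    dna_ids = set(fasta_ids(dna_text))
    return [rid for rid in fasta_ids(protein_text) if rid not in dna_ids]


# Toy data for illustration only; real input would be the marker files and dna_ref.fa.
protein_example = ">ASTMX02439\nMK\n>PYGNA12763\nMA\n"
dna_example = ">ASTMX02439\nATGAAA\n"
print(missing_from_reference(protein_example, dna_example))  # ['PYGNA12763']
```

For real files, the two arguments would be the contents of the marker FASTA files and of dna_ref.fa, for example read with pathlib.Path.read_text. An empty list means every amino acid record has a nucleotide record with the same ID.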
Best, Sina
Hmm, ok I see, I did not set up the gene markers properly I think.
Actually, now that I've read more, what I did was completely wrong.
Here's my scenario, perhaps you can advise on what to do.
We have a set of ~15 sequences (coding sequences from the same gene and the same organism, from different samples around the world) with minor variations between them. A phylogeny shows two major groups distinct from each other, but the changes between them are small (SNPs and the like).
We have another set of a few hundred SRA libraries, and we would like to find out which of the aforementioned 15 sequences they are most similar to.
Is it ok to use those initial 15 sequences as marker genes and try to fit the reads into them?
For this case, Read2tree can generate a tree in Multiple species mode. However, one gene might not be enough to describe the evolution of the organism or to provide enough resolution for distinguishing all samples.
Anyway, you can put the amino acid sequences in a fasta file in the marker_genes folder and the nucleotide sequences of the coding regions (in exactly the same order) in another fasta file, passed with --dna_reference genes.nuc.fa. Note that the gene names should match in both files. Each name starts with a five-letter code for the strain, like this:
>ASTMX02439
>PYGNA12763
>ELEEL42119
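To spot headers that do not follow this naming before running anything, a rough check like the following may help. It is a sketch under an assumption: the pattern (five uppercase letters followed by digits) is inferred only from the three example headers above and is not taken from the read2tree code, so adjust it if your IDs legitimately differ. The names HEADER_PATTERN and nonconforming_ids are hypothetical.

```python
import re

# Assumed pattern, inferred from the example headers: five uppercase letters, then digits.
HEADER_PATTERN = re.compile(r"^[A-Z]{5}\d+$")


def nonconforming_ids(fasta_text):
    """Return the record IDs in FASTA text that do not match HEADER_PATTERN."""
    bad = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            if not HEADER_PATTERN.match(record_id):
                bad.append(record_id)
    return bad


# Toy input for illustration; "sample1" is a made-up header that breaks the pattern.
example = ">ASTMX02439\nMK\n>sample1\nMA\n"
print(nonconforming_ids(example))  # ['sample1']
```

Running it on both the amino acid file and the nucleotide file should return an empty list for each if all headers follow the assumed pattern.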
Hello,
I am trying to use read2tree. I was able to install and run it; the example runs without a hitch and I get the expected output files.
When I try to use my own sequences, though, I get many "Invalid marker group" errors. This was relatively straightforward to take care of: I just renamed the fasta header lines accordingly.
Now I cannot get past an error that reads
KeyError: 'U1810'
I think this definitely has to do with the five-letter code you use/infer, but I don't really know how to make it work properly.
Any ideas?