DessimozLab / read2tree

a tool for inferring species tree from sequencing reads
MIT License

Can read2tree be run offline and with custom marker gene list? #22

Open masudermann opened 1 year ago

masudermann commented 1 year ago

Hello,

I tried out read2tree and was impressed. I have a quick follow-up question: is it possible for users to input their own custom list of marker genes?

I work with several different Phytophthora species. When I use the OMA browser, I can only obtain marker genes from 7 species. Being able to provide my own set of genes would be advantageous.

From my quick look at the paper, the documentation, and other people's questions, it looks like there isn't currently a way to run read2tree offline or with a custom set of marker genes. If this is the case, will a future update incorporate this option?

alpae commented 1 year ago

Dear @masudermann

It is possible, but not entirely straightforward. You need to provide:

  1. A set of marker genes in FASTA format, given as protein sequences. Each sequence is expected to name the species it belongs to, encoded in a [species tag] at the end of the FASTA header. There must be at most one sequence per species in each marker gene, and the sequences within a marker gene need to be orthologous to one another.
  2. A FASTA file with the same headers containing all the coding sequences (CDS) corresponding to the protein sequences. You can provide all the coding sequences in a single FASTA file.
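
The two requirements above can be checked before running read2tree. The following is a minimal sketch, not part of read2tree itself: the function names, the `*.fa` file extension, the one-file-per-marker-gene directory layout, and the example headers are all assumptions made for illustration. It also assumes "the same headers" means the full header line matches between the protein and CDS files, which the thread does not spell out.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: the species tag is the last bracketed token in the header,
# e.g. ">gene001 some description [SP1]" (hypothetical header).
TAG_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def read_headers(fasta_text):
    """Return the header lines (without '>') of a FASTA-formatted string."""
    return [line[1:].strip() for line in fasta_text.splitlines()
            if line.startswith(">")]


def check_marker(marker_text, cds_headers):
    """Return a list of human-readable problems for one marker-gene file."""
    problems = []
    tags = []
    for header in read_headers(marker_text):
        match = TAG_RE.search(header)
        if match is None:
            problems.append(f"no [species tag] at end of header: {header}")
        else:
            tags.append(match.group(1))
        # Assumption: the CDS entry must carry the identical header line.
        if header not in cds_headers:
            problems.append(f"no CDS entry with the same header: {header}")
    # At most one sequence per species is allowed in each marker gene.
    for tag, count in Counter(tags).items():
        if count > 1:
            problems.append(f"species {tag} appears {count} times")
    return problems


def check_all(marker_dir, cds_file):
    """Check every '*.fa' file in marker_dir against a single CDS FASTA file."""
    cds_headers = set(read_headers(Path(cds_file).read_text()))
    report = {}
    for path in sorted(Path(marker_dir).glob("*.fa")):
        problems = check_marker(path.read_text(), cds_headers)
        if problems:
            report[path.name] = problems
    return report
```

Calling `check_all("marker_genes", "cds.fa")` (both paths hypothetical) would return a dictionary mapping each problematic marker file to its list of problems; an empty dictionary means none of the three checks failed. The sketch does not test orthology, which has to be established when the marker genes are built.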

Then, you should be able to run read2tree with the command:

read2tree --tree --standalone_path <marker_genes> --dna_reference <cds_file> ...

If you observe any problems, we would be glad to hear about them. The tool should definitely also work with markers that do not come from OMA, but that use case is certainly much less tested.

Cheers Adrian

masudermann commented 1 year ago

Thank you! The instructions are helpful. I will keep you posted.

sinamajidian commented 10 months ago

For future reference: we also have some instructions here that work for NCBI RefSeq. We would be happy to generalise them to the specific format you are interested in.