run duration for a tetraploid genome with 120k genes?

bioteksampath commented 1 year ago

Hi Pucker, thanks for this tool. I was wondering how long it takes to process a tetraploid plant genome with 120,000 genes. My job with 32 cores has been running for almost 24 days and has not finished yet.

It seems that constructing RAxML_tree.raxml is the slow step. Do you have any suggestions for speeding it up?

So far, RAxML_tree.raxml.ckp and RESULTS/01_initial_candidates.pep.fasta have been written.

#---------cmd used------------------
python3 ${myb_path}/MYB_annotator.py \
    --baits ${myb_path}/MYB_baits.fasta \
    --info ${myb_path}/MYB_baits.txt \
    --subject Bnapus_1N99.pep_20220722.fasta \
    --cpu $NSLOTS \
    --mode raxml \
    --refmybs ${myb_path}/AthRefMYBs.txt \
    --raxml /home/sap223/anaconda3/envs/myb/bin/raxml-ng \
    --out ${out_dir}

Thanks for your help, sam

bpucker commented 1 year ago

Hi Sam, thanks for your interest in the MYB annotator. Running it with RAxML takes a long time; the tree construction is the problematic step here. You can speed it up by using FastTree2 instead. It should only take a few minutes, even with 120k genes. Very few datasets require more than an hour of runtime. Best wishes, Boas
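
For readers following along, the switch to FastTree could look roughly like the sketch below, which reuses the command from the question. It rests on assumptions not stated in this thread: that the tree builder is selected with `--mode fasttree`, and that a FastTree executable is installed and can be found by the script. Check the script's help output before relying on either. The default values for `myb_path` and `out_dir` are placeholders, and `preview` and `RUN` are names made up for this example. The script only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: the command from the question with the tree builder switched.
# ASSUMPTION: "--mode fasttree" is the option value that selects FastTree;
# verify against the help output of MYB_annotator.py.

myb_path="${myb_path:-/path/to/MYB_annotator}"   # placeholder location
out_dir="${out_dir:-myb_fasttree_out}"           # placeholder output folder
NSLOTS="${NSLOTS:-32}"                           # core count, as in the question

# Build the argument list; the --raxml path is dropped because it is not used here.
set -- python3 "${myb_path}/MYB_annotator.py" \
    --baits "${myb_path}/MYB_baits.fasta" \
    --info "${myb_path}/MYB_baits.txt" \
    --subject Bnapus_1N99.pep_20220722.fasta \
    --cpu "${NSLOTS}" \
    --mode fasttree \
    --refmybs "${myb_path}/AthRefMYBs.txt" \
    --out "${out_dir}"

# Show the command for review before submitting the job.
preview="$*"
echo "${preview}"

# Execute only when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Printing the command first makes it easy to confirm on a cluster that `$NSLOTS` and the paths expanded as intended before the job is submitted.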

bioteksampath commented 1 year ago

@bpucker, thanks a lot. It worked well. On another note, do you recommend any tools for visualization or other downstream analysis of the output generated by this tool?

bpucker commented 1 year ago

I am glad to read this. We usually visualize our trees in iTOL: https://itol.embl.de/

bpucker / MYB_annotator
