Unable to run wgd - Githubissues

amit4mchiba commented 5 years ago

Hi,

I am interested in using this tool to perform Ks and WGD analysis. I am using OrthoFinder to cluster genes and hope to use this tool to plot comparisons between species.

I followed instructions to install this software, and it was very easy. When I type wgd or other commands, I can see the options, which I think means that probably the tool was installed properly.

However, when I try to run the program (with any options), I get this error:

(py3) amit8chiba@amit8chiba-Precision-Tower-7910:/mnt/md0/Opu_r1.2_final/Comparitive_genomics/Ks_analysis_test/on-going_analysis$ wgd wf2 -n 16 cac_hc_gene_models.pep.fa, cro_v2.proteins.fasta ./cac_cro_Ks_out/
Traceback (most recent call last):
  File "/home/amit8chiba/miniconda2/bin/wgd", line 11, in <module>
    sys.exit(cli())
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/wgd_cli.py", line 1349, in wf2
    output_dir=blast_dir)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/wgd_cli.py", line 334, in blast_mcl
    if can_i_run_software(software) == 1:
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/wgd/utils.py", line 67, in can_i_run_software
    except FileNotFoundError:
NameError: global name 'FileNotFoundError' is not defined

I first thought that it could be due to the Python version, so I created a py3 environment in conda with Python 3.5 and activated it. I then reinstalled the tool, but I still get the same message.

I would really appreciate your help in getting this tool working.

I hope to get your advice; please let me know if you need any further information.

arzwa commented 5 years ago

Hi, although you say you are using a python3 environment, wgd seems to be installed among the python2.7 libraries, as you can see from the file paths in the traceback. Make sure you install wgd in the intended environment, the one equipped with python3.
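
As a rough illustration of reading the interpreter version off a traceback, here is a minimal Python sketch. The helper name python_versions_in_traceback is made up for this example and is not part of wgd; it only assumes that library paths contain a /lib/pythonX.Y/ component, as they do in the traceback above.

```python
import re
import sys


def python_versions_in_traceback(traceback_text):
    """Return the sorted Python versions named in '/lib/pythonX.Y/' path components."""
    return sorted(set(re.findall(r"/lib/python(\d+\.\d+)/", traceback_text)))


# One line copied from the traceback in this thread.
sample = (
    'File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/wgd/utils.py", '
    "line 67, in can_i_run_software"
)

print("Traceback paths point at Python:", python_versions_in_traceback(sample))
# For comparison, the interpreter that is executing this script:
print("This interpreter:", sys.version_info[0], sys.version_info[1], sys.executable)
```

If the version in the traceback paths differs from the one in the activated environment, the command being run was installed for a different interpreter.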

Also, the command is not correct: there should be no comma between your sequence files (apparently the example in the help message of wgd wf2 is wrong). That did not cause this error, of course, but it is worth knowing for later. Also make sure you provide CDS fasta files (coding sequences in the nucleotide alphabet) and not protein fasta files (amino acid alphabet).
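
A quick way to sanity-check whether a fasta file holds nucleotide or protein sequences is to look at the fraction of A/C/G/T/N characters. The sketch below is only a heuristic written for this thread (the function name and the 0.9 threshold are arbitrary assumptions, and wgd does not provide it):

```python
def looks_like_nucleotide(fasta_text, threshold=0.9):
    """Heuristic: True if at least `threshold` of the sequence characters are A/C/G/T/N."""
    residues = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")  # skip blank lines and headers
    ).upper()
    if not residues:
        raise ValueError("no sequence data found")
    nucleotide_count = sum(residues.count(base) for base in "ACGTN")
    return nucleotide_count / len(residues) >= threshold


cds_example = ">gene1\nATGGCTAAGTTGGTTTGA\n"
protein_example = ">gene1\nMAKLVRSWEH\n"
print(looks_like_nucleotide(cds_example))
print(looks_like_nucleotide(protein_example))
```

In practice you would pass in the contents of each input file, for example open("A.cds.fasta").read(), before starting a long run.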

amit4mchiba commented 5 years ago

Wow, that's such a quick reply. Many, many thanks.

Yeah, I had overlooked that. I have now created a py3 env and reinstalled wgd, and it now seems to be working (no errors so far).

Thanks a lot.

amit4mchiba commented 5 years ago

I am sorry, but I have another question and would appreciate your response.

My main objective in running this program is to calculate Ks within and between species.

Before this, I ran an OrthoFinder analysis on 28 plant species. I was wondering if I can use that data as input for this software.

For example, for wgd ksd -o beaver_eagle orthologs.tsv beaver.cds.fasta eagle.cds.fasta, can I use the orthologs.tsv output from the OrthoFinder analysis?

Thank you so much.

arzwa commented 5 years ago

Hi, you definitely can use your orthofinder results, but not right away. You will need to do some pre-processing steps. There are two main strategies you could use:

(1) To calculate between-species Ks values, say for species A and B, you will need a file with one-to-one orthologs for species A and B, for example:

A1    B2
A3    B6
B1    A2

This is a tab-separated file of orthologs. You could get such a file from your orthofinder results, for example, by taking the A gene and the B gene from every family that has exactly one gene for species A and one gene for species B (there are other ways to do it). If you have this kind of file (AB-orthologs.tsv for example), you can do

wgd ksd AB-orthologs.tsv A.cds.fasta B.cds.fasta
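
The one-to-one filtering described above could be sketched in Python as follows. This assumes a tab-separated orthogroups table with a header row (an "Orthogroup" column followed by one column per species) and comma-separated gene IDs in each cell, which is how recent OrthoFinder versions write Orthogroups.tsv as far as I know; check your own file first. The function name and the toy table are made up for illustration.

```python
import csv
import io


def one_to_one_pairs(orthogroups_text, species_a, species_b):
    """Return (gene_a, gene_b) for families with exactly one gene in each species."""
    reader = csv.DictReader(io.StringIO(orthogroups_text), delimiter="\t")
    pairs = []
    for row in reader:
        # Cells hold comma-separated gene IDs; an empty cell means no genes.
        genes_a = [g.strip() for g in (row[species_a] or "").split(",") if g.strip()]
        genes_b = [g.strip() for g in (row[species_b] or "").split(",") if g.strip()]
        if len(genes_a) == 1 and len(genes_b) == 1:
            pairs.append((genes_a[0], genes_b[0]))
    return pairs


# Toy table: only OG1 and OG4 are one-to-one between A and B.
toy_table = (
    "Orthogroup\tA\tB\n"
    "OG1\tA1\tB2\n"
    "OG2\tA3, A4\tB6\n"
    "OG3\t\tB7\n"
    "OG4\tA5\tB9\n"
)

pairs = one_to_one_pairs(toy_table, "A", "B")
ortholog_lines = "\n".join("\t".join(pair) for pair in pairs)
print(ortholog_lines)
```

Writing ortholog_lines to a file gives a table in the two-column, tab-separated layout shown above. With 28 species in the table, the column names passed as species_a and species_b select the pair of interest.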

To calculate within-species Ks distributions with wgd, you need a file with all paralogous families for the species of interest (say species A). It looks like this:

A1   A2   A6   A9
A3   A4   A5
A7   A8

You could again get this from your orthofinder results by collecting, for each family, all gene IDs that belong to species A.
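
A minimal sketch of that extraction, under the same assumptions about the orthogroups table (tab-separated, header row with one column per species, comma-separated gene IDs per cell). The function name is hypothetical, and dropping families with fewer than two genes is my own choice here, on the reasoning that a single gene has no paralog to pair with.

```python
import csv
import io


def paralog_families(orthogroups_text, species, min_size=2):
    """Return the gene IDs of `species` per family, keeping families with >= min_size genes."""
    reader = csv.DictReader(io.StringIO(orthogroups_text), delimiter="\t")
    families = []
    for row in reader:
        genes = [g.strip() for g in (row[species] or "").split(",") if g.strip()]
        if len(genes) >= min_size:
            families.append(genes)
    return families


# Toy table: species A has three multi-gene families and one singleton.
toy_families_table = (
    "Orthogroup\tA\tB\n"
    "OG1\tA1, A2, A6, A9\tB1\n"
    "OG2\tA3, A4, A5\t\n"
    "OG3\tA7, A8\tB2, B3\n"
    "OG4\tA10\tB4\n"
)

families = paralog_families(toy_families_table, "A")
family_lines = "\n".join("\t".join(family) for family in families)
print(family_lines)
```

Each output line is one family, tab-separated, matching the layout of the example above.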

(2) The second strategy is not to use the orthofinder families, but only the blast results from orthofinder (which are somewhere in the WorkingDirectory of your orthofinder analysis). For between-species Ks distributions (let's say species 1 and 2 in your orthofinder results), you can then use the Blast1_2.txt file with wgd mcl:

sed 's/_/|/g' Blast1_2.txt > Blast1_2-wgd.txt
wgd mcl --one_v_one -b Blast1_2-wgd.txt

and for the within species case (say species 1) you could do

sed 's/_/|/g' Blast1_1.txt > Blast1_1-wgd.txt
wgd mcl --mcl -b Blast1_1-wgd.txt

These steps will give you the GENE_FAMILIES input files necessary for wgd ksd. The problem, however, is that you will need CDS fasta files whose gene IDs correspond to the gene IDs in these families files (which are renamed by Orthofinder). Alternatively, you can try to convert the gene families files obtained with wgd mcl as above back to the original gene IDs by using the SequenceIDs.txt file in the OrthoFinder WorkingDirectory.
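
Converting back to the original IDs could look roughly like the Python sketch below. It assumes SequenceIDs.txt has lines of the form "0_0: original_id optional description", and that the families file still carries the renamed IDs in the 0|0 form produced by the sed step; I have not verified the second assumption against actual wgd mcl output, so inspect your files first. Both function names are made up for this example.

```python
def load_sequence_ids(sequence_ids_text):
    """Map OrthoFinder internal IDs (e.g. '0_0') to the first word of the original header."""
    mapping = {}
    for line in sequence_ids_text.splitlines():
        if not line.strip():
            continue
        internal_id, _, header = line.partition(": ")
        fields = header.split()
        if fields:
            mapping[internal_id.strip()] = fields[0]
    return mapping


def restore_gene_ids(families_text, mapping):
    """Replace renamed IDs ('0|0' or '0_0') in a tab-separated families file."""
    restored = []
    for line in families_text.splitlines():
        genes = line.split("\t")
        # Undo the sed substitution before the lookup; unknown IDs are left unchanged.
        restored.append("\t".join(mapping.get(g.replace("|", "_"), g) for g in genes))
    return "\n".join(restored)


# Toy data with invented gene names.
toy_sequence_ids = "0_0: geneA1 some description\n0_1: geneA2\n1_0: geneB1\n"
toy_families = "0|0\t0|1\n1|0\tunknown_gene"

id_map = load_sequence_ids(toy_sequence_ids)
print(restore_gene_ids(toy_families, id_map))
```

Only the first whitespace-delimited word of each header is kept, since that is usually what the CDS fasta IDs need to match; adjust this if your fasta headers are parsed differently.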

In any case, you will need to do some data carpentry to get the files you need, but it's all quite feasible.

arzwa / wgd

Unable to run wgd #12