SunPengChuan / wgdi

WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes
https://wgdi.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

example data #22

Closed: sjfleck closed this issue 1 year ago

sjfleck commented 1 year ago

I wanted to reproduce your results with the sample data, starting with Vitis vinifera against Vitis vinifera, but I got this error: [Errno 2] No such file or directory: '../../blast/vvi161s_vvi161s.blast'

I checked the directory and vvi161s_vvi161s.blast wasn't there. Can you provide the command you used to create it? I looked for one, but the documentation only says to use BLASTP, MMseqs2, or DIAMOND. I tried this command with rundiamond.py: python ./rundiamond.py vvi161s.pep.fa vvi161s.pep.fa vvi161s vvi161s_vvi161s.blast

but I got this error:
sh: diamond: command not found
sh: diamond: command not found

Any help with this would be greatly appreciated.

SunPengChuan commented 1 year ago

Hi, I didn't upload the blast file due to its large size. Your command with rundiamond.py is correct, but you must install the diamond software.
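The `sh: diamond: command not found` message means the shell could not find a `diamond` executable on `PATH`, so rundiamond.py had nothing to call. The Python sketch below checks for the binary with `shutil.which` before running anything, then issues the two usual DIAMOND steps: `makedb` to index the reference proteins and `blastp` with tabular `--outfmt 6` output. The helper names `diamond_commands` and `run_diamond` are hypothetical, and this is not necessarily the exact command line rundiamond.py builds internally.

```python
import shutil
import subprocess
from typing import List, Optional


def diamond_commands(pep_query: str, pep_ref: str, db: str, out: str) -> List[List[str]]:
    """Build (but do not run) the DIAMOND command lines. Hypothetical helper."""
    return [
        # 1. Index the reference proteins into a DIAMOND database (<db>.dmnd).
        ["diamond", "makedb", "--in", pep_ref, "-d", db],
        # 2. Align the query proteins against that database, BLAST tabular output.
        ["diamond", "blastp", "-q", pep_query, "-d", db, "-o", out, "--outfmt", "6"],
    ]


def run_diamond(pep_query: str, pep_ref: str, db: str, out: str) -> None:
    """Run both steps, failing early if the diamond binary is not installed."""
    exe: Optional[str] = shutil.which("diamond")
    if exe is None:
        # This is the situation behind "sh: diamond: command not found".
        raise SystemExit("diamond is not on PATH; install DIAMOND first")
    for cmd in diamond_commands(pep_query, pep_ref, db, out):
        subprocess.run(cmd, check=True)  # raise if either step fails
```

For the example data this would be `run_diamond("vvi161s.pep.fa", "vvi161s.pep.fa", "vvi161s", "vvi161s_vvi161s.blast")`. Building the command lists separately from running them makes it easy to print and inspect them first.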

sjfleck commented 1 year ago

Thank you. I see my mistake now and was able to create the example dot plot.

My next question is about getting the gff files into the right format and generating the lens files. I found these scripts: 01.getgff.py, 02.gff_lens.py, 03.seq_newname.py, and deal_gff.py.

I assumed these are meant to help prepare the gff and lens files, but I can't get them to run without errors. I haven't found these Python scripts described in the documentation. I have some background in Python, but I'm not especially skilled in it. Any help with this would be greatly appreciated.
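As far as I understand the WGDI documentation, a lens file has one tab-separated row per chromosome: chromosome name, chromosome length, and number of genes. A minimal sketch, assuming you already have per-gene records of (chromosome, gene id, start, end). It approximates chromosome length by the largest coordinate seen on that chromosome, which is an assumption; use the true assembly length from the genome FASTA if you have it. `make_lens` and `write_lens` are hypothetical names, not the repository's 02.gff_lens.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chromosome, gene_id, start, end): a simplified, hypothetical input record
Gene = Tuple[str, str, int, int]


def _chrom_key(name: str) -> Tuple[int, int, str]:
    # Sort "1", "2", ..., "10" numerically; non-numeric names go after, alphabetically.
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def make_lens(genes: Iterable[Gene]) -> List[Tuple[str, int, int]]:
    """Return (chromosome, approx_length, gene_count) rows, one per chromosome."""
    max_end: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for chrom, _gene_id, start, end in genes:
        # Assumption: length is approximated by the furthest gene coordinate.
        max_end[chrom] = max(max_end[chrom], start, end)
        counts[chrom] += 1
    return [(c, max_end[c], counts[c]) for c in sorted(counts, key=_chrom_key)]


def write_lens(rows: List[Tuple[str, int, int]], path: str) -> None:
    """Write the rows as a tab-separated, three-column text file."""
    with open(path, "w") as fh:
        for chrom, length, n_genes in rows:
            fh.write(f"{chrom}\t{length}\t{n_genes}\n")
```

For example, `make_lens([("1", "g1", 100, 900), ("1", "g2", 1200, 2000), ("2", "g3", 50, 400)])` returns `[("1", 2000, 2), ("2", 400, 1)]`.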

sjfleck commented 1 year ago

I also tried to use deal_gff.py. It seems like it has the following usage: python $wdgi/deal_gff.py gff cds pep mark

I assume this is: python deal_gff.py genome.gff genome.cds.fasta genome.pep.fasta

but I'm not sure what the "mark" input is. I tried it without the mark option and got this error:

/path/to/deal_gff.py:24: FutureWarning: The default value of regex will change from True to False in a future version.
  gff[0] = gff[0].str.replace('Chr0?','')
/path/to/deal_gff.py:41: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
  for name, group in gff.groupby([0]):
Traceback (most recent call last):
  File "/path/to/deal_gff.py", line 46, in
    [str(sys.argv[4])+str(name)+'g'+str(i).zfill(5) for i in range(1, len(group)+1)])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/deal_gff.py", line 46, in
    [str(sys.argv[4])+str(name)+'g'+str(i).zfill(5) for i in range(1, len(group)+1)])
IndexError: list index out of range
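The `IndexError` comes from `sys.argv[4]` on line 46: `sys.argv[0]` is the script path, so with only `gff cds pep` on the command line the list has indices 0 to 3 and there is no index 4 for `mark`. The two `FutureWarning` lines are separate pandas deprecation notices, not the cause of the crash. A minimal sketch of an upfront argument check that names the missing argument instead; `missing_args`, `parse_args`, and `DealGffArgs` are hypothetical and not part of deal_gff.py.

```python
from typing import List, NamedTuple

ARG_NAMES = ("gff", "cds", "pep", "mark")
USAGE = "usage: python deal_gff.py gff cds pep mark"


class DealGffArgs(NamedTuple):
    gff: str
    cds: str
    pep: str
    mark: str


def missing_args(argv: List[str]) -> List[str]:
    """Names of the positional arguments that were not supplied."""
    given = len(argv) - 1  # argv[0] is the script path, not a user argument
    return list(ARG_NAMES[max(given, 0):])


def parse_args(argv: List[str]) -> DealGffArgs:
    missing = missing_args(argv)
    if missing:
        # Fail with a readable message instead of an IndexError on argv[4].
        raise SystemExit(f"{USAGE}\nmissing: {', '.join(missing)}")
    return DealGffArgs(*argv[1:5])
```

In a script, `args = parse_args(sys.argv)` near the top would stop with `missing: mark` for the three-argument call above. On the pandas side, passing `regex=True` explicitly to `str.replace` and grouping with `groupby(0)` rather than `groupby([0])` should avoid the two warnings; the second change also keeps `name` a scalar, so `str(name)` does not turn into a tuple string in newer pandas.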

Thanks for any help you can provide.

SunPengChuan commented 1 year ago

Hi, the mark is a short prefix that you choose yourself. I usually use the species abbreviation. Note that the mark must be unique for each species.
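Judging only from the line 24 and line 46 fragments quoted in the traceback above, deal_gff.py appears to strip a leading `Chr` (plus one optional `0`) from chromosome names and then name genes `mark + chromosome + 'g' + five-digit index`, numbered from 1 within each chromosome. A stdlib re-creation of that scheme, for illustration only: `new_gene_ids` is a hypothetical name, and the assumption that genes arrive already ordered by position is not confirmed by the script.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Same pattern as gff[0].str.replace('Chr0?', '') in the traceback.
CHR_PREFIX = re.compile(r"Chr0?")


def new_gene_ids(mark: str, chroms: Iterable[str]) -> List[str]:
    """Assign mark + chromosome + 'g' + 5-digit index, counting from 1 per chromosome.

    `chroms` holds the chromosome of each gene, assumed already in positional order.
    """
    counters: Dict[str, int] = defaultdict(int)
    ids: List[str] = []
    for raw in chroms:
        chrom = CHR_PREFIX.sub("", raw)  # "Chr01" -> "1", "Chr10" -> "10"
        counters[chrom] += 1
        ids.append(f"{mark}{chrom}g{str(counters[chrom]).zfill(5)}")
    return ids
```

For example, `new_gene_ids("vvi", ["Chr01", "Chr01", "Chr02"])` gives `["vvi1g00001", "vvi1g00002", "vvi2g00001"]`. It also shows why the mark has to differ between species: without it, an ID such as `1g00001` would appear in every genome that has a chromosome 1.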