SunPengChuan / wgdi

WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes
https://wgdi.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

Preparation of input file #24

Closed (ardy20 closed this issue 1 year ago)

ardy20 commented 1 year ago

Hello SunPeng,

I could not find the Python scripts 01, 02, and 03 for preparing the .gff and .lens files. Are they integrated into the WGDI tool? Could you please give some command examples showing how to use these .py files to prepare the input files? The videos are not very clear.

Regards

SunPengChuan commented 1 year ago

Dear ardy20,

Here's a new script at https://github.com/SunPengChuan/wgdi-example/blob/main/code/deal_gff.py. The command is 'python deal_gff.py gff3 cds pep mark', where gff3, cds, and pep are the files you annotated or downloaded, and mark is a marker that you can set as you like.

Yours sincerely,
Pengchuan Sun, Ph.D.
College of Life Sciences, Sichuan University, Chengdu, China
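For readers who want a concrete invocation, the following is a minimal sketch of how the command above could be assembled and run from Python. The argument order (gff3, cds, pep, mark) comes from the reply; the file names and the marker value "sp1" are hypothetical placeholders, and the sketch assumes deal_gff.py has been downloaded into the working directory.

```python
import subprocess
import sys


def build_deal_gff_command(gff3, cds, pep, mark, script="deal_gff.py"):
    """Return the argument list for 'python deal_gff.py gff3 cds pep mark'."""
    return [sys.executable, script, gff3, cds, pep, mark]


def run_deal_gff(gff3, cds, pep, mark, script="deal_gff.py"):
    """Run the script and raise an error if it exits with a non-zero status."""
    subprocess.run(build_deal_gff_command(gff3, cds, pep, mark, script), check=True)


if __name__ == "__main__":
    # Hypothetical input files; replace them with your own annotation files.
    command = build_deal_gff_command(
        "species.gff3", "species.cds.fa", "species.pep.fa", "sp1"
    )
    print(" ".join(command))
```

Calling run_deal_gff with the same four arguments would execute the script; the example only prints the command so that it can be checked first.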

ardy20 commented 1 year ago

Hello SunPeng

Can we change deal_gff.py so that it sorts the genome file and the gff file in descending order (by chromosome length in the FASTA file, from largest to smallest) and also shortens the sequence names (seq_id)? For example, I used the names "Chr1", "Chr2", "Chr3", and they were still too long to fit nicely into the dotplot figure. Even the word "Chr" is long.

When I manually edited and sorted the lens and gff files to make the sequence IDs shorter and to put them in descending order (largest chromosome first, smallest last), many other problems occurred.

I suggest changing deal_gff.py so that it either makes the fonts smaller or shortens the sequence IDs so that they fit into the dotplot (whichever is easier), and also sorts the sequences (in both pep.fa and .gff) in descending order of length (Chr1 to Chr n, largest to smallest).

I hope this is possible and helps improve the quality of your great work.
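Until such an option exists, the request can be approximated with a small post-processing step. The sketch below is not part of deal_gff.py or WGDI. It assumes the .lens file is tab-separated with the chromosome name in column 1 and its length in column 2, and that the processed .gff file is tab-separated with the chromosome name in column 1. It sorts the chromosomes from longest to shortest, renames them to "1", "2", "3", ..., and applies the same renaming to the gff rows so that the two files stay consistent. Gene IDs and FASTA files are left untouched. The function and file names are hypothetical.

```python
import csv


def sort_and_rename(lens_rows, gff_rows, prefix=""):
    """Sort chromosomes by length (descending) and give them short names.

    lens_rows: rows whose first two fields are (chromosome, length).
    gff_rows:  rows whose first field is the chromosome name.
    Returns (new_lens_rows, new_gff_rows, old_to_new_name_mapping).
    """
    ordered = sorted(lens_rows, key=lambda row: int(row[1]), reverse=True)
    mapping = {row[0]: f"{prefix}{i}" for i, row in enumerate(ordered, start=1)}
    new_lens = [[mapping[row[0]], *row[1:]] for row in ordered]
    # Drop gff rows on sequences that are absent from the lens file.
    new_gff = [[mapping[row[0]], *row[1:]] for row in gff_rows if row[0] in mapping]
    # Stable sort: chromosomes follow the new order, and the original gene
    # order inside each chromosome is preserved.
    rank = {name: i for i, name in enumerate(mapping.values())}
    new_gff.sort(key=lambda row: rank[row[0]])
    return new_lens, new_gff, mapping


def read_tsv(path):
    """Read a tab-separated file into a list of rows, skipping empty lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def write_tsv(path, rows):
    """Write rows to a tab-separated file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


if __name__ == "__main__":
    # Small in-memory demonstration with made-up values.
    lens = [["Chr1", "100", "2"], ["Chr2", "300", "1"], ["Chr3", "200", "1"]]
    gff = [
        ["Chr1", "g1", "1", "10", "+", "1", "a"],
        ["Chr2", "g2", "5", "20", "-", "1", "b"],
        ["Chr1", "g3", "30", "40", "+", "2", "c"],
    ]
    new_lens, new_gff, mapping = sort_and_rename(lens, gff)
    print(mapping)
    print(new_lens)
    print(new_gff)
```

For real files, the rows would be loaded with read_tsv and written back with write_tsv under new file names. Renaming both files from one shared mapping avoids the mismatches that tend to appear when the lens and gff files are edited by hand separately.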