banzhou59 / GenomeSyn


Issue with example data format #5

Open Wangray123 opened 7 hours ago

Wangray123 commented 7 hours ago

Hello,

I am trying to prepare data files for visualization with this software, but I don't understand what the score values in the 4th column of your example file (rice_MH63_repeat.bed) mean, or what the values in the 5th column of the gene annotation file (rice_MH63_nonTEgene.gff3) mean.

Could you please explain them to me? Also, what commands should I run to generate these two files? Could you please share the code you used to produce them?

I look forward to your reply. Thank you!

banzhou59 commented 7 hours ago

The score in the fourth column of rice_MH63_repeat.bed is the length of TE sequence annotated in the current bin. The fifth column of rice_MH63_nonTEgene.gff3 is the end position of each gene annotation, as in any GFF3 file. You can learn more about the BED and GFF3 file formats through the following links:

https://github.com/jianshu93/gfftobed/blob/main/README.md
https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

TE annotation files can be obtained using EDTA or RepeatMasker, and you can use bedtools for statistical analysis or convert formats using gfftobed (https://github.com/jianshu93/gfftobed). Gene annotations can be obtained through homology mapping or de novo annotation.

You can also learn more about the input file formats for GenomeSyn through the following link:
https://github.com/banzhou59/GenomeSyn/blob/main/GenomeSyn-1.2.7/README
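To make the "TE length per bin" idea concrete, here is a minimal Python sketch of how such a four-column table could be computed from TE intervals. It is not the script used to build rice_MH63_repeat.bed. The function names, the bin size, the decision to merge overlapping TEs so that no base is counted twice, and the 0-based half-open bin coordinates are all assumptions made for illustration; check the GenomeSyn README for the conventions the tool actually expects. The only format facts relied on are that GFF3 columns 4 and 5 are 1-based inclusive start and end, and that BED intervals are 0-based and half-open.

```python
from collections import defaultdict


def gff3_line_to_interval(line):
    """Parse one GFF3 feature line into (chrom, start, end), 0-based half-open.

    GFF3 column 4 is the 1-based start and column 5 is the 1-based inclusive
    end, so only the start needs shifting to get BED-style coordinates.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[3]) - 1, int(fields[4])


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals so no base is counted twice.

    Assumption: overlapping TE annotations should contribute each base once.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def te_length_per_bin(te_records, chrom_sizes, bin_size):
    """Return rows (chrom, bin_start, bin_end, te_bases) for fixed-size bins.

    te_records: iterable of (chrom, start, end), 0-based half-open.
    chrom_sizes: dict mapping chromosome name to its length in bases.
    bin_size: hypothetical window size; pick whatever suits your plot.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in te_records:
        by_chrom[chrom].append((start, end))

    rows = []
    for chrom, size in chrom_sizes.items():
        merged = merge_intervals(by_chrom.get(chrom, []))
        for bin_start in range(0, size, bin_size):
            bin_end = min(bin_start + bin_size, size)
            # Sum the overlap between this bin and every merged TE interval.
            covered = sum(
                max(0, min(end, bin_end) - max(start, bin_start))
                for start, end in merged
            )
            rows.append((chrom, bin_start, bin_end, covered))
    return rows


if __name__ == "__main__":
    # Made-up toy data, not taken from the rice MH63 annotation.
    tes = [("Chr1", 10, 60), ("Chr1", 50, 120), ("Chr1", 250, 260)]
    for row in te_length_per_bin(tes, {"Chr1": 300}, bin_size=100):
        print("\t".join(str(value) for value in row))
```

For real genomes, the same result can probably be obtained faster with bedtools on the EDTA or RepeatMasker output; the sketch above is only meant to show what the fourth-column value represents.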