caulai / Mo17_genome_assembly


How do I understand the final output? #2

Open jmsong2 opened 5 years ago

jmsong2 commented 5 years ago

Hi, once I have the clustalw.out.checked.filtered.structure file, how can I compute statistics from it? For example, the number of genes with large structural variations. Thanks a lot!

Regards, Jiaming

caulai commented 5 years ago

Hi, I have uploaded a script, '5.Gene_SV_classify.pl', to compute these statistics. Thank you for running these scripts. If you run into problems when running them, you can also email me. (yencechow@qq.com)

jmsong2 commented 5 years ago

Thank you very much. I will try it. Also, these scripts report many "uninitialized value" warnings. Will they affect the result?

caulai commented 5 years ago

No, the warnings won't affect the result. But I suggest you use the updated scripts, which run faster.

jmsong2 commented 5 years ago

OK, I will re-run with the updated scripts.

jmsong2 commented 5 years ago

BTW, is the 'gene.fa' (1.bwamem-main.pl) expected to be a single-sequence FASTA split from 'gene-full-cds-double.new.positive.fa', or the whole 'gene-full-cds-double.new.positive.fa' file?

caulai commented 5 years ago

You can use the whole 'gene-full-cds-double.new.positive.fa' file as 'gene.fa'. But if your reference genome is too large, you can divide the 'gene-full-cds-double.new.positive.fa' file into ten or more smaller files and run them separately, so that the work finishes in several hours.
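
The splitting step described above could be sketched roughly as follows. This is only an illustration, not part of the repository's scripts: the function names `read_fasta_records` and `split_fasta` and the output naming scheme (`<prefix>.partN.fa`) are assumptions made for the example. Records are distributed round-robin, so each part gets a similar number of genes.

```python
from pathlib import Path


def read_fasta_records(path):
    """Yield each FASTA record (header line plus its sequence lines) as one string."""
    record = []
    with open(path) as handle:
        for line in handle:
            # A new header closes the record collected so far.
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            if line.strip():
                # Make sure every kept line ends with a newline.
                record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_fasta(in_path, n_parts, out_prefix):
    """Distribute records round-robin over n_parts files; return the paths written."""
    # Hypothetical naming scheme: <out_prefix>.part1.fa, <out_prefix>.part2.fa, ...
    out_paths = [Path(f"{out_prefix}.part{i + 1}.fa") for i in range(n_parts)]
    handles = [open(p, "w") for p in out_paths]
    try:
        for index, record in enumerate(read_fasta_records(in_path)):
            handles[index % n_parts].write(record)
    finally:
        for handle in handles:
            handle.close()
    return out_paths
```

For example, `split_fasta("gene-full-cds-double.new.positive.fa", 10, "gene")` would write `gene.part1.fa` through `gene.part10.fa`, and each part could then be used as the 'gene.fa' input of a separate run.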

jmsong2 commented 5 years ago

Thanks for your advice!