Working on haploid species

OmonkeyGOD commented 4 years ago

Hi, thanks for developing this handy program. I am working on a haploid fungus and wondering whether I can use this program as well. I have 12 samples and generated these two files in the end. It seems that all samples were assumed to be diploid by default, which can also be seen in the VCF file. Any suggestions?

I also analyzed the mitochondrial genome at the same time. Will it influence the results, since the mitochondrial genome usually has very high coverage? Is it recommended to run the nuclear genome and the mitochondrial genome separately?

Last, I got lots of warnings like this:

miniconda3/envs/mitozEnv/lib/python3.7/site-packages/sklearn/cluster/kmeans.py:968: ConvergenceWarning: Number of distinct clusters (8) found smaller than n_clusters (10). Possibly due to duplicate points in X. return_n_iter=True)

Any suggestions on how to resolve them? Thank you very much!

genotypeCNVR.vcf.zip genotypeCNVR.tsv.zip
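The warning text itself names the likely cause: k-means was asked for 10 clusters but found only 8 distinct ones, "possibly due to duplicate points in X". With only 12 samples, several samples sharing identical values in a window is enough to trigger it. The sketch below is a hypothetical workaround, not CNVcaller code: the helper choose_n_clusters and its requested parameter are illustrative assumptions, it assumes one value per sample, and where CNVcaller actually sets n_clusters is not shown in this thread.

```python
import numpy as np

def choose_n_clusters(values, requested=10):
    """Hypothetical helper (not part of CNVcaller): never ask k-means for
    more clusters than there are distinct points, which is the situation
    the ConvergenceWarning describes."""
    n_distinct = np.unique(np.asarray(values, dtype=float)).size
    return int(min(requested, n_distinct))

# Illustrative values for 12 samples in one window; many are identical
depths = [1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.5, 1.0, 2.0, 1.0, 1.0, 1.0]
print(choose_n_clusters(depths))  # → 3
```

Capping the cluster count only avoids asking for more clusters than the data can support; it does not by itself say how many classes are biologically meaningful.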

biozzq commented 4 years ago

Dear @OmonkeyGOD

In general, CNVcaller can work on haploid genomes. Yes, by default CNVcaller is currently designed for diploids. Before genotyping, the genome-wide read depth is normalized to 1, and the line cns *= 2 in Genotype.py then scales it to a diploid copy number of 2. For a haploid, you can therefore comment out cns *= 2 in Genotype.py.
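A minimal sketch of what that scaling does, assuming one sample's per-window depths are divided by their genome-wide median so a typical window becomes 1 and are then multiplied by the ploidy. The function normalized_copy_number and its ploidy parameter are hypothetical and only mirror the described behaviour; this is not the actual code in Genotype.py.

```python
import numpy as np

def normalized_copy_number(window_depths, ploidy=2):
    """Hypothetical sketch (not CNVcaller's code): scale one sample's
    per-window read depths so the genome-wide median is 1, then multiply
    by ploidy. ploidy=2 mirrors the default `cns *= 2`; ploidy=1 mirrors
    commenting that line out."""
    depths = np.asarray(window_depths, dtype=float)
    cns = depths / np.median(depths)  # typical window -> 1.0
    cns *= ploidy                     # diploid default: typical window -> 2.0
    return cns

# A haploid sample: most windows at ~30x, one duplicated window at 60x
depths = [30, 31, 29, 30, 60]
print(normalized_copy_number(depths, ploidy=2))  # duplication reads as copy number 4
print(normalized_copy_number(depths, ploidy=1))  # duplication reads as copy number 2
```

With the default diploid scaling, a haploid duplication (1 to 2 copies) is reported on the diploid scale (2 to 4), which is why the copy numbers look diploid in the output.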

During normalization, CNVcaller uses the median, which is more robust than the mean (less sensitive to outliers in the data, such as the very high-coverage mitochondrial windows). Thus, you can run the nuclear genome and the mitochondrial genome together.
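To illustrate why the median helps here, the sketch below compares mean-based and median-based scaling when a couple of windows have extreme coverage. The depth values are invented for the example; the point only holds while the high-coverage windows are a small fraction of all windows, since the median can tolerate outliers in up to half of the data.

```python
import numpy as np

# Made-up numbers: 20 nuclear windows at 30x plus 2 mitochondrial
# windows at very high coverage.
nuclear = [30.0] * 20
mito = [3000.0, 3200.0]
depths = np.array(nuclear + mito)

mean_scale = depths.mean()
median_scale = np.median(depths)

# Normalized depth of a typical nuclear window under each choice of scale
print(30.0 / mean_scale)    # ~0.097: the two mitochondrial windows inflate the mean
print(30.0 / median_scale)  # 1.0: the median is unaffected by the two outliers
```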

Lastly, if possible, could you share your merged CNVR results with me?

Best, Zhuqing

OmonkeyGOD commented 4 years ago

Hi Zhuqing,

Thanks for your quick reply. Please see the attached result. In the VCF file, I am not sure why the GT field differs between windows. For example, I get entries such as 0/1:3.44. What do 0, 1, and 2 mean in the genotype? If they represent REF and ALT, why do I get something like 0/1 in a haploid? How can a haploid sample carry both the REF (1 copy) and an ALT (multiple copies) in the same window?

Thanks and best regards, OmonkeyGOD

genotypeCNVR_merge.vcf.zip

biozzq commented 4 years ago

Dear @OmonkeyGOD

Thank you. I need the merged CNVR file generated by CNV.Discovery.sh; I will rerun the genotyping using that file.

In a haploid, the normalized copy numbers for a typical CNV should not cluster into more than two classes. The genotypes reported by CNVcaller only represent the clustering results, not REF/ALT allele counts in the usual VCF sense.
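A sketch of what "genotypes represent clustering results" means, using plain 1-D k-means as a stand-in (the warning quoted earlier in this thread comes from scikit-learn's k-means, but CNVcaller's exact clustering settings and how it maps cluster indices to GT strings such as 0/0 or 0/1 are not described here). The function cluster_labels_1d and the copy-number values are hypothetical; labels are ordered by cluster center, so they are class indices, not alleles.

```python
import numpy as np

def cluster_labels_1d(values, k, n_iter=100):
    """Hypothetical sketch: plain 1-D k-means. Returns labels ordered so
    class 0 has the lowest center, plus the sorted centers. The labels
    are cluster indices only; they carry no REF/ALT allele meaning."""
    x = np.asarray(values, dtype=float)
    centers = np.quantile(x, np.linspace(0, 1, k))  # spread initial centers
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Keep the old center if a cluster ends up empty
        new_centers = np.array([x[labels == j].mean() if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    rank = np.empty(k, dtype=int)
    rank[np.argsort(centers)] = np.arange(k)  # relabel by increasing center
    return rank[labels], np.sort(centers)

# Made-up haploid copy numbers (ploidy 1) for one CNVR across 12 samples
cn = [1.0, 0.98, 1.02, 1.01, 2.0, 1.97, 2.03, 0.99, 1.0, 2.01, 1.0, 0.97]
labels2, _ = cluster_labels_1d(cn, k=2)
print(labels2.tolist())  # → [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
labels3, _ = cluster_labels_1d(cn, k=3)  # a third class splits the 1-copy group
```

Forcing a third class on these data does not find a new copy-number state; it only splits the single-copy samples into two groups, which is why more than two classes is a warning sign for a haploid CNV.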

Best, Zhuqing

OmonkeyGOD commented 4 years ago

Dear Zhuqing,

Thanks for your reply. Please see the file attached.

Best,

mergeCNVR.zip

dinahparker commented 2 years ago

Hi, I was wondering whether this issue has been solved, because I also have a haploid fungus, and my VCF file looks similar to @OmonkeyGOD's, with two genotype classes (0/1, 0/0, etc.). I am also not quite sure what these values mean in the context of CNVs. Any help would be appreciated, and thank you in advance.

Best, -Dinah

biozzq commented 2 years ago

Dear all,

Sorry for the late reply. I found that the demo data provided by @OmonkeyGOD cannot be properly genotyped in a haploid context. It is possible that read-depth-based methods are not suitable for haploid CNV discovery when the sequencing depth is low (e.g. <50X). I therefore suggest trying methods based on other signals, such as split reads and read pairs. I also use GenomeSTRiP for my own diploid CNV calling, and I think it can meet your needs.

Best, Zheng Zhuqing

dinahparker commented 2 years ago

Thank you for the clarification and the quick reply; I'll give another CNV calling program a try, then.

Best regards, -Dinah

JiangYuLab / CNVcaller

Working on haploid species #7