freeseek / mocha

MOsaic CHromosomal Alterations (MoChA) caller
MIT License

Could not parse gender (0/1/2) in the sample statistics file #32

Closed uqzqiao closed 1 year ago

uqzqiao commented 2 years ago

Dear developer,

I ran into this issue when running the mCA detection pipeline. If gender is coded as 0/1/2 in the sample statistics file, I get the error "Could not parse gender 1 xxxx(call rate)". After converting the coding to M/F/U (with gender still coded as 0/1/2 in the $sex file), the analysis completes, but "baf_auto" in the stats file is NaN for all samples. My questions are: (1) What are the potential causes of the NaN values? (I suspect they might be due to the gender coding.) (2) Do you have any suggestions for overcoming this problem? Thank you in advance for your help!

freeseek commented 2 years ago

(1) A very obvious cause of NaN values for the baf_auto variable is that your VCF was not phased. (2) I am not sure why you are getting a "Could not parse gender" error; that should not happen. Can you check what the following command generates:

cat $sex | grep 0.9898973 | xxd

How was the $sex file input into MoChA?
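
On point (1), one rough way to check whether a VCF carries phased genotypes is to count `|` versus `/` separators in the GT field. The sketch below is not part of MoChA; `count_phasing` is a hypothetical helper that reads uncompressed VCF text on standard input and assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present.

```shell
# Hypothetical helper: count phased ("|") and unphased ("/") genotypes in VCF text.
count_phasing() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta and header lines
    {
      for (i = 10; i <= NF; i++) {     # sample columns start at column 10
        split($i, parts, ":")          # GT is assumed to be the first subfield
        gt = parts[1]
        if (gt ~ /\|/) phased++
        else if (gt ~ /\//) unphased++
      }
    }
    END { printf "phased=%d unphased=%d\n", phased, unphased }'
}
```

It could be fed from something like `bcftools view "$vcf" | count_phasing`, where `$vcf` is a placeholder for the phased input file. A large unphased count would support the explanation above; haploid or missing calls without a separator are not counted either way.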

uqzqiao commented 2 years ago

Thank you so much for your swift reply! (1) The input VCF file was phased. Btw, interestingly, I ran the phasing step using both Eagle2 (--Kpbwt=100,000) and Shapeit (as described in MoChA's README), and the output $pfx.calls.tsv files were identical. Is this expected? (2) The command generated:

00000000: 3230 3330 3430 3837 3030 3434 5f52 3033  203040870044_R03
00000010: 4330 3109 4d09 302e 3938 3938 3937 330a  C01.M.0.9898973.
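
Reading the dump: the two `09` bytes are tabs, `4d` is the letter "M", and the final `0a` is a line feed with no `0d` (carriage return) before it. So this line has three tab-separated fields and Unix line endings. The same check can be sketched without reading hex by hand; the line below is rebuilt from the dump, and the variable names are only illustrative.

```shell
# Rebuild the line shown in the dump (sample ID, gender, call rate).
line="$(printf '203040870044_R03C01\tM\t0.9898973')"

# Count tab-separated fields; 3 matches (sample_id, computed_gender, call_rate).
n_fields="$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')"

# Count carriage returns; a non-zero count would point to Windows line endings.
n_cr="$(printf '%s\n' "$line" | tr -cd '\r' | wc -c | tr -d ' ')"

echo "fields=$n_fields carriage_returns=$n_cr"
```

On a real file, the two pipelines can be applied to the output of the `grep` command instead of the rebuilt `line`.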

Btw, not sure whether this matters, but I changed the gender coding from 0/1/2 to M/F/U in the tsv file (# file with sample statistics (sample_id, computed_gender, call_rate)), and then the analysis ran successfully. The gender coding in the sex file (sex="..." # tab delimited file with computed gender information (first column sample ID, second column gender: 1=male; 2=female)) was still 0/1/2. Sorry for not explaining this clearly above!
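
For reference, the recoding described above can be sketched with awk. This is only an illustration, not a MoChA utility: `recode_gender` is a hypothetical helper, and the mapping 1=M, 2=F, 0=U is an assumption that extends the 1=male; 2=female convention quoted above by treating 0 as unknown.

```shell
# Hypothetical helper: recode the second column of a tab-delimited file
# from 0/1/2 to U/M/F (assumed mapping: 1=M, 2=F, 0=U).
recode_gender() {
  awk -F'\t' '
    BEGIN { OFS = "\t"; map["1"] = "M"; map["2"] = "F"; map["0"] = "U" }
    ($2 in map) { $2 = map[$2] }   # recode only recognized codes
    { print }                      # header and other lines pass through unchanged
  '
}
```

Usage would look like `recode_gender < stats_012.tsv > stats_MFU.tsv`, where both file names are placeholders. Any line whose second column is not 0, 1, or 2 (such as a header) is left as-is.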

Thanks again for your help!!