brentp / peddy

genotype :: ped correspondence check, ancestry check, sex check. directly, quickly on VCF
MIT License

Input PED file; replace sep with whitespace #51

Closed: jrouhana closed this issue 5 years ago

jrouhana commented 5 years ago

Hi Brent,

I was having trouble running peddy for my analysis and dug into the problem. It looks like line 176 of cli.py sets sep='\t' when reading the PED file into a DataFrame. The PED file format link in your documentation points to PLINK, and PLINK, as of 1.9, writes its PED files with spaces between fields, not tabs.

Locally, I've replaced the above with sep=r"\s+" without issue.

Thanks, John

brentp commented 5 years ago

I can't make that change because, for better or worse, some people have spaces in their sample names. I used to try to guess if the file was using spaces, but stopped doing that as it introduced more problems.
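To illustrate the concern, here is a minimal sketch using only the Python standard library rather than peddy's actual parsing code. The row and the sample name `sample A` are made up for illustration, and the regex split is only assumed to approximate what a `sep=r"\s+"` separator would do to such a line.

```python
import re

# Hypothetical PED row: six tab-separated fields, with a space inside the sample ID.
row = "fam1\tsample A\t0\t0\t1\t2"

# Splitting on tabs keeps the sample ID intact.
tab_fields = row.split("\t")

# Splitting on any run of whitespace breaks the sample ID into two fields.
ws_fields = re.split(r"\s+", row)

print(len(tab_fields), tab_fields)
print(len(ws_fields), ws_fields)
```

With whitespace splitting the row gains an extra column, so every field after the sample ID shifts one position to the right.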

I think from plink you'll want the .fam file, not what they call a .ped file, right?

jrouhana commented 5 years ago

I see. That's unfortunate. I definitely do want the .fam file from PLINK, though; the .ped file is unrelated to the task at hand.

I much appreciate the response!
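If the .fam file turns out to be space-delimited as well, one possible workaround is to rewrite it with tabs before passing it to peddy. The sketch below is not part of peddy: `fam_to_tabs` is a hypothetical helper, and it assumes the usual six-column layout (family ID, individual ID, father, mother, sex, phenotype) with no spaces inside any field. A row that does not split into exactly six fields raises an error instead of being silently mangled.

```python
def fam_to_tabs(lines, n_fields=6):
    """Convert whitespace-delimited pedigree rows to tab-delimited rows.

    Hypothetical helper: assumes each row has exactly `n_fields` fields
    and that no field contains whitespace.
    """
    out = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # split on any run of spaces or tabs
        if len(fields) != n_fields:
            raise ValueError(
                f"line {lineno}: expected {n_fields} fields, got {len(fields)}"
            )
        out.append("\t".join(fields))
    return out


# Made-up rows for illustration.
example = ["fam1 kid dad mom 1 2", "fam1  dad 0 0 1 1"]
converted = fam_to_tabs(example)
print("\n".join(converted))
```

A sample name containing a space would trip the field-count check here, which is the same ambiguity that makes splitting on whitespace unsafe as a default.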