AI-sandbox / XGMix


Scikit Allel Unable to Parse Format Header Generated by Simulate #4

Closed kevinjyee closed 3 years ago

kevinjyee commented 3 years ago

Hi! This is just an FYI for anyone running into a similar issue with this repo.

When training a model from scratch, the simulate binary hosted in rfmix writes a VCF header in which Description is misspelt as Descripton: https://github.com/slowkoni/rfmix/blob/9505bfae51ea57314d98060e6d09f6759cda8e8d/simulate.cpp#L277

This causes simulation.py to fail: https://github.com/AI-sandbox/XGMix/blob/master/Admixture/simulation.py#L65

scikit-allel's allel.read_vcf skips the FORMAT header line because it is looking for Description, so vcf_data["calldata/GT"] fails with a KeyError.

It doesn't look like rfmix has accepted any recent PRs, so I just wanted to post this here as information in case anyone goes looking for a similar issue. (Feel free to close without action.)
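
One possible workaround, assuming the misspelt key in the `##` header lines is the only problem, is to patch the simulated VCF before reading it. The sketch below is illustrative only: the function name `fix_vcf_header` and the file paths are hypothetical and not part of XGMix or rfmix, and it handles plain-text (uncompressed) VCF files only.

```python
import re

# Matches the misspelt key only where it is used as a header key,
# e.g. Descripton="Genotype" inside a ##FORMAT=<...> line.
_MISSPELT_KEY = re.compile(r"\bDescripton=")


def fix_vcf_header(src_path, dst_path):
    """Copy a plain-text VCF, correcting 'Descripton=' in '##' meta lines.

    Hypothetical helper: only header (meta) lines are modified; the column
    header line and all data lines are written out unchanged.
    Returns the number of corrections made.
    """
    n_fixed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith("##"):
                line, n = _MISSPELT_KEY.subn("Description=", line)
                n_fixed += n
            dst.write(line)
    return n_fixed
```

After patching, calling `allel.read_vcf` on the corrected file should pick up the FORMAT header, so `calldata/GT` should be present in the returned dictionary. Fixing the spelling in simulate.cpp and rebuilding the binary would address the cause rather than the symptom.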

dmasmont commented 3 years ago

Thanks for pointing it out, Kevin! We are planning to update the code soon and remove the dependency on RFMix's simulate binary.