AdmiralenOla / Scoary

Pan-genome wide association studies
GNU General Public License v3.0

recording missing data in trait.csv file? #47

Closed jrherr closed 7 years ago

jrherr commented 7 years ago

Hi @AdmiralenOla, thanks so much for your awesome tool!

I am getting an error, and I think it's because I have missing data in my matrix, which I have recorded as "NA". I don't want to code it as "0" because I don't know that the trait is absent; I simply didn't collect that trait for that strain.

Is there a better way for me to code missing data in the traits.csv file?

Thanks so much! ~ Josh

AdmiralenOla commented 7 years ago

Hi Josh, and thank you for the uplifting words about Scoary!

I think the best way to go about this is to use the --restrict_to flag when analyzing that particular trait. You could then point it to a csv file with the names of the complete-case strains, i.e. the strains that have a recorded value for that trait. You'd have to split up your traits file to do this. A bit cumbersome, but it works.
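For anyone who wants to script this workaround, here is a minimal sketch of the splitting step. It is not part of Scoary; the function name `split_traits`, the `MISSING` tokens, and the example strain and trait names are all hypothetical. It assumes the traits file has strain names in the first column and one trait per remaining column, and that the restrict file can be a single comma-separated line of strain names (check this against the Scoary documentation for your version).

```python
import csv
import io

# Assumption: these tokens mark a trait value that was never recorded.
MISSING = {"NA", ""}

def split_traits(traits_csv_text):
    """Return {trait: (single_trait_csv, restrict_line)} for each trait column.

    single_trait_csv holds only the complete-case strains for that trait;
    restrict_line is a comma-separated list of those strain names.
    """
    rows = list(csv.reader(io.StringIO(traits_csv_text)))
    header, body = rows[0], rows[1:]
    result = {}
    for col, trait in enumerate(header[1:], start=1):
        # Keep only strains with a recorded value for this trait.
        complete = [(r[0], r[col]) for r in body if r[col].strip() not in MISSING]
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["", trait])   # same header layout as the input file
        writer.writerows(complete)
        restrict_line = ",".join(name for name, _ in complete)
        result[trait] = (out.getvalue(), restrict_line)
    return result

# Hypothetical example data: two traits, each missing for one strain.
example = """,resistant,motile
strainA,1,0
strainB,NA,1
strainC,0,NA
"""

files = split_traits(example)
for trait, (trait_csv, restrict_line) in files.items():
    print(trait, "->", restrict_line)
```

Each pair would then be written to disk and Scoary run once per trait, passing the single-trait file as the traits input and the strain list to --restrict_to.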

I plan to add support for missing data in the next version, which I will try to have out before the end of the year. Missing data would essentially be handled in the same way as in this workaround, though, i.e. by excluding incomplete cases. Best of luck!