Closed bensutherland closed 4 years ago
This is a result of missing data and the default behavior of read_tsv in the readr package. I stumbled across it independently of the above error, so I can't guarantee it is exactly the same behaviour, but it seems likely.
Note, from ?read_tsv:
"col_types: One of NULL, a cols() specification, or a string. See vignette("readr") for more details. If NULL, all column types will be imputed from the first 1000 rows on the input. This is convenient (and fast), but not robust. If the imputation fails, you'll need to supply the correct types yourself."
So, read_tsv looks at the first 1000 rows to guess each column's type. If it doesn't see any genotypes (or many?) in a column, it calls that column a logical because of the frequency of NAs. This is likely indicative of poor markers, collection-specific NAs, or panel-version-specific NAs. Either way, the logical column type causes the problem.
There are two potential ways to fix this: a) explicitly define the column types (e.g. all character on input), or b) look at more rows before guessing. I've chosen option b and addressed it by increasing guess_max to 100000 by default: https://github.com/bensutherland/simple_pop_stats/commit/998fd8ef238af50c2697d06cea7dd3b4c66496ff
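For reference, here is a minimal sketch of both options using readr. The file, the column names (indiv, marker_x) and the genotype values are made up for illustration and are not taken from the repository. Only read_tsv, guess_max, col_types, cols() and col_character() are real readr names.

```r
library(readr)

# Hypothetical input: a genotype column that is NA for the first 1500 rows,
# so a guess based only on the early rows sees no genotypes at all.
tmp <- tempfile(fileext = ".tsv")
geno <- c(rep(NA, 1500), "AA", "AB")
write_tsv(
  data.frame(indiv = paste0("ind", seq_along(geno)), marker_x = geno),
  tmp
)

# Option a: skip guessing and read every column as character.
opt_a <- read_tsv(tmp, col_types = cols(.default = col_character()))

# Option b: let readr guess, but allow it to look at many more rows.
opt_b <- read_tsv(tmp, guess_max = 100000)

# In both cases the genotype column should come back as character.
class(opt_a$marker_x)
class(opt_b$marker_x)
```

Option a is the more defensive choice if the genotype files can grow beyond guess_max rows, while option b keeps the automatic typing for any non-genotype columns.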
There is also an issue with 100% simulations, which seems to be due to record names such as allele 1 ("ots_epic4_158_1") and allele 2 ("ots_epic4_158_1_1"). This potentially needs to be corrected in the marker names, ensuring that they do not have _1 at the end. It is not clear how to resolve this in the meantime, other than by deleting the offending record.
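As a stopgap, the offending markers can at least be found before running simulations. The sketch below is base R only and just flags marker names ending in _1. The vector of marker names is hypothetical apart from ots_epic4_158_1, and this is not how simple_pop_stats handles it.

```r
# Hypothetical marker names; only ots_epic4_158_1 comes from this issue.
markers <- c("ots_marker_a", "ots_epic4_158_1", "ots_marker_b")

# A marker ending in "_1" would produce an allele 2 name ending in "_1_1".
ends_in_1 <- grepl("_1$", markers)
flagged <- markers[ends_in_1]

print(flagged)
```

This only reports the names. Renaming or dropping them is still a manual decision.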