Z values and PofZ

Hi Eric and developers. Thanks for the useful program you have made available.

I am using the program to evaluate the assignment of certain Atlantic salmon individuals to reference groups.

I used the following command to estimate the mixing proportions:

mix_est <- infer_mixture(reference = referencia, mixture = libres, gen_start_col = 5, method = "PB", reps = 50000, burn_in = 5000)

The program runs well on my computer and I have no memory problems with it. However, the results differ noticeably from the analyses I did with STRUCTURE and DAPC. My data set consists of 456 reference individuals and 80 individuals whose origin I want to evaluate. As I read in the tutorial, the z-scores serve to evaluate how well the individuals being assigned fit the reference individuals. This is where my question arises:

As you can see, the z-scores don't quite fit the normal curve. The lowest individual z-scores and their PofZ values were:

z-score = -22.76370825, PofZ = 1
z-score = -21.69095868, PofZ = 1
z-score = -21.65942465, PofZ = 1
z-score = -13.96500187, PofZ = 1

Should I then assume that these samples do not really come from the populations to which they were assigned? I also have other individuals with a PofZ of 1 but with less extreme z-scores.
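For context, this is how I understand the z-score from the tutorial: the fish's observed log-likelihood under its assigned collection, minus the log-likelihood expected for a fish that truly comes from that collection, divided by the expected standard deviation. The symbols below are my own notation, not the package's:

```latex
% Notation is assumed for illustration:
%   \ell_i    observed log-likelihood of fish i under its assigned collection c
%   \mu_c     expected log-likelihood of a fish truly drawn from collection c
%   \sigma_c  expected standard deviation of that log-likelihood
z_i = \frac{\ell_i - \mu_c}{\sigma_c}
```

If that reading is right, a z-score near -20 means the genotype is far less likely under the assigned collection than a genuine member's genotype would typically be.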

I would expect some difference between the z-score distribution and a normal distribution, since the reference populations are probably purer than the commercial lines from which the analyzed samples (probably) came, and this might explain the departure from normality. However, I do not find it credible that these individuals are assigned to certain collections with PofZ = 1.
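One possible explanation I can think of is that PofZ is a posterior normalized over the reference collections only, so it measures which collection fits best, not whether any collection fits well. A minimal base-R sketch of that idea, with made-up log-likelihoods and equal prior weights (both are assumptions, and this is not the package's actual code):

```r
# Toy log-likelihoods of ONE fish under three hypothetical reference
# collections. The values are invented for illustration only.
loglik <- c(coll_A = -310, coll_B = -345, coll_C = -352)

# Equal prior weights: an assumption made here for simplicity. The real
# analysis would weight by the estimated mixing proportions.
prior <- rep(1 / 3, 3)

# Posterior is proportional to prior * likelihood, normalized over the
# reference collections. Subtracting the maximum log-likelihood before
# exponentiating avoids numerical underflow.
w <- prior * exp(loglik - max(loglik))
PofZ <- w / sum(w)

print(round(PofZ, 6))
```

Here coll_A receives essentially all of the posterior mass even though all three log-likelihoods could be very low in absolute terms, which is the situation a strongly negative z-score would flag.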

Is it possible to define a threshold based on z-scores to decide which individuals can be confidently assigned to the reference? Perhaps z-scores > 5?

Any help or point of view would be much appreciated.
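To make the question concrete, here is a rough sketch of the kind of filter I have in mind. It uses a toy data frame standing in for the top assignment per fish; I am assuming the columns are named indiv, collection, PofZ and z_score as in mix_est$indiv_posteriors. The fish IDs, the collection labels, the fifth row, and the cutoff of -5 are all hypothetical placeholders:

```r
# Toy stand-in for the per-fish top assignments. The first four z-scores
# are the ones quoted above; the fifth row is invented so that the filter
# has something to keep.
top_assign <- data.frame(
  indiv      = c("fish_01", "fish_02", "fish_03", "fish_04", "fish_05"),
  collection = c("ref_A", "ref_A", "ref_B", "ref_B", "ref_A"),
  PofZ       = c(1, 1, 1, 1, 1),
  z_score    = c(-22.76370825, -21.69095868, -21.65942465, -13.96500187, -0.8),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: flag fish whose z-score falls below it.
# The value is only a placeholder, not a recommended threshold.
z_cutoff <- -5
top_assign$flag_low_z <- top_assign$z_score < z_cutoff

# Keep only the fish that pass the z-score filter.
confident <- top_assign[!top_assign$flag_low_z, ]

print(top_assign[, c("indiv", "PofZ", "z_score", "flag_low_z")])
```

With this placeholder cutoff the four fish listed above would be flagged despite their PofZ of 1, so my question is whether a filter of this kind is sensible and how the cutoff should be chosen.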

eriqande / rubias

Z values and PofZ #30