petrelharp closed this 5 years ago.
I've just committed a fix for this to the master branch. Note that VCF output now contains the genome pedigree IDs for each output genome, but not the individual pedigree IDs. This is because, in general, SLiM does not guarantee that the two genomes assembled into a diploid "sample" for VCF output actually come from the same individual, so the individual IDs for those "samples" are not well-defined. In many cases the two genomes do happen to come from the same individual, but that is not guaranteed, and it would be confusing if individual pedigree IDs were provided sometimes but missing at other times. It is better to provide just the genome pedigree IDs, which are always well-defined; the user can easily transform them into individual pedigree IDs if desired.
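As an illustration of that transformation, here is a minimal Python sketch. It assumes SLiM's convention that an individual with pedigree ID `p` has genomes with pedigree IDs `2*p` and `2*p + 1`, and it assumes you have already parsed the genome pedigree IDs out of the VCF as a flat list of integers, two per sample in column order (how they are encoded in the file is not shown here). The helper names are hypothetical. A sample whose two genomes map to different individuals gets `None`, which is exactly the ambiguous case described above.

```python
from typing import List, Optional, Sequence


def individual_id(genome_id: int) -> int:
    """Individual pedigree ID for a genome pedigree ID.

    Assumes the convention genome_id = 2 * individual_id + (0 or 1).
    """
    return genome_id // 2


def sample_individual_ids(genome_ids: Sequence[int]) -> List[Optional[int]]:
    """Map consecutive pairs of genome IDs (one pair per VCF sample) to individual IDs.

    Returns None for a sample whose two genomes come from different individuals,
    since the individual ID of such a sample is not well-defined.
    """
    if len(genome_ids) % 2 != 0:
        raise ValueError("expected an even number of genome IDs (two per diploid sample)")
    result: List[Optional[int]] = []
    for first, second in zip(genome_ids[0::2], genome_ids[1::2]):
        ind_first, ind_second = individual_id(first), individual_id(second)
        result.append(ind_first if ind_first == ind_second else None)
    return result


# Sample 1 has genomes 10 and 11 (both from individual 5);
# sample 2 has genomes 14 and 21 (from individuals 7 and 10).
print(sample_individual_ids([10, 11, 14, 21]))  # [5, None]
```

Returning `None` rather than picking one of the two IDs keeps the mismatch visible to downstream code instead of silently attributing a mixed sample to one individual.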
As discussed in #42, it ought to be possible to tell from the VCF which individuals have been output. This only makes sense when individuals actually have unique identifiers, which I think is only the case when they have pedigree IDs.
There, Ben said:
The last line in the header has the sample IDs, which you currently fill in as i0 i1 i2 .... These could be replaced with something like i... if these are defined. This would be a good idea, I think.
Hm, I'm not sure about that. Most properties are individual-based (e.g., phenotype) and in both SLiM and in the tree sequence you can get the genome IDs from the individual IDs.