This is something I ran into when trying to verify that I didn't break pedigree cohorts in the process of working on #236 and #261:
For the "minimum pedigree size" constraint, pedigree ascertainment checks the `family_size` column. Yet there are only four individuals present. Of the 69932 distinct `fam_id`s in `dist_common`, the overwhelming majority have as many individuals present as `family_size` indicates; only 889 have a mismatch (and of those 889, only one family, 49-190-126, has MORE individuals present; the rest have fewer).
(I was able to determine that with the following query: `WITH baseline AS (SELECT fam_id, COUNT(ind_id) AS indcount, family_size FROM dist_common GROUP BY fam_id) SELECT * FROM baseline WHERE indcount != family_size;`)
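For anyone who wants to reproduce the check without access to the real database, here is a minimal sketch of the same mismatch query run against an in-memory SQLite table. The three-column schema, the `F1`/`F2`/`F3` family IDs, and the extra `direction` column are made up for illustration; only the table and column names come from the query above. It also groups by `family_size` in addition to `fam_id` so the query stays valid on engines that reject non-aggregated columns outside `GROUP BY`, which means a family with inconsistent `family_size` values across rows would show up as more than one row.

```python
import sqlite3

# Hypothetical toy schema: only the three columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dist_common (fam_id TEXT, ind_id TEXT, family_size INTEGER)"
)

# Made-up rows, not real data.
rows = [
    ("F1", "F1-1", 2), ("F1", "F1-2", 2),  # match: 2 present, family_size 2
    ("F2", "F2-1", 6), ("F2", "F2-2", 6),
    ("F2", "F2-3", 6), ("F2", "F2-4", 6),  # fewer: 4 present, family_size 6
    ("F3", "F3-1", 1), ("F3", "F3-2", 1),  # more: 2 present, family_size 1
]
conn.executemany("INSERT INTO dist_common VALUES (?, ?, ?)", rows)

MISMATCH_QUERY = """
WITH baseline AS (
    SELECT fam_id, COUNT(ind_id) AS indcount, family_size
    FROM dist_common
    GROUP BY fam_id, family_size
)
SELECT fam_id, indcount, family_size,
       CASE WHEN indcount > family_size THEN 'more' ELSE 'fewer' END AS direction
FROM baseline
WHERE indcount != family_size
ORDER BY fam_id;
"""

# Each result row is (fam_id, indcount, family_size, direction).
mismatches = conn.execute(MISMATCH_QUERY).fetchall()
for row in mismatches:
    print(row)
```

The `direction` column makes it easy to separate the "more individuals than `family_size`" case from the much more common "fewer" case when scanning the results.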
@WValenti did some initial investigation and discovered that those `family_size` values are accurate in the DIGS database, but not in the DIVER database.
From here it evidently becomes a matter of rediscovering, in the DIVER DB generation scripts, why those individuals are left out when going from DIGS to DIVER, and whether (and how) `family_size` should be changed to match the actual total individual counts. That's in @WValenti's corner.