Redesign Ascertain Pedigrees to better fit actual researcher workflow

Viqsi commented 1 month ago

While release-candidate testing the Ascertain Pedigrees subsystem, Jake ran into an issue: he was trying to create a "pedigree cohort" with no "affecteds". I didn't know whether this made sense, so I reached out to Veronica. What I got back was a tremendously useful primer on the researcher workflow this module is meant to support:

The traditional way of ascertaining pedigrees for “real” studies is to follow a 3-step procedure. In the first step, investigators look for index cases. (An aside: almost everyone refers to these individuals as “probands,” though additional people are often called “probands” too, and technically this usage is incorrect and confusing, for reasons we can discuss; suffice it to say for the moment that I’m going to stick to “index cases,” though your users will probably be happier with “probands.”) Index cases can be found in multiple ways: e.g., enrolling patients through a specialty clinic; advertising for subjects in magazines or on patient-centered websites; through population registries, etc. Then they are assessed to make sure they meet the investigators’ inclusion criteria for index cases. In the second step, investigators ask for information about the relatives of qualifying index cases – and proceed to contact and assess the relatives. So: step 1 is find index cases; step 2 is find relatives of those index cases. At this point we have step 3: apply any additional inclusion criteria to the family as a whole. For instance, some of those qualifying index cases might not have any affected relatives and therefore they would be dropped from the dataset; or investigators might explicitly drop any families with < 4 affected members, or with fewer than 6 phenotyped (say, with DIGS) relatives, etc.

Also, as kind of an additional aside: Step 2 is often itself structured in stages. For example, in a study of a DT, investigators might follow a rule that says “interview all first degree relatives of any affected individual.” Thus step 2 might itself involve several steps, as bit by bit (nuclear family by nuclear family) the pedigree is extended until there are no more 1st degree relatives of affected individuals to include. We decided early on that setting up this kind of “intrafamilial sampling” is not something we would do, and that we would instead include all relatives for any family meeting inclusion criteria. But you can see that for an investigator who is familiar with this kind of strategy (e.g., Jake), including all relatives regardless of their phenotypes makes sense. This would allow Jake, if he wanted to, to download the full pedigrees and then implement his own intrafamilial sampling rules.

Anyway, the result of this 3-step procedure is a set of pedigrees, and you can view this of course as just another way to form a cohort. Another way to accomplish the same thing in DIVER would be to define a cohort – say, everyone with SZ – and then add all the relatives to the cohort. I think this is the way you are thinking about this module, but it is not the way users – including Jake – think about it. Furthermore, the user might want to restrict the cohort based on characteristics of the relatives only, or not (per Jake’s intended usage).

Long ago we talked about this, but at the time we really were in a hurry, so we threw something kind of temporary in place. Now I know you are in a hurry again, but I think this module needs to be refined before any public release – or users will be frustrated and they will walk away from what is perhaps the single best thing DIVER can do for the world of psychiatric genetics: provide a wealth of information on pedigrees!
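For reference, the "intrafamilial sampling" rule Veronica describes (interview all first-degree relatives of any affected individual, extending nuclear family by nuclear family until nobody new qualifies) is essentially a graph traversal. Below is a minimal sketch of that rule under stated assumptions: the `Person` record with parent links and the `first_degree` and `sequential_sample` helpers are hypothetical, not DIVER code, and per Veronica DIVER deliberately includes all relatives instead of doing this.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Person:
    # Hypothetical toy record; DIVER's real pedigree schema is not described here.
    pid: str
    father: Optional[str]
    mother: Optional[str]
    affected: bool


def first_degree(pid: str, people: Dict[str, Person]) -> Set[str]:
    """Parents, children and full siblings of `pid` who are in `people`."""
    me = people[pid]
    parents = {p for p in (me.father, me.mother) if p is not None}
    children = {q.pid for q in people.values() if pid in (q.father, q.mother)}
    siblings = set()
    if me.father is not None and me.mother is not None:
        siblings = {
            q.pid
            for q in people.values()
            if q.pid != pid and (q.father, q.mother) == (me.father, me.mother)
        }
    return (parents | children | siblings) & set(people)


def sequential_sample(index_pid: str, people: Dict[str, Person]) -> Set[str]:
    """Start at the index case and repeatedly add every first-degree relative
    of any included affected individual, until nothing new qualifies."""
    included = {index_pid}
    frontier = [index_pid]
    while frontier:
        pid = frontier.pop()
        if not people[pid].affected:
            continue  # unaffected people are included but do not extend the pedigree
        for rel in first_degree(pid, people):
            if rel not in included:
                included.add(rel)
                frontier.append(rel)
    return included


# Toy family: the index case's unaffected sibling is included, but the
# sibling's affected daughter is not, because no affected person links to her.
people = {p.pid: p for p in [
    Person("gf", None, None, False),
    Person("gm", None, None, True),
    Person("idx", "gf", "gm", True),
    Person("sib", "gf", "gm", False),
    Person("spouse", None, None, False),
    Person("kid", "idx", "spouse", False),
    Person("niece", "sib", None, True),
]}
print(sorted(sequential_sample("idx", people)))
# ['gf', 'gm', 'idx', 'kid', 'sib']
```

This is the kind of rule a user like Jake could apply himself after downloading full pedigrees, which is why including every relative up front does not lose anything for him.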

The refinements we're ACTUALLY doing for initial public release are being detailed in a different issue (since this prompted a pretty extensive discussion as to what we could and could not do in the timeframe, and we settled on something that would fit); I'll post here when I have that written up. At the time she posted the primer, tho, Veronica also included this:

I don’t think there is a lot that needs to change. Jake’s “find all pedigrees with at least 0 individuals with characteristic X” should be allowable (and he probably should not have to put in a dummy X to do this). Then the other change I would propose would be to allow the user to impose constraints on index cases within the ped ascertainment module itself, rather than having to do that separately and ahead of time in the cohort builder; and then to allow the user to separately impose constraints (or not, solving Jake’s problem in the process) based on characteristics of the relatives.
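Veronica's three steps, with her proposed separation of index-case constraints from relative constraints and a threshold that is allowed to be zero, could be sketched roughly as below. This is a minimal illustration, not how DIVER currently works: the `Person` record, the `is_index` flag marking each family's designated index case, and the `ascertain` function are all hypothetical, and "relatives" is simplified to everyone sharing a family id.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical toy record, not DIVER's or NRGR's schema. `is_index` marks
    # the family's designated index case; "relatives" are simply everyone
    # else with the same family id.
    pid: str
    family: str
    is_index: bool = False
    traits: FrozenSet[str] = frozenset()


Predicate = Callable[[Person], bool]


def ascertain(
    people: List[Person],
    index_criterion: Predicate,
    relative_criterion: Optional[Predicate] = None,
    min_relatives: int = 0,
) -> Dict[str, List[Person]]:
    families: Dict[str, List[Person]] = defaultdict(list)
    for p in people:
        families[p.family].append(p)

    kept: Dict[str, List[Person]] = {}
    for fam_id, members in families.items():
        # Step 1: the family's index case must meet the index-case criteria.
        if not any(p.is_index and index_criterion(p) for p in members):
            continue
        # Step 2: take every relative, regardless of phenotype.
        relatives = [p for p in members if not p.is_index]
        # Step 3: family-level criterion on relatives only. With no
        # relative_criterion every relative counts; min_relatives=0 keeps
        # every family whose index case qualified.
        count = sum(
            1 for p in relatives
            if relative_criterion is None or relative_criterion(p)
        )
        if count >= min_relatives:
            kept[fam_id] = members  # the full pedigree, not just the matches
    return kept


def has_sz(p: Person) -> bool:
    return "SZ" in p.traits


people = [
    Person("a", "F1", True, frozenset({"SZ"})),
    Person("b", "F1", False, frozenset({"SZ"})),
    Person("c", "F1"),
    Person("d", "F2", True, frozenset({"SZ"})),
    Person("e", "F2"),
    Person("f", "F3", True),
    Person("g", "F3", False, frozenset({"SZ"})),
]

print(sorted(ascertain(people, has_sz)))             # ['F1', 'F2']
print(sorted(ascertain(people, has_sz, has_sz, 1)))  # ['F1']
print(sorted(ascertain(people, lambda p: True)))     # ['F1', 'F2', 'F3']
```

The last call is Jake's "at least 0" case: no relative criterion and a zero threshold, so every family with a qualifying index case comes back whole, without a dummy X.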

The first element (all pedigrees with 0 affecteds) is doable. The other two are intriguing ideas but would require a lot more development time, so they are being shelved, at least for the initial public release. What I really want, though, is some time to go out into the wilderness with this in mind and come back with a better overall design that fits the workflow. Right now we can build something that technically lets you accomplish what we're after, but it doesn't smoothly enable it. I want smooth enabling, if we can pull it off.

So that's what this issue is about: a general redesign of the UX of this module. I want to take the time to pin this down and Get It Right. And I absolutely want to do it for the Second Public Release. Ideally I'd have it for the initial one, but I don't think we have the time to pull it off. :( But Veronica makes a good point about how this is a big part of DIVER's unique value proposition:

[...] I view this as perhaps the single most important functionality for DIVER. One of the hallmarks of the NRGR collection is that it contains a very large number of families, often with extensive clinical info on multiple family members, and often selected for strong genetic patterns (e.g., for the presence of multiple individuals with SZ, or, multiple individuals not only with MDD but with early-onset MDD, etc.) The collection is unique in this regard – almost nobody collects these kinds of pedigrees anymore, and even when they do I doubt that comprehensive interviews like the DIGS get used for them. Pedigree analysis went out of vogue some, hmmm, maybe 15 years ago, but it is coming back around as people realize how much genetic information can be gleaned by tracing inheritance in closely related people (duh!). Anyway, for my money, this module is not just a pretty add-on, but really should be viewed as part of the core value-added functionality that DIVER has to offer the research community.

So yes, release for its own sake, but once I have breathing room I want to give this thing the love it deserves now that I actually know what it's supposed to be doing.

Viqsi commented 1 month ago

The actual Initial Public Release changes we're making are tracked with issue #261.

Viqsi commented 1 month ago

One distinction noticed while working on #261: the preview currently shows only the number of affecteds per pedigree and sorts on that. While that makes perfect sense for "step 3", for "step 2" it might be useful to include the total number of people in each pedigree as well.

I'm not sure how to address that right away, though, so I'm inclined to kick that can down the road unless it comes up in feedback from others.

Viqsi commented 1 month ago

As a happy coincidence, for pedigree cohorts with no constraining variable, the "number of affecteds" is the same as the family size. So in the UX I may just rename the column in that case (instead of my earlier, vague plan of hiding it as meaningless), and any effort to add family size to all previews can be decided later as part of this issue.
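If the preview is computed per pedigree, adding a family-size column next to the affecteds count looks cheap. A minimal sketch follows; the `Member` record and the `preview_rows` helper are hypothetical, not DIVER's preview code. It also shows why the two numbers coincide when there is no constraining variable.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Optional, Tuple


class Member(NamedTuple):
    # Hypothetical toy record for illustration only.
    pid: str
    traits: FrozenSet[str]


Row = Tuple[str, int, int]  # (family id, affecteds, family size)


def preview_rows(
    pedigrees: Dict[str, List[Member]],
    is_affected: Optional[Callable[[Member], bool]] = None,
) -> List[Row]:
    """One row per pedigree, sorted by affecteds, then family size
    (both descending), then family id."""
    rows: List[Row] = []
    for fam_id, members in pedigrees.items():
        # With no constraining variable, every member counts, so the
        # affecteds column equals the family size.
        affected = sum(
            1 for m in members if is_affected is None or is_affected(m)
        )
        rows.append((fam_id, affected, len(members)))
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


pedigrees = {
    "F1": [Member("a", frozenset({"SZ"})), Member("b", frozenset({"SZ"})),
           Member("c", frozenset())],
    "F2": [Member("d", frozenset({"SZ"})), Member("e", frozenset()),
           Member("f", frozenset()), Member("g", frozenset())],
}

print(preview_rows(pedigrees, lambda m: "SZ" in m.traits))
# [('F1', 2, 3), ('F2', 1, 4)]
print(preview_rows(pedigrees))
# [('F2', 4, 4), ('F1', 3, 3)]
```

Sorting on affecteds first and family size second would keep the current "step 3" ordering while still surfacing large families for "step 2" browsing; that is just one possible ordering, not a settled design.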


Redesign Ascertain Pedigrees to better fit actual researcher workflow #260