Incorrect cohort summary plots

fedorov commented 4 years ago

See issue described in https://discourse.canceridc.dev/t/understanding-idc-portal-cohort-plots/81.

s-paquette commented 4 years ago

I want to note that this isn't really a bug--this is case centric counting. So if we want the counts to function in some other manner--Study centric, for example--then we need to do things differently. But this is the code behaving as intended, in terms of counting cases.

s-paquette commented 4 years ago

I think I wound up mentioning all of this in a different ticket, but here are the things we need to answer before we can change this:

What are the numbers we want counted--cases, or studies?
What are we aggregating those numbers based on--studies filtered using the filters, or cases?
- If we count by case, then selecting eg. a modality indicates you want the case counts for all cases with any instance that has a given modality. This means all the other case counts will be for instances which are from those cases. (This is how it currently works.)
- If we count by study, then selecting eg. a modality indicates you want the case counts for all studies which have instances that have the given modality. This means all the other case counts will be for instances which are from those studies.

Is the later how we want to count? (Or, do we want to count in some other way?)

fedorov commented 4 years ago

Ok, I thought that through the discussion on the forum you confirmed that this is a bug:

Sorry I misread–I think I understand what you’re saying. And to be clear, it’s cases. We don’t count studies.

…ah ha. This is due to how we wanted to make Original data ‘child record case search’ against derived. It’s also applying it to other Original data records. Since I’m working on counting related separately (same code) I’ll see if I can sort out a way to only go between orginal/derived and not original/original.

If this is not a bug, can you please explain how the current interface counts cases?

The issue is what I described in the forum, I did not see an explanation of how this could be a correct behavior.

G-White-ISB commented 4 years ago

Yeah I haven't followed the discussion well enough. Let me try an example. If we have 3 cases with CT modality only, 4 cases with PT only, and 2 cases with CT and PT modalities then filtering with 'MODALITY=CT' will produce a chart showing only CT=5 and no other modalities. Andrey you would expect the pie chart to show 5 cases with CT modality and 2 with PT modality? I guess a pie chart is confusing since the sum of the number of cases shown for each modality could be greater than the total number of cases. This is coming up because we are modeling after gdc and isb-cgc and in those portals I think all the attributes are exclusive - you would not have a case matching more than one.

fedorov commented 4 years ago

@G-White-ISB thanks for the response! I was struggling with this this morning ...

This is coming up because we are modeling after gdc and isb-cgc and in those portals I think all the attributes are exclusive - you would not have a case matching more than one.

Right, that's probably the key, since for our data attributes are most definitely NOT exclusive, and perhaps you nailed the source of the troubles we've been having.

If we have 3 cases with CT modality only, 4 cases with PT only, and 2 cases with CT and PT modalities then filtering with 'MODALITY=CT' will produce a chart showing only CT=5 and no other modalities. Andrey you would expect the pie chart to show 5 cases with CT modality and 2 with PT modality?

Yes, I think that would be the logical behavior which I would be able to explain. We define a cohort based on the presence of CT modality for a given case, and then charts summarize the attributes along the dimensions we define for the selected cohort. So yes, for your example I would expect a chart showing 5 CT and 2 PT as modalities, summing up to more than 5. Explanation of such behavior would be the following:

"Show counts of cases for each of the modalities present in the cohort".

If we assume that the current behavior for the Modality is correct and should show 5 cases, then what is the explanation of what that Modality piechart shows? Also, what are we expecting the other plots to show?

s-paquette commented 4 years ago

Yes, lack of attribute exclusivity is the main issue we're going to have with counts, in general. This mostly means we need to think carefully about how we present them. Per case or per study (or even per series) will have a large impact on the counts and thus the charts.

Per today's call, we can leave these as-is for now, but it's worth thinking about how we'll want the counts to come out in the long haul. Do we want results to represent cases? Studies? User-decided?

fedorov commented 4 years ago

Please comment in https://discourse.canceridc.dev/t/understanding-idc-portal-cohort-plots/81/7?u=fedorov on the accuracy of the definition as I understand it.

wlongabaugh commented 4 years ago

If a pie chart was to show 5 and 2, then we are showing studies, since pie charts are by definition showing features that partition a set (i.e. are mutually exclusive and cover the set). If we are graphically showing non-exclusive attributes, then the appropriate visualization is an Euler diagram (Venn diagrams with no empty intersections shown).

G-White-ISB commented 4 years ago

So I'm not sure what you mean by 5 and 2 above. We actually are not returning the right data to construct Euler diagrams - we don't return the sizes of intersections between attributes. George

On Fri, Sep 4, 2020 at 12:16 PM wlongabaugh notifications@github.com wrote:

If a pie chart was to show 5 and 2, then we are showing studies, since pie charts are by definition showing features that partition a set (i.e. are mutually exclusive and cover the set). If we are graphically showing non-exclusive attributes, then the appropriate visualization is an Euler diagram (Venn diagrams with no empty intersections shown).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ImagingDataCommons/IDC-WebApp/issues/240#issuecomment-687332816, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN54OR2735BIOYGDL7L7ZATSEE4I3ANCNFSM4QNTYAIQ .

G-White-ISB commented 4 years ago

'So I'm not sure what you mean by 5 and 2 above. ' - Sorry thats above of course.

wlongabaugh commented 4 years ago

Yes, we would need intersection counts to depict correctly. For this simple example, the selection of a filter picks a circle out of the Venn diagram, and all cases within the circle are chosen. If you pick another filter, another circle is selected, and all cases in that circle are added to the cohort. VennFilter

wlongabaugh commented 4 years ago

So a bar chart that broke out all the unique set membership combinations could do the same thing, and is highly scalable.

In "OR" mode, clicking on filter "CT" will cause all bars that contain CT somewhere in a label to light up. If the filters were in "AND" mode, only bars with all selected labels will light up.

wlongabaugh commented 4 years ago

BioFabric Plot as an Euler Diagram. Use bar height instead of width for cardinality of subset, use the Gray code ordering of the set intersections as shown here (only one element changes between adjacent subsets), and away you go. BioFabEuler

wlongabaugh commented 4 years ago

We might need a log scale on the bar chart to handle the large dynamic range between tiny intersections and large size of one-modality-only cases.

wlongabaugh commented 4 years ago

One question is what does it mean to click on a bar labeled as a set intersection? Toggle all the filters tied to that set intersection...or turn them off only if on? Open question.

wlongabaugh commented 4 years ago

So say we have four modalities, and there are cases with each combination of the four. Here is what I will dub an EulerBar depiction, with modality MR selected. Note that some modalities are not contiguous regions; in this plot, CT and MR are contiguous, the other two are not. Like a truth table, S has two separate bands, and PT has four separate bands.

Since it is an EulerBar and not a VennBar, the intersections that are null sets can be dropped from the chart. It should be a lot simpler in practice, but it can handle any complexity you throw at it.

wlongabaugh commented 4 years ago

If we wanted to be visually explicit about the existence or absence of overlapping attributes in the filter set, we could switch between pie charts and EulerBars depending on whether all set intersections were null or not.

wlongabaugh commented 4 years ago

Of course, if we are creating pie slices for all the non-null combination subsets and labeling them as such, we can go back to using a pie chart, since those partition the set.

G-White-ISB commented 3 years ago

Testing needed (dev.090920201037.46fd5a). None attribute filters are disabled in the 'derived' tab with a tooltip trying to explain why. I am open to suggests on the tooltip text. I realize now the tooltip should activate when user is hovering over the word None and not just the checkbox.

madelyngreyes commented 3 years ago

I agree with George the tool-tip should activate over the text as well.

The text looks good to me.

Does this work for you as well by chance? @fedorov

fedorov commented 3 years ago

Yes, the current behavior is fine with me. Thanks.

ImagingDataCommons / IDC-WebApp

Incorrect cohort summary plots #240