ImagingDataCommons / IDC-WebApp

Web Application front end for IDC (CORE REPO)
Apache License 2.0
6 stars 2 forks source link

Support 'AND' within an attribute filter (rather than just 'OR') #162

Closed fedorov closed 1 year ago

fedorov commented 3 years ago

As reported by @rkikinis:

image

G-White-ISB commented 3 years ago

The filter IS selecting for CT or MR, NOT CT AND MR. All the filter attributes work this way. This is consistent with ISB-CGC and the GDC portal.

fedorov commented 3 years ago

Yes, I understand. The point is that it WOULD be very useful to be able to configure the query to do AND.

rkikinis commented 3 years ago

Hi,

I strongly believe that we need that ability as a first class feature. Here are two examples: -I was looking for subjects that had both an MRI and a CT, approximately at the same time. -Similarly, PET CT produces a PET volume and a CT volume. In the modality listing there is CTPT, but that is only populated with a single data set. On the other hand, the modality PET lists 253. I would expect that the majority of those would be PET CT. Cohort selection with pet ct is a vital, critical path issue for an imaging archive.

Both of these use-case scenarios are middle of the road and not edge cases.

Best Ron P.S. I went to the qin_headneck collection and noticed, as expected that the study description contained the term PETCT. However, the only series listed is a ct series. See attached. I believe that on TCIA this collection has both a CT and a PET series.

On Jul 13, 2020, at 11:41 AM, G-White-ISB notifications@github.com wrote:

The filter IS selecting for CT or MR, NOT CT AND MR. All the filter attributes work this way. This is consistent with ISB-CGC and the GDC portal.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

— Ron Kikinis ron@kikinis.org

This email is intended for non-work related messages

rkikinis commented 3 years ago

Hi, I realized that some background information about PET CT might be useful. For an overview see this page on wikipedia: https://en.wikipedia.org/wiki/PET-CT The acquisition device is a CT scanner “grafted” to a PET scanner where the coordinate systems have been calibrated so that both volumes are in the same coordinate system. This type of imaging is deployed frequently in cancer imaging and often used for treatment monitoring with chemo therapy. The image below shows a display from one of the commercial viewers. The upper row is a time series of MIP’s (volume renderings with a maximum intensity transfer function) with pathology highlighted in color. In the lower row we see CT data in a b/w LUT overlaid with the PET using a red/white LUT. This is run of the mill image display for this type of data. Best Ron

On Jul 13, 2020, at 4:27 PM, Ron Kikinis ron@kikinis.org wrote:

Hi,

I strongly believe that we need that ability as a first class feature. Here are two examples: -I was looking for subjects that had both an MRI and a CT, approximately at the same time. -Similarly, PET CT produces a PET volume and a CT volume. In the modality listing there is CTPT, but that is only populated with a single data set. On the other hand, the modality PET lists 253. I would expect that the majority of those would be PET CT. Cohort selection with pet ct is a vital, critical path issue for an imaging archive.

Both of these use-case scenarios are middle of the road and not edge cases.

Best Ron P.S. I went to the qin_headneck collection and noticed, as expected that the study description contained the term PETCT. However, the only series listed is a ct series. See attached. I believe that on TCIA this collection has both a CT and a PET series.

On Jul 13, 2020, at 11:41 AM, G-White-ISB notifications@github.com wrote:

The filter IS selecting for CT or MR, NOT CT AND MR. All the filter attributes work this way. This is consistent with ISB-CGC and the GDC portal.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

— Ron Kikinis ron@kikinis.org

This email is intended for non-work related messages

<Screen Shot 2020-07-13 at 4.21.06 PM.png>

— Ron Kikinis ron@kikinis.org

This email is intended for non-work related messages

fedorov commented 3 years ago

I went to the qin_headneck collection and noticed, as expected that the study description contained the term PETCT. However, the only series listed is a ct series. See attached. I believe that on TCIA this collection has both a CT and a PET series.

@rkikinis we do not use StudyDescription for summarizing the data, since it is an uncontrolled free text.

But if I select qin_headneck only, the summary correctly shows PET and CT distribution (you can mouse over the pie chart sectors to confirm):

image

If you go over all series in the list, you will see both PT and CT are included.

G-White-ISB commented 3 years ago

Sorry I had misread the initial comment. I don't think this has come up before as other filter attributes are exclusive. Besides Modality is there any other attribute where and AND join would be strongly desired? We will need to discuss further.

rkikinis commented 3 years ago

That is not what users expect or want

Sent from my mobile phone. This email is for non-work related messages

On Jul 13, 2020, at 5:21 PM, Andrey Fedorov notifications@github.com wrote:

 I went to the qin_headneck collection and noticed, as expected that the study description contained the term PETCT. However, the only series listed is a ct series. See attached. I believe that on TCIA this collection has both a CT and a PET series.

@rkikinis we do not use StudyDescription for summarizing the data, since it is an uncontrolled free text.

But if I select qin_headneck only, the summary correctly shows PET and CT distribution (you can mouse over the pie chart sectors to confirm):

If you go over all series in the list, you will see both PT and CT are included.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

rkikinis commented 3 years ago

Hi Yes, let’s discuss. Pet Ct is a must unless there is another mechanism for achieving the same. MR ct is another example Best Ron

Sent from my mobile phone. This email is for non-work related messages

On Jul 13, 2020, at 5:57 PM, G-White-ISB notifications@github.com wrote:

 Sorry I had misread the initial comment. I don't think this has come up before as other filter attributes are exclusive. Besides Modality is there any other attribute where and AND join would be strongly desired? We will need to discuss further.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

fedorov commented 3 years ago

Let's discuss on Thursday.

Ron, I do not understand what "that" refers to in "That is not what users expect or want".

pieper commented 3 years ago

Here's another concrete example: for the neuronavigation research project I mentioned, the neurosurgeon requested a brain tumor case with a CT and MRI. I believe such cases exist in either the TCGA or CPTAC GBM collections.

Beyond that she would have liked some other query terms that may not exist currently in any table, for example she wanted that the CT was acquired pre-craniotomy. Probably being able to search by anatomical location within brain would also have been helpful as well as tumor type. These may be available in future CDA queries.

rkikinis commented 3 years ago

That behavior….

On Jul 14, 2020, at 7:57 AM, Andrey Fedorov notifications@github.com wrote:

Let's discuss on Thursday.

Ron, I do not understand what "that" refers to in "That is not what users expect or want".

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

— Ron Kikinis ron@kikinis.org

This email is intended for non-work related messages

fedorov commented 3 years ago

Modality is for MVP, beyond Modality - post-MVP

fedorov commented 2 years ago

As noted by @dclunie, another use case is to select patients that have both radiology and slide microscopy data using Modality filter. This will also be applicable to BodyPartExamined, and perhaps other ones too.

s-paquette commented 2 years ago

@G-White-ISB The solr filter builder can now accept 2 types of filter set:

current version: { <attribute_name>: [val1, ...], ...}
new version: { <attribute_name>: { 'values': [<val1>, ...], 'op': <'AND'|'OR'>, ...}

In the later format numeric operators and attributes still work the same (eg. name it SliceThickness_btw and the values are [0,4]). The 'op' value on those can be null, it will be ignored. You can also mix this format, if you want:

{
  'Modality': { 'values': ['CT', 'PET'], 'op': 'AND',
  'SliceThickness_btw': [0,5]
}

vs.

{
  'Modality': { 'values': ['CT', 'PET'], 'op': 'AND',
  'SliceThickness_btw': 'values': [0,5], 'op': null
}

The next step is to add the and vs. or checkbox or radio button to the Modality attribute section in the exploration page, then construct and pass the filter set in this new way any time this is used.

G-White-ISB commented 2 years ago

Can cohorts be saved/retrieved specifying an 'op' for a specific filter?

Am I correct that translating a filter with an 'and' operator to a BigQuery string is difficult or impossible?

s-paquette commented 2 years ago

@G-White-ISB

Can cohorts be saved/retrieved specifying an 'op' for a specific filter?

Yes. The operator defaults to BETWEEN for continuous numeric attributes, OR for anything else. To set the operator at cohort creation time, you provide the filter(s) as:

{ : { 'values': [a, b, ...], 'op': 'AND', : {'values': [5], 'op': 'GTE'} }

This results in a subquery to get the Studies which contain both values. plus the numeric:

WHERE StudyInstanceUID IN  (
    SELECT StudyInstanceUID 
    FROM table 
    WHERE x = 'a' 
    INTERSECT 
    SELECT StudyInstanceUID 
    FROM table 
    WHERE  x = 'b'
) AND y >= 5

This is much simpler in Solr, where the fields are multivalue and thus the engine understands the concept of a field which has all of a set of values: +x:("a" && "b") +y:{* TO 5}

If you set 'op' to None or null, or don't provide it, cohort savings falls back to the defaults. Defaults are also used in the old version of filters:

{ : [a, b, ...], : [5, 10] }

This would result in WHERE x IN (a, b) AND y BETWEEN 5 AND 10

Am I correct that translating a filter with an 'and' operator to a BigQuery string is difficult or impossible?

Not impossible, just complicated. You have to do an intersect, since each field only has one value, and SQL code at its core doesn't inherently intersect these for you.

There's one complex corner case I'm working on: continuous numerics where the user has also asked for 'None'. That's a little tricky since now, we have TWO operators in play: the OR between y IS NULL OR y >= 5, and the >= part. I think we'll have to finally begin using Filter Grouping for this, so each atomic member of the filter has its own operator. The way to think of this is:

Cohort: contains 1 or more Filter Groups, operators to combine them Filter Group: contains Filters, operator to combine them Filter: contains a SINGLE operator for combining a set of values

If you need multiple operators on an attribute (like my numeric example there which needs both OR as well as >= ) then you need to break those filters up between Filter Groups.

This isn't something you need to worry about specifically as the backend should be managing it for you entirely. If you send the filters in the formats shown above, they should come back as expected. I'm still working out the finer details, though.

G-White-ISB commented 2 years ago

So, in the sprint-gw-35 branch I have front end code that adds the AND/OR option to modality AND sends requests using the new filter format. However it did not seem to be working. I seemed to be getting the same numbers back wether I group by AND or OR.

Also it seemed that the cohort model needed some updating to return the operator in the cohort_details view. There is a commit in the sprint-gw-35 branch of IDC-Common for that.

I did not create pull PRs as I was not sure where you were on things and I did not resolve if there was a back end issue. But I checked that the filter operator was being sent with the request as per the syntax above. At this point you can take this over or wait until I'm back.

s-paquette commented 2 years ago

@G-White-ISB For the moment AND/OR doesn't work in deployment because I'm not done working on the back-end code. If you could put in a PR, I'll be able to sort out when to merge it in, and in the mean time can pull from your branch to finish off my part of it. Your Common code might collide with mine because that's one of the as-yet-merged bits of code I need to finish up.

s-paquette commented 2 years ago

Done up to this point: option to select AND/OR for Modality; saving AND/OR to a Cohort; Cohort/filter URL; Cohort display when AND is chosen; error checking for user trying to cheat the URL to use AND elsewhere (unsupported on the front end for the moment)

Working on: BQ string display, BQ query execution; this requires any 'AND' attributes to be intersection sets to get the right set of Series/Study.

s-paquette commented 2 years ago

UI adjustments made, no effect on the back-end, just to make the use more clear.

s-paquette commented 2 years ago

@fedorov @pgundluru First pass at this functionality is now on dev. For the moment the AND operator is only available on Modality; let us know if any other attributes are applicable and George and I can look into that. So far this...

There may be an issue with cohorts which have some of the TCGA numeric clinical attributes, so steer clear of those for now while I continue to check them over.

@fedorov We guessed at the language surrounding the AND/OR button; please let us know if it should appear different. It currently mimics the TCIA text but we can go with something else.

fedorov commented 2 years ago

We guessed at the language surrounding the AND/OR button; please let us know if it should appear different.

It's fine for now, can consider refining later based on feedback from everyone.

s-paquette commented 1 year ago

Note that for the moment this can ONLY be applied to Modality. A user could in theory craft a URL and attempt to by-pass that and it should work, but the UI itself provides this option only on a single attribute.

I still need to see if some of the numerics will run into problems; if so, I may need to write a shim for them temporarily.

pgundluru commented 1 year ago

Tested this in test tier for filter Modality:

Test 1 : Select Body part Breast and filter modality for ANY (ct and pet)

should produce a BQ query which matches the results of the UI search - Pass
should allow saving of the cohort with the AND operation on modality - Pass 
should produce the proper manifests for download or export - Pass

Test 2 : Select Body part Bladder, extremity and filter modality for ALL (magnetic resonance and pet)

should produce a BQ query which matches the results of the UI search - Pass
should allow saving of the cohort with the AND operation on modality - Pass 
should produce the proper manifests for download or export - Pass
pgundluru commented 1 year ago

Test 3 : Select Body part abdomen and filter modality for ANY (seg and rtstruct)

should produce a BQ query which matches the results of the UI search - Pass should allow saving of the cohort with the AND operation on modality - Pass should produce the proper manifests for download or export - Pass

Test 4 : Select Body part Abdomen and filter modality for ALL (seg and rtstruct)

should produce a BQ query which matches the results of the UI search - Pass should allow saving of the cohort with the AND operation on modality - Pass should produce the proper manifests for download or export - Pass