ResearchSoftwareInstitute / greendatatranslator

Green Team Data Translator Software Engineering and Development
BSD 3-Clause "New" or "Revised" License
2 stars 1 forks source link

Develop Swagger UI for ICEES Smart API #134

Open karafecho opened 6 years ago

karafecho commented 6 years ago

The purpose of this issue is to develop a user-friendly UI to interface with the Smart API for the DDCR Service. This will allow SMEs and other users to access the DDCR Service through drop-down menus and other features, without having to execute curl commands.

karafecho commented 6 years ago

Swagger UI can be found at: https://ddcr.renci.org/apidocs/

stevencox commented 6 years ago

Looks good to me. Thanks.

xu-hao commented 6 years ago

Based on today's meeting with @karafecho

karafecho commented 5 years ago

@xu-hao : Please see the requests below.

  1. The text/tabular view for the 'get feature association' function needs to be reformatted similar to the 'get associations to all features' function.

  2. Please include the following links at the top of the API, under 'ICEES API documentation':

patient-level tables, column headers and binning strategy,

visit-level tables, column headers and binning strategy

ICEES reference slide deck

CORRECTION: Instead of links to individual files, please add a link to the ICEES folder.

  1. Reorder functionalities as follows: (1) 'get features', (2) 'get feature association', (3) 'get associations to all features'

  2. For consistency, please make the following word changes to the description of each functionality:

a. Replace 'Create a new cohort by a set of feature variables and set the cohort id. Even if a cohort has already been created for this set before, a new cohort is created.' with 'Cohort discovery. Users define a cohort using any number of defined feature variables as input parameters, and the service returns a sample size. A new cohort is created even if a cohort was previously created using the same input parameters.'

b. Replace 'get features' with 'Feature-rich cohort discovery: users select a predefined cohort as the input parameter, and the service returns a profile of that cohort in terms of all feature variables.

c. Replace 'get feature association' with 'Hypothesis-driven 2 x 2 feature associations: users select a predefined cohort and two feature variables, and the service returns a 2 x 2 feature table with a corresponding Chi Square statistic and P value.'

d. Replace 'get associations to all features' with 'Exploratory 1 X N feature associations: users select a predefined cohort and a feature variable of interest, and the service returns a 1 x N feature table with corrected Chi Square statistics and associated P values.'

  1. Is it possible to move the 'response content type' to the middle column? It's easy to overlook in the current position.

  2. Can we include documentation or a dictionary for versioning of tables?

If you'd prefer that I complete the above tasks, just let me know.

@xu-hao : Note that I made a correction to item 2.

@xu-hao : In terms of prioritization, I'd consider items 2, 3, and 4 plus the acknowledgement text below to be the most important.

karafecho commented 5 years ago

@xu-hao : One more thing: please add the text below beneath the text for the terms and conditions of the service. Thanks!

"We kindly request that Translator team members provide proper attribution for any products (e.g., manuscripts, podium presentations, software) derived from work related to Green Team's clinical datasets. Attribution should include acknowledgement of the funder (National Center for Advancing Translational Sciences [NCATS], Biomedical Data Translator Program awards, OT3TR002020 and OT2TR002514), the North Carolina Translational and Clinical Sciences (NC TraCS) Institute (NCATS, Center for Translational Science Award, UL1TR002489), UNC Hospitals and Health Care System, and all Green Team members who contributed to the work."

@xu-hao : I added an additional grant number to the text above. @xu-hao @ColinKCurtis : As discussed today, the addition of the above text is high priority and should be completed prior to the hackathon. Thanks!

karafecho commented 5 years ago

@xu-hao : I believe you used panda.qcut for binning. I'm working with the data in SAS JMP and finding that the quintiles do not appear to align with yours. (See screenshots below for example distributions.) I think you binned by 'values' whereas I binned by 'population distribution'. I'm not sure which approach is best, but I do want to better understand the binning strategy that you used.

image

xu-hao commented 5 years ago

@karafecho It seems that the data I get from qcut matches the data on your screenshot:

For example, for EstResidentialDensity:

Categories (5, interval[float64]): [(-0.001, 1250.0] < (1250.0, 1688.0] < (1688.0, 2204.0] <
                                    (2204.0, 2662.0] < (2662.0, 9653.0]]

matches the Quantiles tab in your screenshot

count    23093.000000
mean      2124.561815
std       1146.372076
min          0.000000
25%       1387.000000
50%       1987.000000
75%       2545.000000
max       9653.000000

matches the Summary Statistics tab in your screenshot

Numbers of samples in each category is not shown directly on the screenshot, so I cannot compare with the result of pandas. If there is a mismatch, one possible cause of mismatch could be that SAS JMP and pandas handles values on the boundary differently. If you look at the output of pandas, all the categories are left exclusive and right inclusive. If SAS JMP has a different algorithm, then it will give different numbers.

karafecho commented 5 years ago

Thanks, Hao. I'll look into this issue.

Note that I changed the links in item 2 above and added a second grant number and revised text to the proposed citation.

karafecho commented 5 years ago

For 'get definition of a cohort': I think we should return the sample size with the features that define a cohort.

xu-hao commented 5 years ago
xu-hao commented 5 years ago

@ColinKCurtis : Could you please take at look at the following feature request?

  1. Reorder functionalities as follows: (1) 'get features', (2) 'get feature association', (3) 'get associations to all features'
xu-hao commented 5 years ago

@karafecho

  1. The text/tabular view for the 'get feature association' function needs to be reformatted similar to the 'get associations to all features' function.

It is already formatted similar to 'get associations to all features'.

+-----------------+-----------------------+------------------------+---------+
| feature         | AgeStudyStart = 0-2   | AgeStudyStart <> 0-2   |         |
+=================+=======================+========================+=========+
| ObesityBMI = 0  | 2013     8.72%        | 21076   91.28%         | 23089   |
|                 | 100.00%  8.72%        | 99.98%  91.27%         | 99.98%  |
+-----------------+-----------------------+------------------------+---------+
| ObesityBMI <> 0 | 0      0.00%          | 4      100.00%         | 4       |
|                 | 0.00%  0.00%          | 0.02%  0.02%           | 0.02%   |
+-----------------+-----------------------+------------------------+---------+
|                 | 2013                  | 21080                  | 23093   |
|                 | 8.72%                 | 91.28%                 | 100.00% |
+-----------------+-----------------------+------------------------+---------+
+-----------+---------------+
|   p_value |   chi_squared |
+===========+===============+
|  0.943927 |       0.38204 |
+-----------+---------------+

Is there anything specific you want me to change?

xu-hao commented 5 years ago

@ColinKCurtis I got 3 working. Had to patch Flasgger. So help is no longer needed for this feature request.

karafecho commented 5 years ago

Thanks, Hao. Sorry for the confusion.

karafecho commented 5 years ago

@xu-hao @ColinKCurtis : Can we add back a link to Hao's GitHub site above or below the link to my general-user documentation?

karafecho commented 5 years ago

FYI:

This query (see curl in second screenshot): image

Yielded this result: image

Could be an error on my part. Not sure, however.

colinkcurtis commented 5 years ago

@xu-hao @ColinKCurtis : Can we add back a link to Hao's GitHub site above or below the link to my general-user documentation?

I am looking at this now!

xu-hao commented 5 years ago

@karafecho i fixed the bug. thanks for finding it.

karafecho commented 5 years ago

@xu-hao @ColinKCurtis : I am wondering what the best approach is for pointing to plain-text documentation on ICEES. The information currently is available through a hyperlink from the UI to a Google folder, but I'm not sure this is the best approach. Thoughts?

xu-hao commented 5 years ago

@karafecho Simplest is just check them into github. Or we can check it into github and use github pages. Or we can use readthedocs. Given the current amount of documentation probably checking them into github is sufficient.

karafecho commented 5 years ago

@xu-hao @ColinKCurtis : I don't think we need to include any statistics (Chi Square, P values) for functionality two: feature-rich cohort discovery.

xu-hao commented 5 years ago

ok, i'll remove those stats.

xu-hao commented 5 years ago

it's done.

karafecho commented 5 years ago

@xu-hao @ColinKCurtis : Functionality 2, feature-rich cohort discovery, is not working for the visit-level tables, apparently because those tables do not contain the feature variable "TotalEDInpatientVisits".

image image

image image

karafecho commented 5 years ago

Actually, I think something generally is wrong with the visit-level tables.

image

image

image

image

xu-hao commented 5 years ago

I'll take a look at this today.

xu-hao commented 5 years ago

@karafecho i fixed a bug that is causing some of the errors. other errors can be resolved by changing the table from patient to visit.

karafecho commented 5 years ago

Oops! Well, that was a stupid mistake.

This works fine now. Thanks, Hao.

karafecho commented 5 years ago

I'm having some issues with the ICEES API, specifically, functionality 4. The service isn't returning all feature associations. For instance, AvgDailyPM2.5Exposure x TotalEDInpatientVisits isn't returned for year 2011 patient-level queries. I checked the underlying tables, however, and the data are definitely there. Plus, I can generate the output using functionality 3 for 2 x 2 tables.

image image

karafecho commented 5 years ago

I'm also having trouble generating results for functionality 4, year 2010, patient-level queries (see example below). This started today; I did not have trouble on Friday, except for the missing feature associations using functionality 4.

image image

xu-hao commented 5 years ago

The reason "AvgDailyPM2.5Exposure x TotalEDInpatientVisits" is not showing is because the maximum_p_value is too low. If you increase it to 1, it will show up.

karafecho commented 5 years ago

Ahh! That makes perfect sense. I was confused because SAS JMP does not filter results, and I've been comparing output from the API with output from SAS JMP. I really need to keep these caveats in mind!

karafecho commented 5 years ago

@xu-hao : Okay, I'm confused. COHORT:22 was originally defined as {} (all patients), 2010, patient table, v 1.0.0. I know this because I used that cohort to generate the data in the ICEES MS. Plus, Colin used that cohort in the Jupyter notebook. But now it is defined as follows:

{ "return value": [ { "size": 3958, "cohort_id": "COHORT:22", "features": { "TotalEDInpatientVisits": { "operator": "<", "value": 2 }, "MaxDailyPM2.5Exposure": { "operator": ">", "value": 2 } } },

I created COHORT:56 to create a new cohort defined the same as COHORT:22 originally was, but I'm concerned because COHORT:22 is referenced in many places now. Is it possible to delete it and then redefine it? If so, we can delete COHORT:56.

xu-hao commented 5 years ago

Yes, you can use the PUT operator on the Flasgger UI to override cohort definition. I've already restore COHORT:22 you can try it again.

On Tue, Oct 30, 2018 at 1:11 PM karafecho notifications@github.com wrote:

@xu-hao https://github.com/xu-hao : Okay, I'm confused. COHORT:22 was originally defined as {} (all patients), 2010, patient table, v 1.0.0. I know this because I used that cohort to generate the data in the ICEES MS. Plus, Colin used that cohort in the Jupyter notebook https://github.com/ncats/translator-workflows/blob/master/greengamma/general/ICEES_API_Notebook.ipynb. But now it is defined as follows:

{ "return value": [ { "size": 3958, "cohort_id": "COHORT:22", "features": { "TotalEDInpatientVisits": { "operator": "<", "value": 2 }, "MaxDailyPM2.5Exposure": { "operator": ">", "value": 2 } } },

I created COHORT:56 to create a new cohort defined the same as COHORT:22 originally was, but I'm concerned because COHORT:22 is referenced in many places now. Is it possible to delete it and then redefine it? If so, we can delete COHORT:56.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ResearchSoftwareInstitute/greendatatranslator/issues/134#issuecomment-434387330, or mute the thread https://github.com/notifications/unsubscribe-auth/ABtRbRGNUchlD46970ognVr509GDad4rks5uqIgqgaJpZM4UXbZG .

karafecho commented 5 years ago

Excellent! Thanks, Hao. And yes, I totally used the wrong operator. Sorry!

karafecho commented 5 years ago

Thanks for removing the 2011 tables from the ICEES API, Hao. As I mentioned, I think we need to refine the binning strategy for the CMAQ exposures in those tables before we openly release them.

xu-hao commented 5 years ago

once i get the FHIR data from James, i can try one of the neural network based approach. what do you think?

On Mon, Nov 5, 2018 at 5:27 PM karafecho notifications@github.com wrote:

Thanks for removing the 2011 tables, Hao. As I mentioned, I think we need to refine the binning strategy for the CMAQ exposures in those tables.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ResearchSoftwareInstitute/greendatatranslator/issues/134#issuecomment-436059063, or mute the thread https://github.com/notifications/unsubscribe-auth/ABtRbbDXTy7iiqtmzgvGsY6078I7b4Ipks5usLs3gaJpZM4UXbZG .

karafecho commented 5 years ago

Sure, but let's chat first. I'm finding race-based differences in the PM2.5 distribution that are interesting and may have been masked had we optimized binning.

karafecho commented 5 years ago

@xu-hao @ColinKCurtis : WRT binning, I'm convinced that we will never identify a one-strategy-fits-all approach. Instead, I'm wondering if we should:

  1. Optimize binning by precalculating tables using multiple binning strategies and incorporating a feature in the API/UI to allow users to select their preferred binning strategy.
  2. Optimize binning by incorporating a feature in the API/UI to allow users to collapse contiguous bins. For example, this feature would allow users to select the preferred number of bins for TotalEDInpatientVisits, MajorRoadwayHighwayExposure, etc.
xu-hao commented 5 years ago

Both are doable.

On Mon, Nov 12, 2018 at 1:38 PM karafecho notifications@github.com wrote:

@xu-hao https://github.com/xu-hao @ColinKCurtis https://github.com/ColinKCurtis : WRT binning, I'm convinced that we will never identify a one-strategy-fits-all approach. Instead, I'm wondering if we should:

  1. Optimize binning by precalculating tables using multiple binning strategies and incorporating a feature in the API/UI to allow users to select their preferred binning strategy.
  2. Optimize binning by incorporating a feature in the API/UI to allow users to collapse contiguous bins. For example, this feature would allow users to select the preferred number of bins for TotalEDInpatientVisits, MajorRoadwayHighwayExposure, etc.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ResearchSoftwareInstitute/greendatatranslator/issues/134#issuecomment-437971570, or mute the thread https://github.com/notifications/unsubscribe-auth/ABtRbRjNgxVORvEXTq0xKRtRUzmGNpwFks5uubUCgaJpZM4UXbZG .

karafecho commented 5 years ago

Great! Let's plan on this, as we discussed earlier today.

karafecho commented 5 years ago

Hao modified the API to include the binning optimization strategy (2) on 11/19/18. To be deployed.

karafecho commented 5 years ago

Steve Appold (Kenan-Flagler) sent a new data pull for the nationwide ACS data. The estimates are similar, but not identical, to the estimates that he sent previously (he used a different source file). He did not include the variable "EstProbabilityNonHispWhite" (not sure why), although he did include standard error estimates. These will be used by FHIR PIT to generate the new integrated feature tables.

karafecho commented 5 years ago

11/20/18: Hao deployed the new API. New operation is temporarily titled "feature_association2".

Example operation:

{ "feature_a":{ "AgeStudyStart":[ { "operator":"=", "value":"0-2" }, { "operator":"between", "value_a":"3-17", "value_b":"18-34" }, { "operator":"in", "values":["35-50","51-69"] },{ "operator":"=", "value":"70+" } ] }, "feature_b":{ "ObesityBMI":[ { "operator":"=", "value":0 }, { "operator":"<>", "value":0 } ] } }

karafecho commented 5 years ago

@xu-hao : I cannot get the new functionality to run. Is there something wrong with my syntax?

{{ "feature_a":{ "AvgDailyPM2.5Exposure":[ { "operator":"in", "values":[1,2] }, { "operator":"=", "value":3, }, { "operator":"in", "values":[4,5], }} ] }, {"feature_b":{ "TotalEDInpatientVisits":[ { "operator":"<", "value":2 }, ] } }}

karafecho commented 5 years ago

@xu-hao : When you get a chance, please replace paragraph three of the current DUA-like boilerplate with the following text:

"We kindly request that users of this service provide proper attribution for any products (e.g., manuscripts, podium presentations, software) derived from work related to ICEES. Attribution should include acknowledgement of the funder (National Center for Advancing Translational Sciences, National Institutes of Health, Biomedical Data Translator Program award [OT3TR002020 and OT2TR002514] and Center for Translational Science award [UL1TR002489], UNC Hospitals and Health Care System, and all team members who contributed to the work."

xu-hao commented 5 years ago

It's update.


From: karafecho notifications@github.com Sent: Friday, February 8, 2019 4:30 PM To: ResearchSoftwareInstitute/greendatatranslator Cc: Xu, Hao; Mention Subject: Re: [ResearchSoftwareInstitute/greendatatranslator] Develop Swagger UI for ICEES Smart API (#134)

@xu-haohttps://github.com/xu-hao : When you get a chance, please replace paragraph three of the current DUA-like boilerplate with the following text:

"We kindly request that users of this service provide proper attribution for any products (e.g., manuscripts, podium presentations, software) derived from work related to ICEES. Attribution should include acknowledgement of the funder (National Center for Advancing Translational Sciences, National Institutes of Health, Biomedical Data Translator Program award [OT3TR002020 and OT2TR002514] and Center for Translational Science award [UL1TR002489], UNC Hospitals and Health Care System, and all team members who contributed to the work."

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ResearchSoftwareInstitute/greendatatranslator/issues/134#issuecomment-461953356, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABtRbbykj2iY_CJNCI3hfo_q8adoYhN-ks5vLexigaJpZM4UXbZG.

karafecho commented 5 years ago

Thanks, but I modified the wording slightly:

"We kindly request that users of this service provide proper attribution for any products (e.g., manuscripts, podium presentations, software) derived from work related to ICEES. Attribution should include acknowledgement of the funder (National Institutes of Health, National Center for Advancing Translational Sciences, Biomedical Data Translator Program [awards OT3TR002020 and OT2TR002514] and Center for Translational Science Program [award UL1TR002489]), UNC Hospitals and Health Care System, and all team members who contributed to the work."

Sorry!

xu-hao commented 5 years ago

It's updated.


From: karafecho notifications@github.com Sent: Friday, February 8, 2019 5:51 PM To: ResearchSoftwareInstitute/greendatatranslator Cc: Xu, Hao; Mention Subject: Re: [ResearchSoftwareInstitute/greendatatranslator] Develop Swagger UI for ICEES Smart API (#134)

Thanks, but I modified the wording slightly:

"We kindly request that users of this service provide proper attribution for any products (e.g., manuscripts, podium presentations, software) derived from work related to ICEES. Attribution should include acknowledgement of the funder (National Institutes of Health, National Center for Advancing Translational Sciences, Biomedical Data Translator Program [awards OT3TR002020 and OT2TR002514] and Center for Translational Science Program [award UL1TR002489]), UNC Hospitals and Health Care System, and all team members who contributed to the work."

Sorry!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ResearchSoftwareInstitute/greendatatranslator/issues/134#issuecomment-461974296, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABtRbbyOO35Nks2YyQmqHsQ6sa95bqmmks5vLf9rgaJpZM4UXbZG.

karafecho commented 5 years ago

Excellent! Thanks, Hao. Kara