Closed schuemie closed 4 years ago
Hi @schuemie I tested the package.
To help with testing I made changes to TestCode.R
file. During testing, I found a few issues (unrelated to this PR) - i have fixed some, added others to backlog.
Recommendation: I think results object should be returned as data-frame where possible. One of the changes you made (removing .flattenTree - returns a complex nested list of results. This is difficult to handle compared to data frame.) I think we should rework this part to incorporate a solution like .flattenTree that take such a complex nest and returns a more user friendly table format as described here
Merging #95 into develop will increase coverage by
0.34%
. The diff coverage is8.86%
.
@@ Coverage Diff @@
## develop #95 +/- ##
==========================================
+ Coverage 4.37% 4.72% +0.34%
==========================================
Files 8 8
Lines 914 826 -88
==========================================
- Hits 40 39 -1
+ Misses 874 787 -87
Impacted Files | Coverage Δ | |
---|---|---|
R/CohortCharacterization.R | 0.00% <0.00%> (ø) |
|
R/CohortDefinition.R | 0.00% <0.00%> (ø) |
|
R/Concept.R | 0.00% <0.00%> (ø) |
|
R/ConceptSet.R | 0.00% <0.00%> (ø) |
|
R/Estimation.R | 0.00% <0.00%> (ø) |
|
R/IncidenceRate.R | 0.00% <ø> (ø) |
|
R/PersonProfile.R | 0.00% <ø> (ø) |
|
R/WebApi.R | 25.82% <87.50%> (-0.49%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 3e49791...e651676. Read the comment docs.
Thanks @gowthamrao !
Like I said: the flattenTree code was just not working at all, throwing an error message. I tried to get it running, but couldn't figure it out.
I'm still on the fence on what to do with the objects pulled from WebAPI. In mind I make a distinction between 'assets' and 'status information':
An asset is something created in ATLAS, like a cohort definition, an estimation definition, etc., that has a specified structure. That structure is represented inside of WebAPI as a Java object. To transmit it, it is represented as a JSON object. In R we represent that same asset as an R object, still following the originally specified structure.
Status information is basically meta-information, like the status of a run, or the list of assets available.
The assets are things we often want to preserve, and keep in the same structure, so we can convert them back from R to JSON back to the original Java objects. Therefore, we shouldn't apply all sorts of transformations to these objects, like flatten a tree structure. We could add a function that we can point at such an object to produce a pretty representation, but we'd have to keep the original without modification.
Because status information is transient by definition, there's no need to preserve as is, and it can be transformed to some pretty format (preferably a tibble) right away.
The reasons we don't want to maintain both the JSON and R representation of the same asset are:
I guess the results objects are a bit in-between. Maybe they should just be converted to nicely formatted R objects, since we would want to upload results back to WebAPI anyway.
An asset is something created in ATLAS, like a cohort definition, an estimation definition, etc., that has a specified structure. That structure is represented inside of WebAPI as a Java object. To transmit it, it is represented as a JSON object. In R we represent that same asset as an R object, still following the originally specified structure.
I like this description a lot @schuemie . I don't think results
of an execution fit either of these categories. Example of results: get counts of persons who met each rule in cohort definition.
results
are not Assets
. The results are sourced by the WebApi from the underlying database that is storing the results. I don't think results
are 'created in Atlas`.results
are Status
either - because it is not transient meta information. results
are persisted in db.
Status information is basically meta-information, like the status of a run, or the list of assets available.
I think results
belong to their own category, and are the end product of the work. We could use assets
and 'status' to make sense of the results
. Since WebApi is just a transport medium from underlying db to end-user for results
- we don't need to preserve the structure of the results. We also don't need to post
results into WebApi. So - i don't think we
want to preserve, and keep in the same structure, so we can convert them back from R to JSON back to the original Java objects.
I propose, we deliver all results
in table format (i.e. dataframe). For results
we only maintain the R representation (of-coures, the db will have the source of truth and we wont modify those). Thus for results
we don't want to maintain both the JSON and R representation
Tried to make function naming more consistent. Deleted some functions that didn't adhere to new design. Re-implemented some functions using newer functions. Making sure all functions return tibbles instead of data frames. Fixed some functions that weren't working.