OHDSI / FeatureExtraction

An R package for generating features (covariates) for a cohort using data in the Common Data Model.
http://ohdsi.github.io/FeatureExtraction/

Support for bigint in cohortId - need for phenotype library #102

Closed gowthamrao closed 4 years ago

gowthamrao commented 4 years ago

https://github.com/OHDSI/FeatureExtraction/blob/ce76ce62839051012db3393805c639adfd375519/R/GetDefaultCovariates.R#L63

cohortId is cast to integer here, but cohortId may be a bigint:

as.integer(cohortId)

This produces the following warning:

Warning: NAs introduced by coercion to integer range
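
The coercion can be reproduced in plain R. A minimal sketch, assuming a hypothetical cohortId just above the 32-bit integer range (the value is illustrative, not taken from the phenotype library):

```r
# Hypothetical cohort ID that exceeds the 32-bit integer range (assumption for illustration)
cohortId <- 3000000000

# Largest value an R integer can hold
.Machine$integer.max

# as.integer() cannot represent the value and returns NA;
# without suppressWarnings() it emits the coercion warning shown above
asInteger <- suppressWarnings(as.integer(cohortId))
is.na(asInteger)

# A double (numeric) keeps the value intact
asNumeric <- as.numeric(cohortId)
asNumeric == 3000000000
```

Here `asInteger` ends up `NA`, while `asNumeric` still compares equal to the original ID.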

To overcome this, cohortId should be a BIGINT. R does not support BIGINT without extensions. DOUBLE may be used. But cohortId is an ID, and we won't be doing any math on it, so cohortId may be used as a character within R.

So a potential solution is to use

as.character(cohortId)

But this causes a Java error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.NoSuchMethodException: No suitable method for the given parameters

@schuemie I think the fix is somewhere in the Java, which is too deep for me to review. Could you please

gowthamrao commented 4 years ago

Found during testing https://github.com/OHDSI/FeatureExtraction/issues/95

gowthamrao commented 4 years ago

Related to https://github.com/OHDSI/CohortDiagnostics/issues/218

schuemie commented 4 years ago

For now, we use numeric to represent large integers. The downside is that we can only represent numbers < 2^53 without losing information.
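
A short sketch of where that limit shows up in base R (nothing here is specific to FeatureExtraction; it only illustrates double-precision behavior):

```r
# 2^53 is the point beyond which doubles can no longer represent every integer
limit <- 2^53
limitText <- sprintf("%.0f", limit)   # "9007199254740992"

# Just below the limit, adding 1 still gives an exact result
exactBelow <- (limit - 1) + 1 == limit

# At the limit, adding 1 is lost: 2^53 + 1 rounds back to 2^53
collides <- (limit + 1) == limit

exactBelow
collides
```

Both comparisons evaluate to `TRUE`, so two distinct IDs above 2^53 could silently collapse into the same numeric value.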

The plan for the future is to use the integer64 type in the bit64 package to represent 64-bit integers. The first step is to start using this in DatabaseConnector. This is on the roadmap, but we haven't had time to implement that yet.

gowthamrao commented 4 years ago

Thanks @schuemie

The use case in phenotype library is that cohortIds are being defined as (conceptId*1000) + (1-999), which should all be < 2 ^ 53 (similar to covariateIds).
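
A rough check of that claim, as a sketch. `makeCohortId` is a hypothetical helper written for this comment, not a function in FeatureExtraction or the phenotype library, and it assumes conceptId itself fits in a 32-bit integer:

```r
# Hypothetical helper (illustration only): cohortId = conceptId * 1000 + suffix
makeCohortId <- function(conceptId, suffix) {
  stopifnot(suffix >= 1, suffix <= 999)
  # as.numeric() first, so the multiplication happens in double precision
  # instead of overflowing the integer type
  as.numeric(conceptId) * 1000 + suffix
}

# Worst case under the assumption above: the largest 32-bit conceptId with suffix 999
largest <- makeCohortId(.Machine$integer.max, 999)
largestText <- sprintf("%.0f", largest)   # "2147483647999"

# Comfortably below 2^53, so numeric represents it exactly
fitsInDouble <- largest < 2^53

# But far outside the integer range, which is why as.integer() yields NA
overflowsInteger <- suppressWarnings(is.na(as.integer(largest)))
```

Under these assumptions `fitsInDouble` and `overflowsInteger` are both `TRUE`: the IDs are safe as numeric but not as integer.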

Tagging @chrisknoll who is working on adding support for 64-bit integers to DatabaseConnector for awareness.