atorus-research / metacore

https://atorus-research.github.io/metacore/

Extend `metacore` object by adding programming-related information #99

Open kaz462 opened 10 months ago

kaz462 commented 10 months ago

To enhance the usage of metadata, what do you think of adding the following information to the metacore object?

  1. a column of functions used for derivation in metacore$var_spec and metacore$value_spec
  2. a column giving the order of derivation in metacore$var_spec and metacore$value_spec

Users could also add these two columns to the P21 Excel spec (Variables and ValueLevel sheets) so that they are read in through spec_to_metacore().

Thanks!

statasaurus commented 9 months ago

I really like this idea. The only thing I might change is to put the column of functions into the derivation table rather than into var_spec and value_spec, so the information isn't repeated. I would also probably put the derivation order only in var_spec rather than in both var_spec and value_spec.
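
To make the suggested layout concrete, here is a minimal sketch in plain Python rather than the metacore R API. The tables are simplified stand-ins, the column names `derivation_fn` and `derivation_order` are hypothetical names for the two proposed columns, and the function names are placeholder strings.

```python
# Simplified, hypothetical stand-ins for metacore tables (plain Python, not the R API).

# derivations: one row per derivation; "derivation_fn" is the proposed new column.
derivations = {
    "ADVS.CHG": {"derivation": "AVAL - BASE", "derivation_fn": "derive_var_chg"},
    "ADVS.PCHG": {"derivation": "100 * CHG / BASE", "derivation_fn": "derive_var_pchg"},
}

# value_spec: links each variable to its derivation through derivation_id.
value_spec = [
    {"dataset": "ADVS", "variable": "PCHG", "derivation_id": "ADVS.PCHG"},
    {"dataset": "ADVS", "variable": "CHG", "derivation_id": "ADVS.CHG"},
]

# var_spec: one row per variable; "derivation_order" is the proposed new column.
var_spec = {
    "CHG": {"label": "Change from Baseline", "derivation_order": 1},
    "PCHG": {"label": "Percent Change from Baseline", "derivation_order": 2},
}


def derivation_plan(dataset):
    """Return (variable, function name) pairs for a dataset in derivation order."""
    rows = [row for row in value_spec if row["dataset"] == dataset]
    rows.sort(key=lambda row: var_spec[row["variable"]]["derivation_order"])
    return [
        (row["variable"], derivations[row["derivation_id"]]["derivation_fn"])
        for row in rows
    ]


print(derivation_plan("ADVS"))
```

Because the function name lives only in the derivation table, two variables that share a derivation would not need it repeated.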

TeMeta commented 6 months ago

Order of derivation can (and should) be dynamic and automated based on explicit dependencies.

It would be good to figure out how to leverage/add to the existing MethodDef structure from CDISC ODMv2 and future Define versions so that it can evolve into a standard shared across the industry and potentially regulators too.

bundfussr commented 5 months ago

> Order of derivation can (and should) be dynamic and automated based on explicit dependencies

@TeMeta, I'm not sure whether this would work or whether it is desirable. I think there are cases where the author of the specs needs or wants to specify the order of the derivations.

Consider for example two derivations. One flags the last record per subject and the other one adds LOCF records. They don't depend on each other. I.e., with respect to the dependencies it doesn't matter in which order they are executed. But the results differ depending on the order. Thus the author needs to specify the order to avoid ambiguity.
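
The effect can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the record layout, the `LASTFL` and `DTYPE` names, and the simplified rule that an imputed LOCF record does not inherit the flag.

```python
def flag_last(records):
    """Return copies of the records with LASTFL = 'Y' on the highest visit."""
    last_visit = max(r["visit"] for r in records)
    return [{**r, "LASTFL": "Y" if r["visit"] == last_visit else None} for r in records]


def add_locf(records, expected_visits):
    """Return the records plus one LOCF record per missing expected visit.

    Assumes every observed visit is listed in expected_visits.
    """
    by_visit = {r["visit"]: r for r in records}
    out, previous = [], None
    for visit in expected_visits:
        if visit in by_visit:
            previous = by_visit[visit]
            out.append(previous)
        elif previous is not None:
            # Carry the last observed value forward; the new record gets no flag.
            out.append({**previous, "visit": visit, "DTYPE": "LOCF", "LASTFL": None})
    return out


def flagged_visits(records):
    """Return the visits whose record carries LASTFL = 'Y'."""
    return [r["visit"] for r in records if r.get("LASTFL") == "Y"]


# One subject, observed at visits 1 and 2, with visit 3 expected but missing.
observed = [{"visit": 1, "AVAL": 10.0}, {"visit": 2, "AVAL": 12.0}]
expected = [1, 2, 3]

flag_then_locf = add_locf(flag_last(observed), expected)
locf_then_flag = flag_last(add_locf(observed, expected))

print(flagged_visits(flag_then_locf))  # flag stays on the last observed record
print(flagged_visits(locf_then_flag))  # flag lands on the imputed record
```

Neither function reads a variable that the other creates, yet the flag ends up on visit 2 in one order and on visit 3 in the other.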

For many derivations it doesn't matter in which order they are performed. For example, CHG, PCHG, APERIOD, and TRTP can be derived in any order. But for readability of the specs and the code it is preferable to keep the above order, so that related variables stay together. The order CHG, APERIOD, PCHG, and TRTP would be confusing in the specs and the code (although it would produce correct results).

Therefore I would use the dependencies to automatically generate an initial order of the derivations. The author would then review it and adjust it if necessary. Finally, the adjusted order would be checked automatically for violated dependencies.
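
A rough Python sketch of that three-step workflow, using the standard-library `graphlib` module (Python 3.9+). The dependency map and the helper `find_violations` are hypothetical and only illustrate the idea; they are not part of metacore.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: variable -> variables it is derived from.
dependencies = {
    "AVAL": set(),
    "BASE": {"AVAL"},
    "CHG": {"AVAL", "BASE"},
    "PCHG": {"AVAL", "BASE"},
    "APERIOD": set(),
    "TRTP": set(),
}

# Step 1: generate an initial order automatically from the dependencies.
# The relative order of independent variables is not guaranteed.
initial_order = list(TopologicalSorter(dependencies).static_order())


def find_violations(order, dependencies):
    """Return (variable, prerequisite) pairs that the given order violates."""
    position = {name: index for index, name in enumerate(order)}
    problems = []
    for name, prerequisites in dependencies.items():
        for prerequisite in sorted(prerequisites):
            missing = name not in position or prerequisite not in position
            # A missing entry or a prerequisite derived too late is a violation.
            if missing or position[prerequisite] > position[name]:
                problems.append((name, prerequisite))
    return problems


# Step 2: the author reorders for readability, keeping related variables together.
author_order = ["AVAL", "BASE", "CHG", "PCHG", "APERIOD", "TRTP"]

# Step 3: check the adjusted order against the dependencies.
print(find_violations(initial_order, dependencies))
print(find_violations(author_order, dependencies))

# An order that derives CHG before BASE is reported as a violation.
bad_order = ["AVAL", "CHG", "BASE", "PCHG", "APERIOD", "TRTP"]
print(find_violations(bad_order, dependencies))
```

The check in step 3 only rejects orders that break a declared dependency, so the author remains free to choose among the valid ones.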

TeMeta commented 5 months ago

Hi @bundfussr, agreed on all points.

We do want to specify order manually sometimes too. Dependencies are just a very explicit and repeatable way of achieving this.

LOCF is a good example of a missing dependency that needs to be articulated. LOCF that operates on flags does depend on the flag. That creates a dependency on the derivation of that flag, i.e. the derivation of the flag must run first. That dependency can either be predetermined (an explicit dependency on the flag) or added post hoc (by hard-coding the order).
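
As a small illustration of this point, the Python sketch below enumerates the execution orders that a dependency map allows. The derivation names and the helper `valid_orders` are hypothetical, and the brute-force enumeration is only meant for a handful of derivations.

```python
from itertools import permutations


def valid_orders(dependencies):
    """Return every execution order that respects the declared dependencies."""
    names = sorted(dependencies)
    orders = []
    for order in permutations(names):
        position = {name: index for index, name in enumerate(order)}
        if all(
            position[prerequisite] < position[name]
            for name, prerequisites in dependencies.items()
            for prerequisite in prerequisites
        ):
            orders.append(order)
    return orders


# No dependency declared: both orders are allowed, so the result is ambiguous.
undeclared = {"flag_last_record": set(), "add_locf_records": set()}

# Dependency on the flag declared explicitly: only one order remains.
declared = {"flag_last_record": set(), "add_locf_records": {"flag_last_record"}}

print(len(valid_orders(undeclared)))
print(valid_orders(declared))
```

Once the dependency on the flag is declared, the ambiguity disappears without any hard-coded order.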

bundfussr commented 5 months ago

> It would be good to figure out how to leverage/add to the existing MethodDef structure from CDISC ODMv2 and future Define versions so that it can evolve into a standard shared across the industry and potentially regulators too.

I agree that we need a standard for ADaM specs to make progress in automation. Current standards like Define-XML or the Roche-internal format for ADaM specs are not intended for automation. Where suitable, the new standard could use elements from existing standards like ODMv2, Define, and ARS.