cdisc-org / ddf-core-poc

This repository will contain the results from the Proof of Concept project.
MIT License
0 stars 1 forks source link

Test data template updates #169

Closed ASL-rmarshall closed 3 months ago

ASL-rmarshall commented 3 months ago

Changes applied to create_template.py:

Removed generically named template file and replaced with version-specific versions for USDM v2.6 (previous design) and v2.11 (including all updates listed above)..

ASL-rmarshall commented 3 months ago

Hi @BSnoeijerCD - I've asked for your review for this pull request. I'm not expecting you to look at the code changes (unless you want to), but please could you download and review the USDM_2.11_Test_Data_Template.xlsx file. Please then start your review of this change request and either approve it (if everything looks OK) or post comment(s) with your findings. Thanks!

BSnoeijerCD commented 3 months ago

@ASL-rmarshall : some comments

and questions:

I will check the content of sheets as well today.

ASL-rmarshall commented 3 months ago

@BSnoeijerCD : some replies

@ASL-rmarshall : some comments

  • sheets added for boolean, float, null, string

Yes, this is deliberate. It is to mimic the behaviour of the USDM data service's conversion of JSON files, which creates separate datasets each showing the values for a primitive data type (boolean, float, string) or attributes with null as a value. See the Excel files of the USDM 2.11 example files that I posted in SharePoint.

  • no label for BiomedicalConceptSurrogate class

This is because there is no "CT Item Preferred Name" specified for the BiomedicalConceptSurrogate entity in the "DDF Entities&Attributes" sheet of the USDM 2.11 USDM_CT.xlsx file (cell G194).

  • study class will not have a parent. So corresponding items in test data sheet are not applicable.

For API data, the Study class has the (API-only) Wrapper class as its parent. Again, see the Excel representations of the USDM 2.11 example files that I posted in SharePoint.

and questions:

  • How should the boolean be filled in the example datasets? Y/N, True/False, 0/1 ?

Either as TRUE/FALSE (the Excel boolean values) or true/false as text values: image Note that, by default, the first test data row in every sheet (i.e., the first row under the yellow header rows) is formatted to accept text values in every cell (the cells are formatted as "Text" instead of "General"). If you have multiple rows of test data to enter, you can use the Format Painter functionality to copy this text formatting onto additional rows. This will also prevent ISO8601 date values being converted automatically to Excel date values.

  • Full name of class StudyProtocolDocumentVersion too long. Missing the last n. Should we refer in Core to StudyProtocolDocumentVersio (without the n)?

This is a known issue. Our eventual intention is to be able to remove the ".xpt" suffix from the sheet names (see cdisc-org/cdisc-rules-engine#684), at which point there will be enough space for the full class name. Until then, if we need to reference the class name as a domain within the check specification - e.g., in a distinct operation - we would have to specify it without the "n" (and in upper case). It currently doesn't matter what you put in Scope because it's ignored anyway, so I would recommend using the full class name in Scope.

I will check the content of sheets as well today.

BSnoeijerCD commented 3 months ago

@ASL-rmarshall based on the content, I have the following comments:

ASL-rmarshall commented 3 months ago

@BSnoeijerCD Thanks. I have made some updates. Please download and check the updated Excel file.

@ASL-rmarshall based on the content, I have the following comments:

  • BiomedicalConceptProperty, ScheduledDecisionInstance and AgentAdministration: something is going wrong in column J-M. Headers missing / content split.

I'm not sure what this means - is it just that there appears to be a gap between the header rows for some of the columns when the sheet contains another column with a description (or other characteristic) that wraps onto multiple lines within its cell?

  • Condition: Context and AppliesTo multiplied for each potential target (2x Context, 5x AppliesTo) but not distinguishable for Core I think. Also naming should be appliesToIds and contextIds

Corrected - multiple occurrences of the same relationship name are now collapsed into a single column with the cardinality (row 4) shown a " / " delimited list of all the references. Column names corrected to be appliesToIds and contextIds.

Column names corrected to be <singular noun>Ids for all the following:

  • StudyCell.elements should be studyCell.elementIds (not sure if I missed others that cross-reference and are plural)
  • StudyArm.populations should be studyArm.populationIds
  • studyElement.studyInterventions should be studyElement.studyInterventionIds
  • activity.biomedicalConcepts should be activity.biomedicalConceptIds
  • activity.bcCategories should be activity.bcCategoryIds
  • activity.bcSurrogates should be activity.bcSurrogateIds
  • BiomedicalConceptCategory.children and BiomedicalConceptCategory.members should be childIds and MemberIds
  • ScheduledActivityInstance.activities should be activityIds

These two have also been corrected to show previousId and nextId to match the API spec, but according to the UML, they should be previousIds and nextIds - I've sent you an email about the inconsistency.

  • NarrativeContent.previous should be previousId
  • NarrativeContent.next should be nextId
BSnoeijerCD commented 3 months ago

@ASL-rmarshall : checked. All ok now.