atorus-research / xportr

Tools to build CDISC compliant data sets and check for CDISC compliance.
https://atorus-research.github.io/xportr/

Split data checks based on agency #201

Open elimillera opened 8 months ago

elimillera commented 8 months ago

Feature Idea

This was brought up in the Dec 12, 2023 meeting. Different agencies have different rules. For example, FDA doesn't allow underscores or non-ASCII characters in filenames. We could add a flag to strict_checks in write_xpt to check agency-specific rules.

@cpiraux Feel free to add in anything I missed or misstated.
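
As a rough illustration of the idea, here is a minimal sketch of an agency-aware filename check. `check_xpt_filename()` and its `agency` argument are hypothetical names made up for this example and are not part of xportr; the only rule encoded is the FDA one mentioned above (no underscores or non-ASCII characters in filenames).

```r
# Hypothetical helper for illustration only; not an xportr function.
# Returns a character vector of findings (empty when the name passes).
check_xpt_filename <- function(path, agency = c("none", "FDA")) {
  agency <- match.arg(agency)

  # Check only the file name, without directory or extension.
  name <- tools::file_path_sans_ext(basename(path))
  findings <- character(0)

  if (agency == "FDA") {
    if (grepl("_", name, fixed = TRUE)) {
      findings <- c(findings, "FDA: file name must not contain underscores.")
    }
    # Any byte above 127 means the name is not pure ASCII.
    if (any(as.integer(charToRaw(name)) > 127L)) {
      findings <- c(findings, "FDA: file name must contain ASCII characters only.")
    }
  }

  findings
}

check_xpt_filename("data/ad_sl.xpt", agency = "FDA")
check_xpt_filename("data/ad_sl.xpt", agency = "none")
```

Returning findings instead of aborting would let the caller decide whether to warn or stop, for example depending on strict_checks.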

cpiraux commented 8 months ago

I am adding an example for more clarification.

The XPT requirements and those from regulatory agencies can differ. For instance, let's examine the distinct requirements for dataset and variable labels:

| XPT | FDA | NMPA |
| --- | --- | --- |
| No restriction on characters; maximum length is 40 bytes. | Variable names, as well as variable and dataset labels, should include American Standard Code for Information Interchange (ASCII) text codes only. Maximum length in characters = 40. | For eSubmission in China, one of the requirements is to translate the foreign language data package (e.g., English) to Chinese. Variable labels, dataset labels, MedDRA, WHO Drug terms, primary endpoint-related code lists, etc., need to be translated from English to Chinese. |

Currently, in df_label.R, the function fails if the label does not meet the following requirements:

label_len <- nchar(label)

if (label_len > 40) {
  abort("Length of dataset label must be 40 characters or less.")
}

if (stringr::str_detect(label, "[^[:ascii:]]")) {
  abort("`label` cannot contain any non-ASCII, symbol, or special characters.")
}

The first check represents an XPT requirement, while the second one aligns with FDA specifications. I suggest moving agency-specific checks to xpt_validate so that they can be ignored if necessary.
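
A minimal sketch of how the two label checks could be separated, assuming a hypothetical helper `check_dataset_label()` with an `agency` argument (neither exists in xportr, and this is not how xpt_validate is currently written). The length rule always runs because it comes from the XPT format; the ASCII rule runs only when FDA is selected.

```r
# Hypothetical helper for illustration only; not an xportr function.
# Returns a character vector of findings (empty when the label passes).
check_dataset_label <- function(label, agency = c("none", "FDA")) {
  agency <- match.arg(agency)
  findings <- character(0)

  # XPT format rule, applied for every agency: at most 40 bytes.
  if (nchar(label, type = "bytes") > 40) {
    findings <- c(findings, "Length of dataset label must be 40 bytes or less.")
  }

  # FDA-specific rule: any byte above 127 means the label is not pure ASCII.
  if (agency == "FDA" && any(as.integer(charToRaw(label)) > 127L)) {
    findings <- c(findings, "FDA: label must contain ASCII characters only.")
  }

  findings
}

# A translated (non-ASCII) label passes the XPT rule but not the FDA rule.
check_dataset_label("\u6807\u7b7e", agency = "none")
check_dataset_label("\u6807\u7b7e", agency = "FDA")
```

One design point worth noting: `nchar(label)` counts characters by default, while the XPT limit in the table above is stated in bytes. The two agree for ASCII labels but differ for multibyte text such as Chinese labels, so the sketch uses `type = "bytes"`.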