cdisc-org / cdisc-rules-engine

Open source offering of the cdisc rules engine
MIT License

Improved algorithm for "labels in title case" (SDTM-CG0359, SEND29) #696

Open JozefAerts opened 2 months ago

JozefAerts commented 2 months ago

Rules CG0359 and SEND29 both require that labels be in "title case". At the moment, the implementation of CG0359 uses a regular expression together with a list of words that may start with a lowercase character. This is pretty tricky, and probably leads to considerable overreporting. As Tatiana mentioned, there seem to be some Python packages that do better, including some that let one add a list of acronyms from an external file. I think it would be a good idea to look into these packages (I can't do it myself, as I am a Python novice) in order to improve the implementation of these rules and thus minimize overreporting.

Also see Jira https://jira.cdisc.org/browse/CORERULES-208
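
To make the idea concrete, here is a minimal sketch of a word-by-word check that could replace a single regular expression. It is not the engine's current implementation and does not use any particular package. The names `MINOR_WORDS`, `load_acronyms` and `is_title_case` are hypothetical, the word list is only an assumed example, and the external file format (one entry per line) is an assumption as well.

```python
import re

# Assumed example list of words that may stay lowercase inside a label.
# The list actually used by the rule may differ.
MINOR_WORDS = frozenset({
    "a", "an", "and", "as", "at", "but", "by", "for", "in",
    "nor", "of", "on", "or", "per", "the", "to", "with",
})


def load_acronyms(path):
    """Read exceptions from a text file: one entry per line,
    blank lines and lines starting with '#' are skipped (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        return {
            line.strip()
            for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        }


def is_title_case(label, acronyms=(), minor_words=MINOR_WORDS):
    """Return True if every word of the label is acceptable as title case."""
    allowed = set(acronyms)
    words = label.split()
    for index, word in enumerate(words):
        is_edge = index in (0, len(words) - 1)
        # Treat "Date/Time" or "Non-Study" as several words.
        for part in re.split(r"[-/]", word):
            core = part.strip("()[],.:;'\"")
            if not core or core in allowed:
                continue  # empty fragment, or a listed exception such as "pH"
            if not core[0].isalpha():
                continue  # numbers such as "24" carry no case
            if core.islower() and core in minor_words and not is_edge:
                continue  # lowercase minor word in the middle of the label
            if not core[0].isupper():
                return False
    return True


print(is_title_case("Date/Time of Collection"))
print(is_title_case("Date/time of collection"))
print(is_title_case("pH of Urine", acronyms={"pH"}))
```

Only the first character of each word is checked, so uppercase abbreviations pass without being listed. The external list is then needed only for entries that legitimately start with a lowercase character, which should keep it short.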