aim-rsf / browseMetadata

An R package to help a researcher browse metadata for health datasets and categorise variables based on research domains
https://aim-rsf.github.io/browseMetadata/
GNU General Public License v3.0
3 stars 1 forks source link

Allowing for one to many mappings (variables to domains) via the lookup #78

Closed Rainiefantasy closed 2 months ago

Rainiefantasy commented 4 months ago

Currently the domain_mapping() function only allows 1 to 1 mapping, when reading what to auto categorise based on the lookup file. If attempting to map a variable to multiple domains using the lookup file, the function behaves buggy. For example, tested with following rows in look up file:

AVAIL_FROM_DT,Health,8
AVAIL_FROM_DT, Unsure,0

which resulted in function asking to manually classify the AVAIL_FROM_DT variable (and this is what shows up in the log). Both rows in the lookup file are essentially ignored, but function still assumes the variable as being auto-classified with the sanity check at the end:

! Please check the auto categorised data elements are accurate:
[1] DataClass   DataElement Domain_code
<0 rows> (or 0-length row.names)

Either the functionality can be removed all together and added in readme docs, or can have functionality to allow mapping of variable to multiple domains.

Quick fix for now

The README file mentions that the lookup file can only deal with 1:1 mappings (i.e. do not list the same DataElement on more than one row in the lookup file). This is fine for now - as the purpose of the lookup file is to categorise variables into pre-defined domains that do not overlap ([0] NO MATCH / UNSURE, [1] METADATA, [2] ALF ID, [3] OTHER ID, [4] DEMOGRAPHICS). If the user provided their own lookup file, this is where the bug could become relevant. One to many mappings are still possible if DataElements are not included in the lookup file

RayStick commented 2 months ago

I think the quick fix (explained above) is okay so I am closing this issue. Thanks for raising it @Rainiefantasy, as it led me to changing the README file to make it clear how it should be made. If you think not, please let me know and I will re-open :D