aim-rsf / browseMetadata

An R package to help a researcher browse metadata for health datasets and categorise variables based on research domains
https://aim-rsf.github.io/browseMetadata/
GNU General Public License v3.0
3 stars 1 forks source link

add_lookup_file #77

Closed RayStick closed 4 months ago

RayStick commented 4 months ago

Closes #66 Closes #68

New data has been added to the package which is a lookup table for auto-categorizations

These auto categorisations were previously described within the domain_mapping function itself, so this chunk of code has now been deleted.

The README file has been updated to reflect that there is now a 3rd input to the domain_mapping function


@Rainiefantasy in your review could you please:

Thanks! =D

Rainiefantasy commented 4 months ago

This looks great! will do a demo run monday! 🤞

Rainiefantasy commented 4 months ago

Notes:

  1. demo run worked great!
  2. was able to load my own csv look up file, and that worked too :). Only thing to note is, when there's duplicate rows in the look up table, it seems to stop functioning - intention here was to try categorise for more than one domain and wasn't sure how to do that. But maybe that's fine if it'll mostly be 1:1 mappings?
RayStick commented 4 months ago
  1. demo run worked great!

Yay!

was able to load my own csv look up file, and that worked too :).

Yay!

Only thing to note is, when there's duplicate rows in the look up table, it seems to stop functioning - intention here was to try categorise for more than one domain and wasn't sure how to do that. But maybe that's fine if it'll mostly be 1:1 mappings?

Ah, that's a good point. Just so I understand - you mean if you duplicate the DataElement row and give it a different mapping, it will only read the first row for this Data Element? Most of the Data Elements included in the lookup will be 1:1 mappings as you say (unlike the ones the user may categorise) but it could be good to allow for multiple mappings. Do you think we should try to address that in this PR or write something in the README for now like "The lookup table assumes 1:1 mappings" and make a new GH Issue for it to address later (as an enhancement).

Rainiefantasy commented 4 months ago

Ah, that's a good point. Just so I understand - you mean if you duplicate the DataElement row and give it a different mapping, it will only read the first row for this Data Element? Most of the Data Elements included in the lookup will be 1:1 mappings as you say (unlike the ones the user may categorise) but it could be good to allow for multiple mappings. Do you think we should try to address that in this PR or write something in the README for now like "The lookup table assumes 1:1 mappings" and make a new GH Issue for it to address later (as an enhancement).

Yes correct, duplicating the row so that I have for example:

AVAIL_FROM_DT,Health,8
AVAIL_FROM_DT, Unsure,0

Instead of auto classifying the variable for both domains (8,0) it asks you to classify it manually, and at the end still asks you to check the autocategorisations but looks like a bug:

! Please check the auto categorised data elements are accurate:
[1] DataClass   DataElement Domain_code
<0 rows> (or 0-length row.names)

If it'll be a one to one mapping anyway I think it's fine for now; but yep as you say could be an enhancement for later if you think it'd help?

RayStick commented 4 months ago

@Rainiefantasy that makes to me, we'll leave it for later then, as the point of the lookup is mostly 1:1 mappings. If you are happy please approve this PR

Rainiefantasy commented 4 months ago

Source code looks great to me, happy with merge :)