aim-rsf / browseMetadata

An R package to help a researcher browse metadata for health datasets and categorise variables based on research domains
https://aim-rsf.github.io/browseMetadata/
GNU General Public License v3.0
3 stars 1 forks source link

Improve manual checking of categorisations #67

Closed RayStick closed 5 months ago

RayStick commented 6 months ago

Summary of changes

This PR improves the way the code deals with auto categorisations of variables, and how the user reviews all categorisations (manual or auto)

Closes #70

Summary of changes (user interaction)

Summary of code refactoring

Summary of documentation


@Rainiefantasy could you focus on what the code is doing, and do some user testing to check it does not break?

@BatoolMM could you focus on more of the 'R package stuff' and documentation to ensure the checks are still passing etc.

Rainiefantasy commented 5 months ago

Some initial feedback: It looks really great - I've followed the demo and it's worked so that's a yay :) Some things I noticed when going through the demo (sort of grouped the feedback in bold, enhancements are extras)

- 1. [**potential bug?**] I didn't see the prompt mentioned _'If you make a mistake, the next prompt allows you to redo. Or press enter if you are happy to continue.'_ Do you see it or did you remove this functionality and perhaps the readme needs updating?
- 2. [potential bug?] When you finish processing the variables, again there's an expected prompt to go back and redo the categorisations: _'Would you like to review your categorisations? (Y/N)']_ which I didn't see
+ 3. [structure] It may be useful to have the context of the domains, output, etc towards the beginning of the tutorial, OR, a footnote to indicate this information can be found at the bottom of the readme. From someone who doesn't know much about the repo/SAIL/domain data, it made more sense to have a bit of context before trying to classify the features, but maybe that's just me! :) 
- 4. [enhancement] When looking at a new data class/table it'd be useful to indicate which table you're looking at now, and perhaps it'd help you decide how many variables to process for the table?
- 5. [enhancement] It may help to have a pop up to indicate when a variable is being auto classified and is being skipped?
+ 6. printing the variables for user to see before choosing the range of variables to categorise 
+ 7. [enhancement] to prevent user from typing in whatever, there could be errors raised if user isn't for example, entering a comma separated integer type input (I tried all sorts and it doesn't seem to throw an error), or when intending to write 'redo' but misspells, it's considered as a skip/enter.  but this is a bit extra perhaps especially since the output is for the users own use and isn't going to break anything downstream

Hope that helps for now! :) will tag if I think of anything else, and let me know if there's anything more specific you had in mind in terms of feedback

Edit (5/03/2024 11:30): I've changed the above according to the new changes I pulled using your guide yesterday!

RayStick commented 5 months ago

@Rainiefantasy thanks very much for this user testing. I just realized from your feedback that I think you ran the code in the main branch but not the latest updates from the improve-auto branch? Can I ask you to double check that you are running the code in improve-auto and then see which of your points above remain relevant? Thanks! I will send you some direct tips on getting that running

Rainiefantasy commented 5 months ago

Thanks Rachael - I've made changes to the above comment now after pulling the latest version :) wasn't sure the best way to indicate which is relevant now so I've just git diff and marked green as relevant, red as irrelevant :). I.e. only points 3,6,7 are relevant

RayStick commented 5 months ago

Super helpful, thanks @Rainiefantasy

  1. [structure] It may be useful to have the context of the domains, output, etc towards the beginning of the tutorial, OR, a footnote to indicate this information can be found at the bottom of the readme. From someone who doesn't know much about the repo/SAIL/domain data, it made more sense to have a bit of context before trying to classify the features, but maybe that's just me! :)

I see what you mean. Can you write this in a GitHub Issue and mark it with 'enhancement' and 'question' labels. See if it relates to another existing issue, and add it on, but I think it might not. Then we can think about how best to approach, taking in account the type of users that are likely to use this package.

  1. printing the variables for user to see before choosing the range of variables to categorise

I like that feature. Can you write this in a GitHub Issue and mark it with 'enhancement'? See if it relates to another existing issue, and add it on, but I think it might not.

  1. [enhancement] to prevent user from typing in whatever, there could be errors raised if user isn't for example, entering a comma separated integer type input (I tried all sorts and it doesn't seem to throw an error), or when intending to write 'redo' but misspells, it's considered as a skip/enter. but this is a bit extra perhaps especially since the output is for the users own use and isn't going to break anything downstream

I'd really like to sort this but wasn't sure how to code it yet. Is this captured in this existing issue: https://github.com/aim-rsf/browseMetadata/issues/71? If so, please comment on the issue/edit the issue text with any other thoughts. Would welcome help on solving it!

If you are happy with those responses, please approve the PR :) Happy to sort out some of them beforehand if you think we should do that before merging these changes into main.

RayStick commented 5 months ago

@BatoolMM if you have capacity, could you review this PR but focus on more of the 'R package stuff' and documentation, ensuring that all the checks are still passing etc.

@Rainiefantasy reviewed the PR, focusing on what the code was doing, and did some user testing to check it did not break etc. I am open to more user testing, but I think that can also be after this PR is merged if people agree. And we have captured improvements that are needed in GitHub issues to work on, see this milestone: https://github.com/aim-rsf/browseMetadata/milestone/2

BatoolMM commented 5 months ago

This is fantastic; the package has become more user-friendly and intuitive. The ability to edit the last data element and review the at the end is wonderful.

I did some checks on the documentation, functionality, and dependencies, and everything appears to be working well. However, I encountered a minor warning concerning the newly introduced "Note" column. The method used to define this variable seems somewhat different from the others, resulting in this message:

domain_mapping: no visible binding for global variable ‘Note’
  Undefined global functions or variables:
    Note

I am not sure if it hasn't been explicitly defined in the global environment or within the function's scope - I need to spend more time on the code.

Also, it seems the "Note" column doesn’t come up in the final review table, although it does appear in the CSV file. Or this could have been only me.

When I added at the top:

utils::globalVariables(c("Note"))

the warning stopped - but I believe no immediate changes are necessary, and the update is ready to be merged if you are comfortable with that.

RayStick commented 5 months ago

Thanks for these comments @BatoolMM. They are super useful. I will check about the 'Note' variable before I merge, in case I did something silly and forgot to include it somewhere I was meant to.

RayStick commented 5 months ago

@BatoolMM I am going to add this to a new Issue to be investigated later. There doesn't seem to be anything obvious to me, but I do see this same package warning as well. Everything seems to be functioning as intended, but I would like to understand the warning better.

Also, it seems the "Note" column doesn’t come up in the final review table, although it does appear in the CSV file. Or this could have been only me.

This part is intentional - I was trying to not print out too much text to the console. We could show the note (during the variable review) if that was deemed helpful though.

Based on this, I will merge as long as long as @Rainiefantasy is happy with my above responses!

Rainiefantasy commented 5 months ago

Happy for you to merge and I can create new issues for the suggestions above! 😸

BatoolMM commented 5 months ago

@RayStick Please do feel free to merge 🚀 🚀 🚀

Rainiefantasy commented 5 months ago

FYI I’ve created issues for comments above:

I assigned us both for all three for now as you mentioned you'd like help, but we can decide how to distribute later :)

RayStick commented 5 months ago

@BatoolMM and @Rainiefantasy - thank you both!

RayStick commented 5 months ago

@all-contributors add @Rainiefantasy for pull request review and ideas

allcontributors[bot] commented 5 months ago

@RayStick

I've put up a pull request to add @Rainiefantasy! :tada: