Closed RayStick closed 3 weeks ago
Package version number has now been included in the csv output logs. This could be important to track which version number created the csv outputs, particularly if the compare_csv_outputs command is being used to compare outputs created from different package versions
Is this the browseMetadata column - I can see v 1.0.1 so that's great πΈ !
When doing this, I realised it made sense to split the csv output log into two files not one - one file is about the session itself (LOG) and the other csv file contains the actually categorisations made by the user (OUTPUT). Previously there was a lot of repetition in the OUTPUT csv file. They can be linked via timestamp column.
That's fair enough - the timestamp is in the filename as well so makes sense can use it as an identifier π Can see both files, looks good!
Because there are now two csv outputs, the input arguments for compare_csv_outputs had to be adjusted See the README which reflects these changes to the code
When trying to run the compare function:
compare_csv_outputs('LOG_NationalCommunityChildHealthDatabase(NCCHD)_BLOOD_TEST_2024-07-05_16-07-38.599493.csv','OUTPUT_NationalCommunityChildHealthDatabase(NCCHD)_BLOOD_TEST_2024-07-05_16-07-38.599493.csv','LOG_NationalCommunityChildHealthDatabase(NCCHD)_BLOOD_TEST_2024-07-08_12-03-30.429336.csv','OUTPUT_NationalCommunityChildHealthDatabase(NCCHD)_BLOOD_TEST_2024-07-08_12-03-30.429336.csv','Documents/GIT/browse-SAIL/data-raw/national_community_child_health_database_(ncchd)_20240405T130125.json')
I get a prompt for a mismatch and to 'βΉ Provide concensus decision for this DataElement:' but when I enter a number (one of the two domain classifications) it errors with:
Error in user_categorisation(selectTable_df$Label[datavar], selectTable_df$Description[datavar], : argument "domain_code_max" is missing, with no default
Not sure if I'm doing something wrong here?
Other than that, I'm not sure why you need all four files as opposed to just having the two classification files as you are comparing the output - was there a reason for this?
See line 143 of domain_mapping which allows the user to add an optional free text note about this table. They always have the option to skip a table and not process it as well
This looks good to me! β¨
From working on compare_csv_outputs I don't see an issue with two output files having the same initials and therefore think this issue can be closed - let me know if you spot anything.
Sounds good! l agree- means the function can be used as a sanity check for a user as well :)
This commit https://github.com/aim-rsf/browseMetadata/commit/0dc8c36eb39748bfe943dc3a8320b1a36c3b7b45 made changes to the user_categorisation to validate and standardize user inputs. Smaller changes were made to domain_mapping as there is now a 4th argument for user_categorisation
Only looked briefly at code here but looks ok at glance π
DESCRIPTION file version number brought up to date Fixed a bug in compare_csv_outputs and validated that all the checks work I seem to still have issues with that function, can double check I've got the latest version
Another minor feedback - I may have suggested it'd be good to have a choice to amend classifications in case of a mistake, but the more I'm using it, I'm wondering if it's necessary after every classification. Personally I think having just one prompt at the end to check your classifications and if you're happy or want to amend, is enough. See what you think though this is just my experience in using it πΈ
Another minor feedback - I may have suggested it'd be good to have a choice to amend classifications in case of a mistake, but the more I'm using it, I'm wondering if it's necessary after every classification. Personally I think having just one prompt at the end to check your classifications and if you're happy or want to amend, is enough. See what you think though this is just my experience in using it πΈ
I wondered if it was necessary as well, but actually both Dan and Nida suggested this feature would be useful. I think when I am in 'test mode' I find the amount of prompts given to me as the user annoying XD but that's because I am doing it quickly. Going through it properly with a real use-case, I can see the extra functionality to correct things can be good, so I'll leave this functionality in for now.
@Rainiefantasy your errors with the compare_csv_outputs.R
function were due to me not updating the call to user_categorisation
function. See line 134 of https://github.com/aim-rsf/browseMetadata/pull/104/commits/9cad2f6a43126662ec250356478b5d25bbdfb2db where I have now added 'max(Code$Code)' as an input argument, and this solved the error you got.
domain_mapping
(lines 83-90)To answer your question 'I'm not sure why you need all four files as opposed to just having the two classification files as you are comparing the output - was there a reason for this?'
LOG_
csv files are read in in order to perform all the validation checks i.e. is it valid for these two csv files to be comparedOUTPUT_
csv files are compared Finally, I changed the function name to compare_sessions
What do you think? Can you run again with same inputs please and see if it no longer errors? Thanks!
I wondered if it was necessary as well, but actually both Dan and Nida suggested this feature would be useful. I think when I am in 'test mode' I find the amount of prompts given to me as the user annoying XD but that's because I am doing it quickly. Going through it properly with a real use-case, I can see the extra functionality to correct things can be good, so I'll leave this functionality in for now.
Just spotted this, that's no problem - it's mostly a preference thing so I guess it doesn't matter. π But I agree, it's probably because of the testing you notice it more!
Thanks for reviewing this long PR @Rainiefantasy - appreciate it!
β Sorry, this is a big PR ... not the best practice ...
Closes #102
compare_csv_outputs
command is being used to compare outputs created from different package versionscompare_csv_outputs
had to be adjustedCloses #91
domain_mapping
which allows the user to add an optional free text note about this table. They always have the option to skip a table and not process it as wellCloses #97
compare_csv_outputs
I don't see an issue with two output files having the same initials and therefore think this issue can be closed - let me know if you spot anything.Closes #82
user_categorisation
to validate and standardize user inputsdomain_mapping
as there is now a 4th argument foruser_categorisation
Other changes:
compare_csv_outputs
and validated that all the checks workChecklist before review: