I agree. Are we going to use the name of each function as its .py file name?
Good idea. Splitting the functions and related code across different .py files will help avoid merge conflicts and keep the structure clear. I also agree that using the function name as the .py file name makes sense.
I am not very sure about adding dataFrame. What is this dataFrame going to be in the original function? Is it going to be an older version of the getData output, given that the data itself is being updated constantly?
But I see, after splitting we need a way to get the dataframe, so I am guessing we can extract it from the data/ dataframe? Also, I think we need to add a reference to the dataframe itself, as it is visible in our repo.
Or maybe I am misunderstanding. The suggestion is to split our bcccovideda.py into four functions accordingly (plotlinebyDate, getdata, showSummary, and histogram), but then which dataframe would these files use, if not the one from getData?
Good, so we have agreed on splitting the functions into different .py and test files. To align with the PEP 8 style guide, I'd suggest changing the function names as well.
Here are the suggested function names (the source and test files should follow this convention as well)
I am not very sure about adding dataFrame. What is this dataFrame going to be in the original function? Is it going to be an older version of the getData output, given that the data itself is being updated constantly?
@liannah, the suggestion to add a dataframe parameter to the latter three functions is meant to make them independent of get_data(). This way, we can pass in any dataframe (not necessarily one downloaded from the specific dataset, but it must follow the specific format, i.e. the same columns) and the functions will still be able to do what they are supposed to do.
When designing our test cases, we can then make up some dummy dataframes in helper functions and pass them to our functions for unit testing. Otherwise, I can't think of any meaningful test cases if we are bound to use the real data. Does that make sense?
For example, we can use the following code for a test case:
    df <- data.frame(Reported_Date=c("2022-01-01", "2022-01-01"), HA=c("Vancouver Coastal", "Vancouver Coastal"), Sex=c("F", "M"), Age_Group=c("50-59", "50-59"), Classification_Report=c("Lab-diagnosed", "Lab-diagnosed"))
    show_summary_stat(df, "2022-01-01", "2022-01-01")
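On the Python side, the same idea might look roughly like the sketch below. The body of show_summary_stat here is only a hypothetical stand-in (it counts rows in the date range), not our actual implementation, and make_dummy_df is an illustrative helper name. The point is just that the function takes the dataframe as a parameter, so a test can feed it a hand-made one.

```python
import pandas as pd


def show_summary_stat(df, start_date, end_date):
    """Hypothetical stand-in: summarise cases reported in [start_date, end_date].

    The real function may compute different statistics; what matters here
    is that it receives the dataframe instead of calling get_data() itself.
    """
    dates = pd.to_datetime(df["Reported_Date"])
    mask = (dates >= pd.Timestamp(start_date)) & (dates <= pd.Timestamp(end_date))
    selected = df[mask]
    return {
        "total_cases": len(selected),
        "cases_by_sex": selected["Sex"].value_counts().to_dict(),
    }


def make_dummy_df():
    """Test helper: a tiny dataframe with the same columns as the real data."""
    return pd.DataFrame(
        {
            "Reported_Date": ["2022-01-01", "2022-01-01"],
            "HA": ["Vancouver Coastal", "Vancouver Coastal"],
            "Sex": ["F", "M"],
            "Age_Group": ["50-59", "50-59"],
            "Classification_Report": ["Lab-diagnosed", "Lab-diagnosed"],
        }
    )


def test_show_summary_stat():
    # Both dummy rows fall on 2022-01-01, so both should be counted.
    summary = show_summary_stat(make_dummy_df(), "2022-01-01", "2022-01-01")
    assert summary["total_cases"] == 2
    assert summary["cases_by_sex"] == {"F": 1, "M": 1}
```

In normal use the caller would still do something like show_summary_stat(get_data(), start, end), so the package behaviour stays the same; only the tests swap in the dummy dataframe.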
@imtvwy I understand that part; however, I cannot see its application within our package. Shouldn't our package show the plot specifically for one dataframe, and not for any dataframe with specific columns?
I'd suggest the following: