UBC-MDS / bccovideda

Package to generate the summary statistics as well as plots for the COVID19 cases in British Columbia.
MIT License

Grouping of functions and test cases #33

Closed imtvwy closed 2 years ago

imtvwy commented 2 years ago

I'd suggest the following:

  1. Split our functions into 4 different .py files for easier version control
  2. Have 4 corresponding test files
  3. For the showStat and plot functions, add one more parameter of type dataframe so that they are independent of the getData function. This is motivated by test writing, where we may want to create a simple test dataframe to exercise these functions. Example of usage:
       df = getData(url, folder)
       showSummaryStat(df, startDate, endDate)
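A minimal sketch of point 3, assuming pandas. The function body, the per-health-authority count, and the column names `Reported_Date` and `HA` (borrowed from the example dataframe further down this thread) are illustrative assumptions, not the package's actual implementation:

```python
import pandas as pd


def showSummaryStat(df, startDate, endDate):
    """Hypothetical body: count cases per health authority within a date range."""
    dates = pd.to_datetime(df["Reported_Date"])
    # Series.between is inclusive on both ends by default.
    in_range = dates.between(pd.Timestamp(startDate), pd.Timestamp(endDate))
    return df.loc[in_range].groupby("HA").size()


# Any dataframe with the expected columns works, not only the output of getData.
df = pd.DataFrame(
    {
        "Reported_Date": ["2022-01-01", "2022-01-02"],
        "HA": ["Fraser", "Interior"],
    }
)
result = showSummaryStat(df, "2022-01-01", "2022-01-01")
```

Because the function only reads from `df`, it never needs to know where the data came from.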
johnwslee commented 2 years ago

I agree. Are we going to use the function name as the .py file name?

vtaskaev1 commented 2 years ago

Good idea. Splitting the functions and related code across different .py files will help avoid merge conflicts and keep the structure clear. I also agree that using the function name as the .py file name makes sense.

liannah commented 2 years ago

I am not very sure about adding the dataframe. What is this dataframe going to be in the original function? Is it going to be the older version of getData, given that the data itself is constantly being updated?

liannah commented 2 years ago

But I see, after splitting we need a way to get the dataframe, so I am guessing we can extract it from the data/ dataframe? Also, I think we need to add a reference to the dataframe itself, as it is visible in our repo.

liannah commented 2 years ago

Or maybe I am misunderstanding. Is the suggestion to split our bccovideda.py into four files, one per function (plotlinebyDate, getdata, showSummary, and histogram), and if so, which dataframe would these files use if not the one from getData?

imtvwy commented 2 years ago

Good, so we have agreed on splitting the functions into different .py and test files. To align with the PEP 8 style guide, I'd suggest renaming the functions as well.
Here are the suggested function names (source and test files should follow this convention as well):

imtvwy commented 2 years ago

> I am not very sure about adding the dataframe. What is this dataframe going to be in the original function? Is it going to be the older version of getData, given that the data itself is constantly being updated?

@liannah, the point of adding a dataframe parameter to the latter 3 functions is to make them independent of get_data(). That way, we can pass in any dataframe (not necessarily downloaded from the specific dataset, but it must follow the specific format, i.e. the same columns) and the functions can still do what they are supposed to do.

While designing our test cases, we can then build some dummy dataframes in helper functions and pass them to our functions for unit testing. Otherwise, I can't think of any meaningful test cases if we are bound to the real data. Does that make sense?

For example, we can use the following code for a test case:

    df <- data.frame(Reported_Date=c("2022-01-01", "2022-01-01"),
                     HA=c("Vancouver Coastal", "Vancouver Coastal"),
                     Sex=c("F", "M"),
                     Age_Group=c("50-59", "50-59"),
                     Classification_Report=c("Lab-diagnosed", "Lab-diagnosed"))
    show_summary_stat(df, "2022-01-01", "2022-01-01")
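The snippet above is written in R syntax. For the Python package, a rough pandas equivalent might look like the sketch below. The helper `make_dummy_df`, the stand-in body of `show_summary_stat`, and the by-sex count it returns are all hypothetical and only illustrate how a dummy dataframe could drive a unit test:

```python
import pandas as pd


def make_dummy_df():
    """Two made-up cases on the same day, mirroring the example dataframe."""
    return pd.DataFrame(
        {
            "Reported_Date": ["2022-01-01", "2022-01-01"],
            "HA": ["Vancouver Coastal", "Vancouver Coastal"],
            "Sex": ["F", "M"],
            "Age_Group": ["50-59", "50-59"],
            "Classification_Report": ["Lab-diagnosed", "Lab-diagnosed"],
        }
    )


def show_summary_stat(df, start_date, end_date):
    # Stand-in for the real function (assumption): count cases by sex in range.
    dates = pd.to_datetime(df["Reported_Date"])
    in_range = dates.between(pd.Timestamp(start_date), pd.Timestamp(end_date))
    return df.loc[in_range, "Sex"].value_counts().to_dict()


def test_show_summary_stat_with_dummy_data():
    # The expected counts are known exactly because the input is hand-built.
    result = show_summary_stat(make_dummy_df(), "2022-01-01", "2022-01-01")
    assert result == {"F": 1, "M": 1}
```

With a hand-built input like this, the expected result is fixed and does not change when the live dataset is updated.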

liannah commented 2 years ago

@imtvwy I understand that part; however, I cannot see its application within our package. Shouldn't our package show the plots specifically for one dataframe, and not for any dataframe with specific columns?