INFO523-S24 / project-01-Stats-N-Facts

https://info523-s24.github.io/project-01-Stats-N-Facts/

Proposal peer review #2

Closed zandi-omid closed 8 months ago

zandi-omid commented 8 months ago

The following is the peer review of the project proposal by [name of team completing peer review]. The team members who participated in this review are

The goal is to investigate the evolution of different vaccines for different diseases and learn how to manipulate large datasets.

The dataset consists of 1988 records and 28 features, providing a comprehensive overview of various pharmaceutical products and medicines. It encompasses diverse information, including the medicine’s category, name, therapeutic area, common name, active substance, and unique product number.

The methods include slicing and filtering variables, arranging them in a specific order, and counting the rows.

They would be better off showing a glimpse of the dataset, which would give an overview of how the data behaves. As it stands, we cannot tell which categories appear in the categorical columns or what range of values the numerical variables take.

First of all, we should see the data frame to get a feel for the nature of the dataset. Also, it seems the whole dataset is categorical, including the variables that have been described as numerical. We suggest computing some summary statistics, such as ratios or percentages.
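The kind of ratio or percentage summary suggested here could look something like the following minimal sketch. It uses only the Python standard library. The records, their values, and the `authorised` field are invented for illustration and are not taken from the actual dataset; `percentage_table` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical records standing in for a few rows of the dataset.
# All values here are invented for illustration only.
records = [
    {"category": "human", "authorised": True},
    {"category": "human", "authorised": True},
    {"category": "veterinary", "authorised": False},
    {"category": "human", "authorised": False},
]


def percentage_table(rows, column):
    """Return {value: percentage of rows} for one categorical column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}


category_pct = percentage_table(records, "category")
authorised_pct = percentage_table(records, "authorised")

# Print the share of each category as a percentage of all rows.
for value, pct in category_pct.items():
    print(f"{value}: {pct:.1f}%")
```

Because logical (True/False) columns are handled the same way as any other categorical column, one helper covers both cases.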

How long does it take for a vaccine to get approved on average? How are the scientists doing with improving the vaccines for different diseases over time?

They do not need to show the code. However, they should show the data frame along with the variable descriptions. In fact, we had to look up the variables in the dataset's GitHub repository ourselves.

It seems like the dataset contains only categorical variables. Even the logical variables are effectively categorical, which makes the analysis difficult.

hmfattah commented 8 months ago

Thank you for your helpful comments. We agree that providing a glimpse of the dataset (e.g., print(dataset.head())) would have been more helpful for reviewers. We will also include some summary statistics.
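As a rough sketch of what such a glimpse could include, assuming the data is loaded into a pandas DataFrame named dataset: the rows below are invented stand-ins, and `medicine_name` is a hypothetical column name, so the real output would differ.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; all values are invented.
dataset = pd.DataFrame(
    {
        "category": ["human", "human", "veterinary"],
        "medicine_name": ["A", "B", "C"],
        "revision_number": [3, 12, 7],
    }
)

glimpse = dataset.head(2)             # first two rows
dtypes = dataset.dtypes               # storage type of each column
numeric_summary = dataset.describe()  # count, mean, std, quartiles of numeric columns

print(glimpse)
print(dtypes)
print(numeric_summary)
```

Showing the column types next to the first rows makes it clear to a reader which variables are numerical and which are categorical.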

Although most features are categorical, there are numerical variables as well, such as 'revision_number'. In fact, we will use that feature to answer our first question.