WhyAxis / youth-happiness-analysis


Citing Existing Kernels. #1

Closed. rads284 closed this issue 6 years ago.

rads284 commented 6 years ago

Each of us should analyse one of the following kernels:
@rads284 https://www.kaggle.com/jkokatjuhha/we-are-from-our-childhood
@shreya5998 https://www.kaggle.com/ankur310794/network-analysis-of-hobbies-interests
@Yaminiag https://www.kaggle.com/miroslavsabo/analyzing-gender-differences
Please also choose one other kernel each.

rads284 commented 6 years ago

https://www.kaggle.com/jkokatjuhha/we-are-from-our-childhood
For the kernel I studied, I address the following questions:
(a) assumptions made, if any
(b) approach used: a summary
(c) summary of the results reported
(d) any limitations reported
(e) any lacuna in their approach/evaluation that you inferred

The above kernel focuses on analysing and predicting data based on a person's origin (i.e. village or city). At the same time, the author examines how the distributions differ across genders. She analyses the data for missing values, outliers and wrong/malicious inputs really well. She also handles the uneven distribution (i.e. fewer entries for village) by upsampling the data accordingly. She cites a very interesting missing-value case: a few females and males who omitted their weight also omitted 18 other fields related to height, weight and age identification. She uses correlations to prove or disprove her hypotheses, and logistic regression to gain further insight from the analysis.
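A minimal sketch of the upsampling idea, assuming random oversampling with replacement (the kernel may resample differently) and toy rows stored as plain dicts; the `origin` and `weight` keys are hypothetical stand-ins for the real survey columns.

```python
import random

def upsample_minority(rows, label_key, seed=0):
    """Randomly oversample every smaller class (with replacement) until all
    classes have as many rows as the largest one."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw extra rows with replacement to close the gap to the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# Hypothetical toy data: 4 "city" rows and 1 "village" row.
rows = [{"origin": "city", "weight": w} for w in (60, 72, 55, 80)]
rows.append({"origin": "village", "weight": 68})
balanced = upsample_minority(rows, "origin")

counts = {}
for r in balanced:
    counts[r["origin"]] = counts.get(r["origin"], 0) + 1
print(counts)  # {'city': 4, 'village': 4}
```

Oversampling only duplicates existing village rows, so it balances the class counts without adding new information.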

rads284 commented 6 years ago

@rads284 https://www.kaggle.com/kalvii03/alcoholics-more-worried-about-health-not-really
@Yaminiag https://www.kaggle.com/mikesch/who-are-the-money-savers
@shreya5998 https://www.kaggle.com/jlemains/young-people-clustering

Yaminiag commented 6 years ago

https://www.kaggle.com/miroslavsabo/analyzing-gender-differences

The kernel focuses on analysing movie and music preferences, interests and phobias by gender. The author analyses the differences using the mean as the summary statistic. Here, the mean is an appropriate measure of central tendency because the responses lie in a fixed range (the survey answers are on a 1 to 5 scale). He visualises the four groups in a scatter plot, using the average response of 3 as the point of reference. He concludes that women prefer the Latino and Musical genres and are more interested in art and celebrities than men are. He also reports that there is no phobia that men fear more than women do.
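The mean-difference comparison can be sketched as follows. This assumes each response is a dict with a `Gender` field and 1 to 5 item scores, skips missing answers (`None`), and uses made-up rows; the column names are illustrative, not taken from the kernel.

```python
from statistics import mean

def mean_gap_by_gender(responses, items):
    """Return {item: mean(female) - mean(male)} for each survey item.

    Positive values mean women rated the item higher on average."""
    gaps = {}
    for item in items:
        female = [r[item] for r in responses if r["Gender"] == "female" and r[item] is not None]
        male = [r[item] for r in responses if r["Gender"] == "male" and r[item] is not None]
        gaps[item] = mean(female) - mean(male)
    return gaps

# Hypothetical toy responses; column names are illustrative only.
responses = [
    {"Gender": "female", "Latino": 4, "Art": 5, "Spiders": 4},
    {"Gender": "female", "Latino": 3, "Art": 4, "Spiders": 5},
    {"Gender": "male", "Latino": 2, "Art": 2, "Spiders": 2},
    {"Gender": "male", "Latino": 3, "Art": None, "Spiders": 1},
]
gaps = mean_gap_by_gender(responses, ["Latino", "Art", "Spiders"])
for item, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {gap:+.2f}")
```

A phobia item where men fear more than women would show up here as a negative gap.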

shreya5998 commented 6 years ago

https://www.kaggle.com/ankur310794/network-analysis-of-hobbies-interests
Network Analysis of Young People Hobbies & Interests
This kernel analyses the correlations between young people's hobbies and interests in the form of networks (graphs), using various network analysis techniques. The author considers 32 factors covering the respondents' hobbies and interests and analyses their degree of correlation using both unregularized and regularized (LASSO) methods. He notes a few observations, e.g. people interested in politics also tend to be interested in law and history. The author analyses these factors using polychoric correlation (which estimates the correlation between two ordinal variables by assuming that each reflects an underlying, normally distributed continuous variable). He represents his observations as graphs in which the nodes are the attributes and the thickness of each edge represents the strength of the correlation between the two attributes it connects. He performs centrality analysis to determine which attributes are more important (more central) than the others, and makes the important observation that the “Physics” attribute is well connected to the majority of the other attributes. He uses clustering (the spinglass community-detection algorithm) to group similar attributes into communities. The observations throw light on the extent to which various hobbies and interests affect or influence each other and how similar some of the features are.
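A simplified sketch of the network idea in plain Python. It swaps in ordinary Pearson correlation and a fixed |r| threshold for the kernel's polychoric correlations and LASSO step, and uses node strength (sum of incident edge weights) as the centrality measure; the attribute names and scores are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation (a stand-in for a polychoric estimate)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(data, threshold=0.6):
    """Undirected weighted graph: one node per attribute, one edge per pair
    whose |r| reaches the threshold, weighted by |r|."""
    edges = {}
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            edges[(a, b)] = abs(r)
    return edges

def strength_centrality(nodes, edges):
    """Weighted degree (node strength): sum of the weights of incident edges."""
    strength = {n: 0.0 for n in nodes}
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return strength

# Hypothetical 1-5 interest scores from five respondents.
data = {
    "Physics": [1, 2, 3, 4, 5],
    "Mathematics": [1, 2, 3, 5, 4],
    "Chemistry": [2, 1, 3, 4, 5],
    "Dance": [3, 5, 1, 4, 2],
}
edges = correlation_network(data)
strength = strength_centrality(data, edges)
for name, s in sorted(strength.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {s:.2f}")
```

A thresholded graph like this is the simplest way to get the "thick edges between related interests" picture; the LASSO step in the kernel serves a similar purpose by shrinking weak partial correlations to zero.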

shreya5998 commented 6 years ago

https://www.kaggle.com/boltmaud/musics-depending-on-demographic-data This kernel tries to analyse and predict music preferences from demographic data. The author considers the 17 columns of the dataset that hold music-related attributes and computes various summary statistics for them. She notes that some columns have a few missing values, which is visible from the differences in their counts. She handles the missing values by replacing them with 0. She uses one-hot encoding to transform the categorical data into binary values. The author uses the Apriori algorithm (association rule mining) to predict the subjects' music choices from their demographic data.
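To make the Apriori step concrete, here is a small, hypothetical sketch: each row is turned into a set of `column=value` items (a one-hot view of the categorical data), frequent itemsets are found level by level, and a rule's confidence is computed from their supports. The rows, values and the 0.5 support threshold are made up, and this is not the kernel's implementation.

```python
from itertools import combinations

def one_hot_items(row):
    """Represent one survey row as a set of 'column=value' items (a one-hot view)."""
    return {f"{col}={val}" for col, val in row.items()}

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in current)
        k += 1
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A | C) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical rows: one demographic answer plus two music preferences.
rows = [
    {"Gender": "female", "Pop": "likes", "Metal": "dislikes"},
    {"Gender": "female", "Pop": "likes", "Metal": "dislikes"},
    {"Gender": "female", "Pop": "likes", "Metal": "likes"},
    {"Gender": "male", "Pop": "dislikes", "Metal": "likes"},
]
frequent = apriori([one_hot_items(r) for r in rows], min_support=0.5)
print(confidence(frequent, {"Gender=female"}, {"Pop=likes"}))  # 1.0
```

Rules whose antecedent is demographic and whose consequent is a music item are the ones that act as "predictions" of music choice.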

rads284 commented 6 years ago

https://www.kaggle.com/kalvii03/alcoholics-more-worried-about-health-not-really This kernel tries to analyse the differences between alcoholics and non-alcoholics, and which of these groups is more concerned about their health. The author uses a very basic approach of drawing bar plots for the various factors at hand. The author concludes that, regardless of drinking habits, the average respondent is moderately concerned about their health. The one thing the author missed is sampling bias: the sample had a very high proportion of drinkers and very few non-drinkers. Despite this imbalance, the author drew conclusions without upsampling or downsampling.
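One way to address the imbalance is to downsample the larger group before comparing averages. The sketch below assumes hypothetical 1 to 5 health-concern scores per group and random sampling without replacement; it is an illustration, not a reconstruction of the kernel.

```python
import random
from statistics import mean

def downsample_to_smallest(groups, seed=0):
    """Randomly drop rows (without replacement) from larger groups so every
    group has as many rows as the smallest one."""
    rng = random.Random(seed)
    size = min(len(rows) for rows in groups.values())
    return {name: rng.sample(rows, size) for name, rows in groups.items()}

# Hypothetical health-concern scores (1-5) for a sample dominated by drinkers.
groups = {
    "drinkers": [3, 4, 3, 2, 3, 4, 3, 3, 2, 4],
    "non-drinkers": [4, 3, 3],
}
balanced = downsample_to_smallest(groups)
for name, scores in balanced.items():
    print(name, len(scores), round(mean(scores), 2))
```

Downsampling discards data, and with very few non-drinkers the resulting means stay noisy; repeating the draw with several seeds shows how much they vary.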

rads284 commented 6 years ago

https://www.kaggle.com/jlemains/young-people-clustering The author drops every row that contains an NA value, regardless of whether that column is needed. This kernel simply uses the k-means algorithm to cluster on different variables and shows the difference in results between normalized and non-normalized data. We notice that the error for the normalized data is much smaller, hence it is the better approach. No clear or useful conclusions can be derived from this kernel.
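A plain-Python sketch of the comparison, assuming min-max scaling as the normalization, k = 2, and two hypothetical features on very different scales (age in years and height in cm); the kernel's variables, k and normalization may differ.

```python
import random

def min_max_scale(points):
    """Rescale every feature (column) to the range [0, 1]."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]  # guard against constant columns
    return [[(v - lo) / s for v, lo, s in zip(p, lows, spans)] for p in points]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with random initial centres; returns (labels, SSE)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous centre if a cluster becomes empty.
            new_centers.append([sum(c) / len(members) for c in zip(*members)] if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse

# Hypothetical (age in years, height in cm) rows.
points = [[15, 160], [16, 185], [17, 158], [28, 182], [29, 161], [30, 184]]
labels_raw, sse_raw = kmeans(points, k=2)
labels_scaled, sse_scaled = kmeans(min_max_scale(points), k=2)
print(round(sse_raw, 1), round(sse_scaled, 3))
```

Because min-max scaling shrinks every feature to [0, 1], the normalized SSE is smaller almost by construction and the two error values are not in the same units; comparing the resulting cluster assignments is a useful complement.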

shreya5998 commented 6 years ago

https://www.kaggle.com/gieun34/daily-accumulation-of-ourselves
Daily accumulation of ourselves
- The author first checks the dataset for columns with missing values and marks those entries as NA.
- He filters outliers from a few columns.
- Although the survey was conducted at university level, the presence of many secondary-school students suggests that some respondents may not be from the university.
- The author is interested in analysing the finances and healthy-eating habits of the subjects.
- He plots various behaviours and traits, using many graphs to better understand and analyse the data.
- He analyses various traits and behaviours with respect to healthy eating and saving.
- He observes that having a healthy lifestyle is highly correlated with having self-control.
- He performs t-tests and logistic regression in order to arrive at better-supported conclusions.
Following are the important observations:
-> Regretting past decisions is prevalent among young people, irrespective of any other feature.
-> The same holds true for self-criticism.
-> Prioritising workload is more common in groups that save more and live more healthily.
-> Money-saving is influenced by self-regulation, which in turn is influenced by a healthy lifestyle.
Therefore, we can conclude that a healthy lifestyle does affect money-saving, which certainly has a positive influence on an individual's overall well-being.
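The kernel's t-tests could look roughly like the sketch below, which assumes Welch's unequal-variance t-test (the kernel may use a different variant) on hypothetical healthy-eating scores for savers and non-savers. It computes only the statistic and degrees of freedom; a p-value additionally needs the t distribution, e.g. via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
    independent samples whose variances may differ."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical healthy-eating scores (1-5) for savers and non-savers.
savers = [4, 5, 4, 3, 5, 4]
non_savers = [3, 2, 3, 4, 2, 3]
t, df = welch_t(savers, non_savers)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Welch's version is a reasonable default for survey subgroups because it does not assume the two groups have equal variance.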

Yaminiag commented 6 years ago

https://www.kaggle.com/mikesch/who-are-the-money-savers The above kernel focuses on the characteristics that distinguish young people who save money from those who don't. The author drops rows with missing values in the string-typed columns and in Finances, and replaces the remaining missing values with 0. He plots bar graphs for the interest columns and the string columns. He then analyses spending habits by plotting the mean values for various categories, and observes that there is over-confidence about financial discipline. To avoid multi-level prediction, he splits Finances into two groups. The author uses a logistic regression model to predict Finances and achieves an accuracy of 0.64 on the test data. He then plots the ten most important positive and ten most important negative features. He concludes that living in a rural area has a positive impact on spending, while living in a city has a negative one. According to him, the plots suggest a connection between spending habits and intelligence.
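A self-contained sketch of the modelling step: binarise Finances, fit logistic regression by batch gradient descent, score held-out rows, and rank the coefficients. The >= 4 cut-off, the two features (`village`, `shopping`) and all rows are hypothetical; the kernel's features, train/test split and solver are not reproduced here.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)

# Hypothetical rows: (lives in village 0/1, shopping interest 1-5, Finances 1-5).
rows = [(1, 1, 5), (1, 2, 4), (1, 1, 4), (0, 5, 1), (0, 4, 2), (0, 5, 3)]
# Binarise Finances into two groups (the >= 4 cut-off is an assumption).
y = [int(f >= 4) for _, _, f in rows]
# Scale shopping from 1-5 to 0-1 so both features are on comparable scales.
X = [[v, (s - 1) / 4] for v, s, _ in rows]

w, b = train_logistic(X, y)
test_X, test_y = [[1, 0.1], [0, 0.9]], [1, 0]
accuracy = sum(predict(w, b, x) == t for x, t in zip(test_X, test_y)) / len(test_y)
# Sort features by coefficient: most negative first, most positive last.
ranked = sorted(zip(["village", "shopping"], w), key=lambda fw: fw[1])
print(ranked, accuracy)
```

With many one-hot and interest columns, the same ranking step yields the "ten most positive and ten most negative features" view; a library implementation would normally replace the hand-written solver.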