C10-Brazilian-e-commerce-modeling-team / brazilian-e-commerce


chore: Analytics. Question 5 #30

Closed GabyGO2108 closed 2 years ago

GabyGO2108 commented 2 years ago

Summary 💡

Answer the assigned question including data analysis.

Acceptance Criteria

GabyGO2108 commented 2 years ago

So, as seen in the previous question, we can speculate that the churn problem comes from the e-commerce's processes and not from the customers. Also, we don't have any customer information, so we had to find a way to bring the consumers into our analysis. For this, we looked again at the best-selling products to get an idea of who buys here. Let's see that graph once more.

image

So, based on what we see and leaning on some old-fashioned assumptions, we can guess that the number one buyers in this e-commerce are women between 24 and 54 years old, and that probably about half of this particular population is married. And well, that's as far as we can go in the speculation department with the given data. But what else can we derive from the data in general that could account for the massive churn?

Well, let's first take a look at the payment methods. Here we can see that most customers pay by credit card, which suggests that the population buying here is employed and probably has a steady income.

image

Looking more closely, we can notice that the e-commerce offers at least three well-known payment methods, but customers don't seem to perceive this as added flexibility or as an advantage, so having them doesn't seem to be adding any value to the company.
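To make the "most customers pay via credit card" observation easy to check, here is a minimal sketch of how the share of each payment method could be computed. It is only an illustration: the rows are made-up sample data, and the column name `payment_type` and the values `bank_slip` and `voucher` are assumptions, not taken from the dataset.

```python
from collections import Counter

# Made-up sample rows standing in for the payments table.
# "payment_type" and the non-credit-card values are assumptions.
payments = [
    {"order_id": "a1", "payment_type": "credit_card"},
    {"order_id": "a2", "payment_type": "credit_card"},
    {"order_id": "a3", "payment_type": "bank_slip"},
    {"order_id": "a4", "payment_type": "credit_card"},
    {"order_id": "a5", "payment_type": "voucher"},
]


def payment_shares(rows):
    """Return {payment_type: share of rows}, most frequent method first."""
    counts = Counter(row["payment_type"] for row in rows)
    total = sum(counts.values())
    return {method: count / total for method, count in counts.most_common()}


shares = payment_shares(payments)
for method, share in shares.items():
    print(f"{method}: {share:.0%}")
```

With the real table, the same function would show how dominant credit cards are compared with the other methods.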

Now let's try dissecting the delivery part so we can generate more insights.

image

image

Apparently, delivery is not a problem; in fact, most customers rate delivery well.

And what about the reviews? Could they tell us more about the churn?

So, first let's see which are the categories with the worst reviews.

image

Looking at the above image, we can see that the first and third categories in our top ten best sellers are also the worst reviewed. Another interesting thing is that, save for a couple of categories, most pass 2,000 reviews, so it's definitely not good. It is a red flag when more than a thousand people stop to give bad reviews about the products they bought. But how bad exactly are these reviews? Taking into consideration that 1 is the worst and 5 is the best, how are the products from the e-commerce scored?

image

Honestly, not so good. Most have the lowest possible score, but the number alone won't tell us precisely what's going on. For that insight we'll need to do web scraping to find out exactly what people are saying about the products, and hopefully find a way to improve.
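For reference, a score distribution like the one in the chart could be tallied roughly as follows. This is a sketch under assumptions: the scores below are invented for illustration and are not the real results, and treating scores of 1 and 2 as "low" is our own cutoff.

```python
from collections import Counter

# Invented review scores on the 1 (worst) to 5 (best) scale described above.
review_scores = [1, 1, 5, 3, 1, 2, 4, 1, 5, 2]


def score_distribution(scores):
    """Return the share of reviews for each score from 1 to 5."""
    if not scores:
        return {score: 0.0 for score in range(1, 6)}
    counts = Counter(scores)
    total = len(scores)
    return {score: counts.get(score, 0) / total for score in range(1, 6)}


dist = score_distribution(review_scores)
# Assumed cutoff: scores of 1 and 2 count as "low".
low_share = dist[1] + dist[2]
print(dist)
print(f"share of low scores: {low_share:.0%}")
```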

larispardo commented 2 years ago

I understand, but my question would be: why do you need web scraping when we have the "review" dataset, which has a "review comment" column?

larispardo commented 2 years ago

Also, you mention there "is no information on users", but there is a "customer" csv, so I am not convinced by your answer.

GabyGO2108 commented 2 years ago

With web scraping we can extract comments (words) that help us pinpoint more specifically the issues the company is having. For example, customers may be complaining that products arrived damaged, or that the overall quality was bad, so it would be helpful to have those insights.
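As a rough illustration of this kind of word-level insight, the sketch below counts the most frequent words in low-scored comments. The same idea would apply to scraped text or to a comment column in a reviews table. Everything here is an assumption for the example: the field names `review_score` and `review_comment`, the sample comments, the tiny English stopword list, and the cutoff of 2 for a "bad" score.

```python
import re
from collections import Counter

# Made-up sample reviews; field names are assumptions.
reviews = [
    {"review_score": 1, "review_comment": "Product arrived damaged"},
    {"review_score": 2, "review_comment": "Bad quality, arrived damaged"},
    {"review_score": 5, "review_comment": "Great product, fast delivery"},
    {"review_score": 1, "review_comment": None},  # score without a comment
    {"review_score": 1, "review_comment": "The quality was bad"},
]

# Tiny illustrative stopword list; a real analysis would need a fuller one
# in the language of the comments.
STOPWORDS = {"the", "was", "a", "and", "of"}


def top_complaint_words(rows, max_score=2, n=3):
    """Count words in comments whose score is at most max_score."""
    words = Counter()
    for row in rows:
        comment = row["review_comment"]
        if row["review_score"] > max_score or not comment:
            continue
        # Letters only (including accented ones); digits and punctuation are dropped.
        tokens = re.findall(r"[^\W\d_]+", comment.lower())
        words.update(t for t in tokens if t not in STOPWORDS)
    return words.most_common(n)


print(top_complaint_words(reviews, n=5))
```

Words such as "damaged" or "quality" rising to the top would point to the kind of issues mentioned above.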

As for the "customer" csv, if you check it, there is no demographic information about our users. The only columns it has are: customer_id, customer_unique_id, customer_zip_code_prefix, customer_city and customer_state. Even though it's good information, it is not enough. We don't know how many female or male customers the e-commerce has, nor their ages. We also don't have the slightest clue about their income, marital status, education, or employment, all of which are important to address when talking about marketing campaigns and understanding the best-selling product data. This information would also be extremely helpful for answering the "What else the data tells you about these users?" question in a less speculative way.