Adeeshaj / carvestor-analyzer

MIT License

Analyzer - Data Exploration #3

Closed Adeeshaj closed 11 months ago

Adeeshaj commented 1 year ago

Explore the dataset to understand its characteristics, such as data types, distributions, and potential outliers. Create summary statistics and visualizations to gain insights into the data.

Adeeshaj commented 1 year ago

Column metadata

2450 entries, 0 to 2456
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype
 0   id                    2450 non-null   int64
 1   listing_url           2450 non-null   object
 2   title                 2450 non-null   object
 3   location              2450 non-null   object
 4   price                 2450 non-null   float64
 5   price_currency        2450 non-null   object
 6   listing_date          2450 non-null   object
 7   description           2450 non-null   object
 8   Brand:                2450 non-null   object
 9   Model:                2450 non-null   object
 10  Mileage:              2450 non-null   object
 11  Body type:            2238 non-null   object
 12  Condition:            2450 non-null   object
 13  Fuel type:            2450 non-null   object
 14  Transmission:         2450 non-null   object
 15  Engine capacity:      2450 non-null   object
 16  Year of Manufacture:  2450 non-null   object
 17  Trim / Edition:       1970 non-null   object

Apart from Body type (2238 of 2450, about 91%) and Trim / Edition (1970 of 2450, about 80%), every field is 100% populated. All of the data is good to analyze.
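The per-column fill rate can be computed directly rather than read off the info output. The following is a minimal sketch, assuming the listings are already loaded into a pandas DataFrame; the helper name `completeness` and the tiny sample frame are made up for illustration and are not the real data.

```python
import pandas as pd


def completeness(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of non-null values per column, rounded to 1 decimal."""
    return (df.notna().mean() * 100).round(1)


# Tiny made-up frame, only to show the call; not the real listings data.
sample = pd.DataFrame(
    {
        "price": [1.0, 2.0, 3.0, 4.0],
        "Body type:": ["Hatchback", None, "SUV", "Saloon"],
    }
)
report = completeness(sample)
print(report)
```

Columns that come back below 100 are the ones to look at before modelling.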

Adeeshaj commented 1 year ago

Price Analysis

Since the main goal is price analysis, the plots here cover only the price column.

Histogram: To visualize the distribution of a single numerical variable, you can use a histogram. Image

Kernel Density Estimate (KDE) Plot: A KDE plot estimates the probability density function of a continuous variable. Image

Box Plot: To visualize the distribution of a numerical variable or compare distributions between different categories. Image

Violin Plot: Similar to a box plot, but also shows the probability density of the variable at different values. Image
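The four plots above can be sketched with matplotlib as follows. This is an illustration under assumptions, not necessarily how the charts in this thread were produced: the function names are hypothetical, the KDE is a plain Gaussian kernel with a Scott's-rule bandwidth written out in NumPy, and the prices are synthetic.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def gaussian_kde_curve(values: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Evaluate a Gaussian KDE on `grid`, using Scott's rule for the bandwidth."""
    n = values.size
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def plot_price_distribution(price: pd.Series):
    """Draw histogram, KDE, box plot and violin plot of one numeric column."""
    values = price.dropna().to_numpy(dtype=float)
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))

    axes[0, 0].hist(values, bins=30)
    axes[0, 0].set_title("Histogram")

    grid = np.linspace(values.min(), values.max(), 200)
    axes[0, 1].plot(grid, gaussian_kde_curve(values, grid))
    axes[0, 1].set_title("KDE")

    axes[1, 0].boxplot(values)
    axes[1, 0].set_title("Box plot")

    axes[1, 1].violinplot(values, showmedians=True)
    axes[1, 1].set_title("Violin plot")

    fig.tight_layout()
    return fig


# Synthetic, right-skewed prices purely for demonstration.
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=15, sigma=0.5, size=500), name="price")
fig = plot_price_distribution(prices)
fig.savefig("price_distribution.png")
```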

Adeeshaj commented 1 year ago

Removing Outliers

The box plot and the violin plot clearly show potential outliers.

Image Image

After removing the outliers, the charts look more meaningful.
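The thread does not say which rule was used to drop the outliers. One common choice is the 1.5 x IQR rule, which matches the whiskers of a standard box plot. The sketch below assumes that rule; `remove_iqr_outliers` and the sample prices are hypothetical.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


# Made-up prices with one obvious outlier (assumption: not the real data).
sample = pd.DataFrame({"price": [10.0, 12.0, 11.0, 13.0, 12.5, 500.0]})
trimmed = remove_iqr_outliers(sample, "price")
print(trimmed)
```

A larger `k` keeps more of the tail. For strongly right-skewed prices, applying the rule to the log of the price is another option worth comparing.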

Adeeshaj commented 1 year ago

Analysing categorical fields

Field                count  unique  top         freq
Brand                2450   47      Toyota      673
Model                2450   314     Alto        134
Mileage              2450   729     100,000 km  49
Body type            2238   7       Hatchback   786
Condition            2450   3       Used        2404
Fuel type            2450   6       Petrol      1688
Transmission         2450   3       Automatic   1519
Engine capacity      2450   184     1,500 cc    494
Year of Manufacture  2450   59      2015        235
Trim / Edition       1970   1253    Toyota      38
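Summaries of this kind come from calling `describe()` on an object column. Mileage and Engine capacity are stored as text such as "100,000 km" and "1,500 cc", so they are summarised as categories. A minimal sketch of both steps follows, assuming every value uses the same unit; `parse_number`, the new column names, and the sample rows are hypothetical.

```python
import pandas as pd


def parse_number(text: pd.Series) -> pd.Series:
    """Strip thousands separators and unit suffixes, e.g. '100,000 km' -> 100000.0."""
    digits = (
        text.astype(str)
        .str.replace(",", "", regex=False)
        .str.extract(r"(\d+(?:\.\d+)?)")[0]
    )
    return pd.to_numeric(digits, errors="coerce")


# Made-up rows that mimic the column formats seen in the thread.
sample = pd.DataFrame(
    {
        "Brand:": ["Toyota", "Suzuki", "Toyota"],
        "Mileage:": ["100,000 km", "45,500 km", "7,200 km"],
        "Engine capacity:": ["1,500 cc", "800 cc", "1,000 cc"],
    }
)

# count / unique / top / freq for a categorical column
summary = sample["Brand:"].describe()
print(summary)

# Numeric versions that can be plotted or correlated with price
sample["mileage_km"] = parse_number(sample["Mileage:"])
sample["engine_cc"] = parse_number(sample["Engine capacity:"])
print(sample[["mileage_km", "engine_cc"]])
```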

Adeeshaj commented 1 year ago

Analysing categorical fields - Grouping and Aggregation (mean - price)

Image Image Image Image Image
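The grouping behind charts like these is a `groupby` on one categorical column followed by the mean of price. A minimal sketch, with a hypothetical helper `mean_price_by` and made-up rows:

```python
import pandas as pd


def mean_price_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Mean price per category of `column`, highest first."""
    return df.groupby(column)["price"].mean().sort_values(ascending=False)


# Made-up rows, only to show the shape of the result.
sample = pd.DataFrame(
    {
        "Fuel type:": ["Petrol", "Petrol", "Diesel", "Hybrid"],
        "price": [100.0, 200.0, 400.0, 300.0],
    }
)
by_fuel = mean_price_by(sample, "Fuel type:")
print(by_fuel)
```

The resulting Series can be passed to a bar chart. With a skewed price column, the median may be a more robust aggregate than the mean.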

Adeeshaj commented 1 year ago

Analysing categorical fields - Pie Chart

Image Image Image Image Image Image Image
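A pie chart per categorical field can be built from `value_counts()`. The sketch below is one possible approach, not necessarily the one used for the charts above: it folds everything beyond the `top_n` largest categories into an "Other" slice, which helps with fields like Brand that have dozens of values. The function name and sample values are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_category_pie(series: pd.Series, top_n: int = 5):
    """Pie chart of the `top_n` most frequent values, with the rest as 'Other'."""
    counts = series.value_counts()
    shares = counts.head(top_n)
    rest = counts.iloc[top_n:].sum()
    if rest > 0:
        shares = pd.concat([shares, pd.Series({"Other": rest})])

    fig, ax = plt.subplots()
    ax.pie(shares.to_numpy(), labels=list(shares.index.astype(str)), autopct="%1.1f%%")
    ax.set_title(str(series.name))
    return fig, shares


# Made-up values, only to show the call.
sample = pd.Series(
    ["Automatic"] * 3 + ["Manual"] * 2 + ["Tiptronic"], name="Transmission:"
)
fig, shares = plot_category_pie(sample, top_n=2)
fig.savefig("transmission_pie.png")
```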