
Music Chart Downloader

Music Chart Downloader (mcd.py) by David M Walker

(c) 2024 Data Management & Warehousing

Extract chart data from chart websites and store in files

Currently supports OfficialCharts.com Singles & Albums charts

Overview

Process

Running the script

Download all UK Singles charts and create a single CSV file with all the data

./mcd.py

Download the 2023 UK Album charts and write one JSON file per week

./mcd.py --chart uk-albums --startdate 20230101 --enddate 20231231 --output_type json --output_set weekly

Help and options

usage: mcdc.py [-h] [--chart {uk-singles,uk-albums}] [--startdate STARTDATE] [--enddate ENDDATE] [--datadir DATADIR]
               [--output_type [{csv,json} ...]] [--output_set [{weekly,all} ...]]

The Music Chart Data Collector

optional arguments:
  -h, --help            show this help message and exit
  --chart {uk-singles,uk-albums}
                        Which music chart to download, (Default: uk-singles)
  --startdate STARTDATE
                        The first chart to download in YYYYMMDD format (Default: 19521114)
  --enddate ENDDATE     The last chart to download in YYYYMMDD format (Default: 20240509)
  --datadir DATADIR     Location of datafiles used in processing (Default: ./data)
  --output_type [{csv,json} ...]
                        Output file formats required (Default: ['csv'])
  --output_set [{weekly,all} ...]
                        Weekly charts and/or one large file (Default: ['all'])

(c)2024 Data Management & Warehousing
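The options above could be declared with `argparse` roughly as follows. This is a minimal sketch reconstructed from the help text only, not the script's actual source; the function name `build_parser` is hypothetical, and the choices and defaults are copied from the help output.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting the options listed in the help text.

    Assumption: this mirrors the documented interface only; the real
    script may declare its arguments differently.
    """
    parser = argparse.ArgumentParser(
        description="The Music Chart Data Collector",
        epilog="(c)2024 Data Management & Warehousing",
    )
    parser.add_argument("--chart", choices=["uk-singles", "uk-albums"],
                        default="uk-singles",
                        help="Which music chart to download")
    parser.add_argument("--startdate", default="19521114",
                        help="The first chart to download in YYYYMMDD format")
    parser.add_argument("--enddate", default="20240509",
                        help="The last chart to download in YYYYMMDD format")
    parser.add_argument("--datadir", default="./data",
                        help="Location of datafiles used in processing")
    # nargs="*" lets several values follow one flag, e.g. --output_type csv json
    parser.add_argument("--output_type", nargs="*", choices=["csv", "json"],
                        default=["csv"], help="Output file formats required")
    parser.add_argument("--output_set", nargs="*", choices=["weekly", "all"],
                        default=["all"],
                        help="Weekly charts and/or one large file")
    return parser


# Parse the second example command line from "Running the script".
args = build_parser().parse_args([
    "--chart", "uk-albums",
    "--startdate", "20230101", "--enddate", "20231231",
    "--output_type", "json", "--output_set", "weekly",
])
print(args.chart, args.output_type, args.output_set)
```

Because `--output_type` and `--output_set` take lists, both formats and both output sets can be requested in a single run.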

Directory Structure

.
├── README.md                   # This file
├── data                        # Data directories
│   ├── csv                     # CSV Output files
│   ├── html                    # Downloaded HTML Files
│   └── json                    # JSON Output files
├── mcdc.py                     # The script
└── samples                     # Sample HTML files to understand the structure
    ├── uk-singles-chart.html
    └── uk-singles-chart.png

Available data fields

CSV Format example

chart_date,chart_position,chart_artist,chart_title,chart_movement,chart_peak,chart_weeks
19521114,1,AL MARTINO,HERE IN MY HEART,New,1,1
19521114,2,JO STAFFORD,YOU BELONG TO ME,New,2,1
19521114,3,NAT 'KING' COLE,SOMEWHERE ALONG THE WAY,New,3,1
19521114,4,BING CROSBY,THE ISLE OF INNISFREE,New,4,1
19521114,5,GUY MITCHELL,FEET UP (PAT HIM ON THE PO-PO),New,5,1
19521114,6,ROSEMARY CLOONEY,HALF AS MUCH,New,6,1
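Reading the CSV output back into Python might look like the sketch below. It uses the first rows of the example above as inline data; the helper name `read_chart_rows` is hypothetical, and the type conversions (dates and integers) are an assumption about how a consumer would want the fields, not something the script itself does.

```python
import csv
import io
from datetime import date, datetime

# First rows of the CSV example, inlined so the sketch is self-contained.
SAMPLE_CSV = """chart_date,chart_position,chart_artist,chart_title,chart_movement,chart_peak,chart_weeks
19521114,1,AL MARTINO,HERE IN MY HEART,New,1,1
19521114,2,JO STAFFORD,YOU BELONG TO ME,New,2,1
19521114,3,NAT 'KING' COLE,SOMEWHERE ALONG THE WAY,New,3,1
"""


def read_chart_rows(handle):
    """Read chart rows from a file-like object, converting typed fields."""
    rows = []
    for record in csv.DictReader(handle):
        # chart_date is stored as YYYYMMDD text.
        record["chart_date"] = datetime.strptime(
            record["chart_date"], "%Y%m%d").date()
        for field in ("chart_position", "chart_peak", "chart_weeks"):
            record[field] = int(record[field])
        # chart_movement is left as text because it can hold values like "New".
        rows.append(record)
    return rows


rows = read_chart_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["chart_date"].isoformat(), row["chart_position"],
          row["chart_artist"], "-", row["chart_title"])
```

For a real output file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at a file under the `data/csv` directory.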

JSON Format example

[
    {
        "chart_date": "20221230",
        "chart_movement": "3",
        "chart_position": "1",
        "chart_artist": "MICHAEL BUBLE",
        "chart_title": "CHRISTMAS",
        "chart_peak": "1",
        "chart_weeks": "105"
    },
    {
        "chart_date": "20221230",
        "chart_movement": "1",
        "chart_position": "2",
        "chart_artist": "TAYLOR SWIFT",
        "chart_title": "MIDNIGHTS",
        "chart_peak": "1",
        "chart_weeks": "10"
    }
]
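Every value in the JSON output is a string, so numeric fields need converting before they can be compared or sorted. The sketch below loads the example above and finds the entry with the most weeks on the chart; the helper name `load_chart` is hypothetical and the conversion step is an assumption about downstream use.

```python
import json

# The JSON example, inlined so the sketch is self-contained.
SAMPLE_JSON = """
[
    {"chart_date": "20221230", "chart_movement": "3", "chart_position": "1",
     "chart_artist": "MICHAEL BUBLE", "chart_title": "CHRISTMAS",
     "chart_peak": "1", "chart_weeks": "105"},
    {"chart_date": "20221230", "chart_movement": "1", "chart_position": "2",
     "chart_artist": "TAYLOR SWIFT", "chart_title": "MIDNIGHTS",
     "chart_peak": "1", "chart_weeks": "10"}
]
"""


def load_chart(text):
    """Parse chart JSON and convert the purely numeric fields to int."""
    entries = json.loads(text)
    for entry in entries:
        for field in ("chart_position", "chart_peak", "chart_weeks"):
            entry[field] = int(entry[field])
        # chart_movement stays a string: the CSV example shows it can be "New".
    return entries


entries = load_chart(SAMPLE_JSON)
longest = max(entries, key=lambda entry: entry["chart_weeks"])
print(longest["chart_artist"], longest["chart_title"], longest["chart_weeks"])
```

Without the integer conversion, `max` would compare `"105"` and `"10"` as text, which gives unreliable ordering for values of different lengths.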

Performance & capacity

To Do List

Potential enhancements for an indeterminate future update

Build environment & maintenance

Other code that has similar functionality

Acknowledgements