fabriziosalmi / blacklists

Hourly updated domains blacklist 🚫
https://github.com/fabriziosalmi/blacklists/releases/download/latest/blacklist.txt
GNU General Public License v3.0

Download tar.gz https://get.domainsblacklists.com/blacklist.txt if updated #40

Closed: fabriziosalmi closed this issue 12 months ago

fabriziosalmi commented 1 year ago

Certainly! Here's an improved version of the script that:

  1. Shows a progress bar while the file downloads, using the tqdm library.
  2. Extracts the downloaded archive (assumed to be a tar.gz file) and saves the list it contains as blacklist.txt.
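Item 2 comes down to copying one named file out of a gzip-compressed tar archive. Here is a minimal standalone sketch of that step using only the standard library. extract_member is a hypothetical helper name, and the member name all.fqdn.blacklist is the same assumption the script makes.

```python
import shutil
import tarfile


def extract_member(archive_path, member_name, dest_path):
    """Copy a single named file out of a .tar.gz archive to dest_path.

    Raises KeyError if member_name is not in the archive.
    """
    with tarfile.open(archive_path, 'r:gz') as tar:
        src = tar.extractfile(member_name)  # None if the member is a directory, device, etc.
        if src is None:
            raise ValueError(f'{member_name} is not a regular file')
        with src, open(dest_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)  # stream the member's bytes to the destination file
```

For example: extract_member('downloaded_file.tar.gz', 'all.fqdn.blacklist', 'blacklist.txt'). Reading one member this way avoids extractall(), which would write every path contained in a downloaded archive to disk.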

First, install tqdm (the script also needs requests):

pip install requests tqdm

Now, here's the improved script:

import os
import shutil
import tarfile

import requests
from tqdm import tqdm

# Constants: replace the placeholders with your own values
TOKEN = 'YOUR_GITHUB_PERSONAL_ACCESS_TOKEN'
REPO_OWNER = 'owner_of_repository'
REPO_NAME = 'repository_name'
FILE_PATH = 'path_to_file_in_repo.tar.gz'
LAST_CHECKED_SHA = 'last_checked_sha.txt'  # remembers which version was downloaded last
DOWNLOAD_FILE_NAME = 'downloaded_file.tar.gz'
ARCHIVE_MEMBER = 'all.fqdn.blacklist'  # name of the list inside the tar.gz
EXTRACTED_FILE_NAME = 'blacklist.txt'
HEADERS = {
    'Authorization': f'token {TOKEN}',
    'Accept': 'application/vnd.github.v3+json'
}

def get_last_checked_sha():
    # Returns None on the first run, so the file is always downloaded then
    if os.path.exists(LAST_CHECKED_SHA):
        with open(LAST_CHECKED_SHA, 'r') as f:
            return f.read().strip()
    return None

def set_last_checked_sha(sha):
    with open(LAST_CHECKED_SHA, 'w') as f:
        f.write(sha)

def download_with_progressbar(url, filename):
    # Stream the body in chunks and advance the progress bar by each chunk's size
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        # None (unknown size) when the server sends no Content-Length
        total_size = int(response.headers.get('content-length', 0)) or None
        with open(filename, 'wb') as f, tqdm(total=total_size, unit='B', unit_scale=True, desc=filename) as bar:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
                bar.update(len(chunk))

def main():
    # Get the file's metadata from the GitHub contents API
    url = f'https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/contents/{FILE_PATH}'
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    file_data = response.json()

    # The contents API has no 'updated_at' field, but the blob 'sha' changes whenever the file does
    current_sha = file_data['sha']
    if current_sha != get_last_checked_sha():
        print("File was updated. Downloading...")
        download_with_progressbar(file_data['download_url'], DOWNLOAD_FILE_NAME)

        # Copy only the expected member out of the archive (safer than extractall() on downloaded data)
        with tarfile.open(DOWNLOAD_FILE_NAME, 'r:gz') as tar:
            member = tar.extractfile(ARCHIVE_MEMBER)  # raises KeyError if the name is wrong
            if member is None:
                raise RuntimeError(f'{ARCHIVE_MEMBER} is not a regular file')
            with member, open(EXTRACTED_FILE_NAME, 'wb') as out:
                shutil.copyfileobj(member, out)

        # Remember this version so the next run can skip it
        set_last_checked_sha(current_sha)
    else:
        print("File has not been updated since last check.")

if __name__ == '__main__':
    main()
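The issue title asks to download https://get.domainsblacklists.com/blacklist.txt only if it was updated. For a plain HTTPS URL like that, HTTP conditional requests are an alternative to polling the GitHub API: send back the ETag and Last-Modified values from the previous response, and a server that supports them answers 304 Not Modified when nothing changed. This is a sketch under the assumption that the server sends those headers and honors If-None-Match / If-Modified-Since, which has not been verified here. STATE_FILE and the function names are hypothetical, and the sketch uses only the standard library.

```python
import json
import os
import shutil
import urllib.error
import urllib.request

URL = 'https://get.domainsblacklists.com/blacklist.txt'
STATE_FILE = 'blacklist_http_state.json'  # hypothetical: stores the last ETag / Last-Modified
OUTPUT_FILE = 'blacklist.txt'


def load_state(path=STATE_FILE):
    """Return the saved validators, or an empty dict on the first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def conditional_headers(state):
    """Turn saved validators into HTTP conditional-request headers."""
    headers = {}
    if state.get('etag'):
        headers['If-None-Match'] = state['etag']
    if state.get('last_modified'):
        headers['If-Modified-Since'] = state['last_modified']
    return headers


def download_if_updated(url=URL, output=OUTPUT_FILE, state_path=STATE_FILE):
    """Download url to output unless the server reports it unchanged.

    Returns True if a new copy was saved, False on 304 Not Modified.
    Assumes (unverified) that the server supports conditional requests.
    """
    request = urllib.request.Request(url, headers=conditional_headers(load_state(state_path)))
    try:
        response = urllib.request.urlopen(request, timeout=30)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return False
        raise
    with response:
        tmp_path = output + '.part'
        with open(tmp_path, 'wb') as f:
            shutil.copyfileobj(response, f)
        os.replace(tmp_path, output)  # overwrite the old list only after a complete download
        state = {'etag': response.headers.get('ETag'),
                 'last_modified': response.headers.get('Last-Modified')}
    with open(state_path, 'w') as f:
        json.dump(state, f)
    return True
```

Calling download_if_updated() returns True when a fresh copy was written to blacklist.txt and False on a 304. If the server sends neither validator, every call simply downloads the full file again.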

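In the improved script, the archive is downloaded and unpacked only when the file in the repository has changed since the last check; otherwise the script just reports that nothing was updated.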

Make sure to adjust the filename inside the tar.gz archive if it's different from all.fqdn.blacklist.
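If you're not sure what the file inside the archive is called, you can list its members first. A small sketch using the standard tarfile module; list_regular_files is a hypothetical helper name:

```python
import tarfile


def list_regular_files(archive_path):
    """Return the names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, 'r:gz') as tar:
        # Skip directories, links and other non-file entries
        return [m.name for m in tar.getmembers() if m.isfile()]
```

For example, print(list_regular_files('downloaded_file.tar.gz')) prints the names you can use in place of all.fqdn.blacklist.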