ByMykel / spanish-cities

A library that provides data on Spain's autonomies, provinces, and cities, including their codes, names, flags, and coats of arms for seamless integration into your applications.
https://npm.im/all-spanish-cities
MIT License
1 star 2 forks

Add the city's flag and coat of arms. #66

Open ByMykel opened 11 months ago

ByMykel commented 11 months ago

We have to add the city's flag and coat of arms.

If anyone wants to help, just go to cities.json and look for cities that have those attributes set to null.
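A minimal sketch of how those entries could be found, assuming cities.json is a JSON array of objects and that the attributes are called "name", "flag" and "coat_of_arms" (these field names are guesses and should be checked against the real file). The helper name cities_missing_images is made up for illustration.

```python
# Hypothetical field names; check them against the real cities.json.
IMAGE_FIELDS = ("flag", "coat_of_arms")


def cities_missing_images(cities, fields=IMAGE_FIELDS):
    """Return (city name, list of empty fields) for every record lacking an image."""
    missing = []
    for city in cities:
        # A field counts as missing when it is null (None) or absent.
        empty = [field for field in fields if city.get(field) is None]
        if empty:
            missing.append((city.get("name"), empty))
    return missing


# Small in-memory sample standing in for the parsed cities.json.
sample = [
    {
        "name": "Ábalos",
        "flag": None,
        "coat_of_arms": "https://upload.wikimedia.org/wikipedia/commons/1/1b/Escudo_de_%C3%81balos_%28La_Rioja%29.svg",
    },
    {"name": "Agoncillo", "flag": None, "coat_of_arms": None},
]

for name, empty_fields in cities_missing_images(sample):
    print(f"{name}: {', '.join(empty_fields)}")
```

To run it on the real data, the sample list would be replaced by the result of json.load on cities.json opened with encoding="utf-8".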

AloisSeckar commented 11 months ago

I would like to help with this task as well.

I will do something later today. I want to try convincing ChatGPT to help with extracting the image data from Wikipedia. I made some attempts yesterday with provinces as well, but it kept getting distracted and returning wrong URLs.

ByMykel commented 11 months ago

@AloisSeckar That would be very cool because there are around 8000 cities. I have added a couple, but it seems like a long task to do by "hand".

AloisSeckar commented 11 months ago

So I am trying, but it is not very effective right now. It says it has to fetch each image separately and often asks for permission to proceed. It is working, but it is slow.

However, I have learned a few things we might use to create some "import script":

UPDATE: The linked list of flag images for cities in A Coruna province is surely incomplete, and the list of coats of arms has several duplicate entries. But it is at least something to start with.

AloisSeckar commented 11 months ago

I made a first version of a custom web crawler to get the actual Wiki image URLs - https://github.com/AloisSeckar/wiki-image-crawler

So far it "only" retrieves the list of image URLs from Wiki category pages (example), but unlike ChatGPT, it does it quickly. I will try to improve it soon so that it can write the retrieved data directly into the cities.json file.
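The step of writing retrieved URLs into the data could look roughly like the sketch below. It is not taken from wiki-image-crawler; it assumes each city record is a dict with a "name" key and an image field such as "coat_of_arms" (hypothetical names), and it only fills fields that are still null. The helpers normalize and fill_missing are illustrative.

```python
import unicodedata


def normalize(name):
    """Strip accents and case so that 'A Coruña' and 'a coruna' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold().strip()


def fill_missing(cities, urls_by_name, field):
    """Set city[field] from urls_by_name where it is still None.

    Existing values are never overwritten. Returns the number of records filled.
    """
    lookup = {normalize(name): url for name, url in urls_by_name.items()}
    filled = 0
    for city in cities:
        if city.get(field) is None:
            url = lookup.get(normalize(city["name"]))
            if url is not None:
                city[field] = url
                filled += 1
    return filled


# Hypothetical records; "coat_of_arms" is an assumed field name.
cities = [
    {"name": "Agoncillo", "coat_of_arms": None},
    {"name": "Alberite", "coat_of_arms": None},
]
crawled = {
    "agoncillo": "https://upload.wikimedia.org/wikipedia/commons/d/d7/Escudo_de_Agoncillo-La_Rioja.svg",
}
print(fill_missing(cities, crawled, "coat_of_arms"), "record(s) filled")
```

Matching on the normalized name alone can be ambiguous when two municipalities share a name, so a real import would probably also need to compare the province.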

ByMykel commented 8 months ago

A simple Python script to list all the images:

# List of flags of municipalities:
# https://commons.wikimedia.org/wiki/Category:SVG_flags_of_municipalities_of_Spain_by_province

# List of coats of arms of municipalities:
# https://commons.wikimedia.org/wiki/Category:SVG_coats_of_arms_of_municipalities_of_Spain_by_province

import posixpath
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# URL of the Wikimedia Commons category
url = "https://commons.wikimedia.org/wiki/Category:SVG_coats_of_arms_of_municipalities_of_La_Rioja_(Spain)"

# Fail early on HTTP errors instead of parsing an error page
response = requests.get(url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

with open("output.txt", "w", encoding="utf-8") as file:
    # Each file in the category is rendered as one <li class="gallerybox">
    for gallery_box in soup.find_all('li', class_='gallerybox'):
        img = gallery_box.find('img')
        link = gallery_box.find('a', class_='galleryfilename')
        if img is None or link is None:
            continue  # skip entries without a thumbnail or a title

        # Thumbnail URLs look like .../thumb/1/1b/Name.svg/<width>px-Name.svg.png;
        # drop '/thumb/' and the last path segment to get the original file URL
        image_url = urljoin(url, img['src'])
        if '/thumb/' in image_url:
            image_url = posixpath.dirname(image_url.replace('/thumb/', '/'))

        file_name = link['title']

        file.write(f"Image URL: {image_url}\n")
        file.write(f"File Name: {file_name}\n")
        file.write("\n")

output.txt:

Image URL: https://upload.wikimedia.org/wikipedia/commons/1/1b/Escudo_de_%C3%81balos_%28La_Rioja%29.svg
File Name: File:Escudo de Ábalos (La Rioja).svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/d/d7/Escudo_de_Agoncillo-La_Rioja.svg
File Name: File:Escudo de Agoncillo-La Rioja.svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/8/8c/Escudo_de_Albelda_de_Iregua-La_Rioja.svg
File Name: File:Escudo de Albelda de Iregua-La Rioja.svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/4/4e/Escudo_de_Alberite-La_Rioja.svg
File Name: File:Escudo de Alberite-La Rioja.svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/4/46/Escudo_de_Alcanadre-La_Rioja.svg
File Name: File:Escudo de Alcanadre-La Rioja.svg
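
To match these entries against city names, the municipality has to be pulled out of each "File Name" value. The sketch below handles the two naming patterns visible in the output above ("Name (La Rioja)" and "Name-La Rioja"). The function name municipality_from_title is made up, and the "Bandera de" prefix for flag files is an assumption that does not appear in this output.

```python
import re


def municipality_from_title(title, region="La Rioja"):
    """Guess the municipality name from a Commons file title."""
    name = title
    if name.startswith("File:"):
        name = name[len("File:"):]
    # Drop the file extension.
    name = re.sub(r"\.svg$", "", name, flags=re.IGNORECASE)
    # Drop the leading "Escudo de " (seen above) or "Bandera de " (assumed for flags).
    name = re.sub(r"^(Escudo|Bandera) de ", "", name)
    # Drop a trailing " (Region)" or "-Region" marker.
    region_re = re.escape(region)
    name = re.sub(rf"\s*(\({region_re}\)|-\s*{region_re})$", "", name)
    return name.strip()


titles = [
    "File:Escudo de Ábalos (La Rioja).svg",
    "File:Escudo de Agoncillo-La Rioja.svg",
    "File:Escudo de Albelda de Iregua-La Rioja.svg",
]
for title in titles:
    print(municipality_from_title(title))
```

Titles that follow neither pattern come back with only the prefix and extension removed, so they would still need a manual check.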