aguschin / art-guide

Your guide in the world of art
MIT License

Scraping Descriptions for Paintings from DuckDuckGo #62

michaelvin1322 closed 8 months ago

michaelvin1322 commented 10 months ago

Issue Description

We've noticed that approximately 2.5% of the paintings in our art guide project sourced from wikiart.org lack descriptions. To enhance the quality and usefulness of our project, we propose a solution to generate descriptions for these paintings automatically.

Problem

Roughly 2.5% of the paintings in our project have no description at all, which limits the information available to users.

Proposed Solution

We suggest combining web search with an open-source large language model (LLM) to generate descriptions for the paintings that lack them. We have run an initial experiment to demonstrate that this is feasible. You can view the experiment results here.

michaelvin1322 commented 8 months ago

I have made significant progress on the missing descriptions for paintings sourced from wikiart.org. We first explored the free DuckDuckGo REST API as a source of descriptions, since it returns the most relevant results. I have implemented a Python script that queries the API and parses its responses; the code is in my repository, scrapWikiArt, in the duck_duck_go.py file.
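For readers who have not used the API, a query could look roughly like the sketch below. This is not the contents of duck_duck_go.py. It assumes the public Instant Answer endpoint at `https://api.duckduckgo.com/` and its `AbstractText`, `AbstractSource`, and `AbstractURL` response fields. The function names and the `"<title> <artist> painting"` query format are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

DDG_ENDPOINT = "https://api.duckduckgo.com/"  # Instant Answer API


def build_query_url(title: str, artist: str) -> str:
    """Build the request URL; the query format is an assumption."""
    params = {
        "q": f"{title} {artist} painting",
        "format": "json",      # ask for a JSON response
        "no_html": "1",        # strip HTML from text fields
        "skip_disambig": "1",  # skip disambiguation pages
    }
    return DDG_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_description(payload: dict) -> Optional[dict]:
    """Pull the abstract out of a decoded response, or None if it is empty."""
    text = (payload.get("AbstractText") or "").strip()
    if not text:
        return None
    return {
        "text": text,
        "source": payload.get("AbstractSource") or "",
        "url": payload.get("AbstractURL") or "",
    }


def fetch_description(title: str, artist: str, timeout: float = 10.0) -> Optional[dict]:
    """Query the API over the network and return the parsed abstract, if any."""
    with urllib.request.urlopen(build_query_url(title, artist), timeout=timeout) as resp:
        return extract_description(json.load(resp))
```

Keeping the source name and URL next to the text makes it possible to trace each description back to where it came from during validation.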

We have also decided to split the solution into two separate tasks: parsing additional data and validating the parsed descriptions. Keeping them separate should make the process more efficient and the results more accurate.
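As an illustration of what the validation task might check, here is a rough sketch. The rules are assumptions rather than the agreed validation logic: a description is accepted only if it is long enough and mentions either the painting's title or the artist's surname. `validate_description` and the `min_words` threshold are hypothetical.

```python
def validate_description(text: str, title: str, artist: str, min_words: int = 15) -> bool:
    """Heuristic check that a parsed description is usable (rules are assumptions)."""
    # Reject very short snippets, which are usually titles or navigation text.
    if len(text.split()) < min_words:
        return False
    lowered = text.lower()
    # Treat the last token of the artist's name as the surname.
    name_parts = artist.split()
    surname = name_parts[-1].lower() if name_parts else ""
    mentions_title = bool(title.strip()) and title.lower() in lowered
    mentions_artist = bool(surname) and surname in lowered
    return mentions_title or mentions_artist
```

A check like this only filters out obviously unrelated text. It does not confirm that the description is factually correct, so a manual or model-based review step may still be needed.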

michaelvin1322 commented 8 months ago

To ensure comprehensive coverage, I'm extending the solution to include artists, movements, schools, and styles. The current status is as follows: