Validate and Refine Descriptions parsed from DuckDuckGo API

michaelvin1322 commented 9 months ago

In continuation of our efforts to enhance the quality of generated descriptions for paintings without information (#62), we are now addressing concerns related to the occasional retrieval of irrelevant data from the DuckDuckGo API.

Solution Strategy:

To mitigate the issue, we propose the following steps:

Filter by Link Occurrence: Implement a filter to assess the relevance of the fetched data. If a link to a wiki article occurs more than once, we can reasonably assume that the article is not specific to the intended painting.
Binary Classification with Mistral: Use the open-source large language model (LLM) Mistral for binary classification, to determine whether each retrieved article is relevant to the respective art piece.
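A minimal sketch of the link-occurrence filter, assuming each fetched result is a dict with a `wiki_url` key and that occurrences are counted across the whole batch of fetched results. The function name, the field names, and the `max_occurrences` parameter are hypothetical, not taken from the project code.

```python
from collections import Counter


def filter_by_link_occurrence(records, max_occurrences=1):
    """Keep records whose wiki link appears at most `max_occurrences` times.

    Assumption: a link shared by several records points to a generic
    article (for example an art movement) rather than a specific painting.
    """
    # Count how often each link appears across all fetched records.
    counts = Counter(r["wiki_url"] for r in records if r.get("wiki_url"))
    return [
        r
        for r in records
        if r.get("wiki_url") and counts[r["wiki_url"]] <= max_occurrences
    ]


# Illustrative data only.
records = [
    {"painting_id": 1, "wiki_url": "https://en.wikipedia.org/wiki/Impressionism"},
    {"painting_id": 2, "wiki_url": "https://en.wikipedia.org/wiki/Impressionism"},
    {"painting_id": 3, "wiki_url": "https://en.wikipedia.org/wiki/The_Starry_Night"},
]
kept = filter_by_link_occurrence(records)
```

Here only the third record survives, because the other two share a link. Records that pass this filter would then go on to the Mistral classification step.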

michaelvin1322 commented 8 months ago

Fraction of missing descriptions before and after adding the new descriptions.

	Painting Description Missing	Painting and Artist Description Missing	All Descriptions Missing (Including Art Movements, School)
Initial Data	0.975676	0.0909668	0.00533121
Updated Data	0.952814	0.0807473	0.00524049

michaelvin1322 commented 8 months ago

In our ongoing efforts to enhance the quality of generated descriptions for paintings without information, we have successfully tackled concerns related to occasional irrelevant data retrieval from the DuckDuckGo API. The following solution strategy has been implemented:

Solution Strategy:

Filter by Link Occurrence: We introduced a filter to assess the relevance of the fetched data. When a link to the same wiki article occurred more than once, we concluded that the article was not specific to the intended painting and filtered it out.
Binary Classification with Mistral: Using the open-source large language model (LLM) Mistral, we ran a binary classification to determine whether each retrieved article is relevant to the respective art piece.

Utilized Prompt for Mistral Model:

Review the following information about a /*painting*/:
        /*known fields*/
        And its DuckDuckGo-sourced description:"
        WikiDescription: /*WikiDescription*/

        Is the WikiDescription accurate and relevant to this /*painting*/? Answer with 'Yes' or 'No' only.
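As an illustration, the placeholders in the prompt above could be filled in roughly as follows. This is a sketch, not the project's code: `PROMPT_TEMPLATE`, `build_prompt`, and the `key: value` rendering of the known fields are assumptions, and the whitespace only approximates the layout shown above.

```python
PROMPT_TEMPLATE = (
    "Review the following information about a {item_type}:\n"
    "{known_fields}\n"
    "And its DuckDuckGo-sourced description:\n"
    "WikiDescription: {wiki_description}\n\n"
    "Is the WikiDescription accurate and relevant to this {item_type}? "
    "Answer with 'Yes' or 'No' only."
)


def build_prompt(item_type, known_fields, wiki_description):
    """Fill the template with the item type, known fields, and description."""
    # Render only the fields that actually have a value (hypothetical format).
    fields_text = "\n".join(
        f"{key}: {value}" for key, value in known_fields.items() if value
    )
    return PROMPT_TEMPLATE.format(
        item_type=item_type,
        known_fields=fields_text,
        wiki_description=wiki_description,
    )


prompt = build_prompt(
    "painting",
    {"title": "The Starry Night", "artist": "Vincent van Gogh", "year": None},
    "The Starry Night is an oil-on-canvas painting by Vincent van Gogh.",
)
```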

Handling Model Responses: We developed a response processing function to interpret Mistral's outputs. The logic is as follows:

If the response contains 'yes', interpret as True.
If the response contains 'no', interpret as False.
For unexpected and empty responses, return None.
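The rules above can be sketched as a small function. The name `interpret_response` is hypothetical, and two details are assumptions rather than documented behavior: matching is case-insensitive on whole words (so that "know" does not count as "no"), and 'yes' is checked before 'no', following the order of the rules.

```python
import re
from typing import Optional


def interpret_response(response: Optional[str]) -> Optional[bool]:
    """Map a raw model response to True, False, or None."""
    if not response:
        # Empty or missing response.
        return None
    text = response.lower()
    if re.search(r"\byes\b", text):
        return True
    if re.search(r"\bno\b", text):
        return False
    # Anything else is treated as an unexpected response.
    return None
```

Responses that map to `None` are the ones counted as not interpretable.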

Model Performance: The Mistral model returns an interpretable result (True or False) in 77% of cases. Notably, 40.0% of the new data represents valid updates for our database.

aguschin / art-guide

Validate and Refine Descriptions parsed from DuckDuckGo API #88