Daria-Oni / EcoHack-Babassu-bots


Prompt design #13

Open Daria-Oni opened 1 month ago

Daria-Oni commented 1 month ago

Design efficient prompts that minimize hallucinations and potentially fabricated data from LLMs.

1) How can we verify that the extracted data is not fabricated?

2) Assess the performance of the prompt.
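One cheap way to approach the fabrication question is a grounding check: every scientific name the model returns should appear somewhere in the source document. The sketch below is only an illustration under that assumption; the function names `normalize` and `ungrounded_names` are hypothetical, and a verbatim match will miss abbreviated names (e.g. "H. indicum") or names the prompt asked the model to translate.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line breaks in the source do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_names(names, source_text):
    """Return the names from the model output that never occur in the source text."""
    haystack = normalize(source_text)
    return [name for name in names if normalize(name) not in haystack]


# Illustrative input only, not taken from any real document.
source = "Heliotropium indicum is used on wounds.\nCarapa\nguianensis oil is applied to the skin."
returned = ["Heliotropium indicum", "Carapa guianensis", "Uncaria tomentosa"]
print(ungrounded_names(returned, source))  # ['Uncaria tomentosa']
```

Names flagged this way are not necessarily fabricated, but they are the ones worth checking by hand first.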

https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api

https://platform.openai.com/docs/guides/prompt-engineering

https://medium.com/nerd-for-tech/model-parameters-in-openai-api-161a5b1f8129

bossarda commented 1 month ago

Using pydantic to ensure output is in the desired format: https://xebia.com/blog/enforce-and-validate-llm-output-with-pydantic/
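To make the idea concrete without extra installs, here is a standard-library stand-in for what the linked article does with pydantic: declare the expected fields once, then reject entries that have missing or unexpected keys. It is not pydantic itself. The field names are borrowed from the later prompt in this thread, the choice of which fields are required is an assumption, and `SpeciesEntry` and `validate_entries` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SpeciesEntry:
    # Assumption: only the first two fields are treated as required.
    species_name: str
    medicinal_uses: str
    location: str = "None"
    citations: str = "None"
    habitat_details: str = "None"
    additional_data: str = "None"


def validate_entries(raw_json):
    """Parse model output and split it into valid entries and error messages."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    allowed = {f.name for f in fields(SpeciesEntry)}
    valid, errors = [], []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            errors.append(f"entry {i}: not a JSON object")
            continue
        unknown = set(item) - allowed
        if unknown:
            errors.append(f"entry {i}: unexpected keys {sorted(unknown)}")
            continue
        try:
            valid.append(SpeciesEntry(**item))
        except TypeError as exc:  # raised when a required field is missing
            errors.append(f"entry {i}: {exc}")
    return valid, errors
```

Dataclasses do not check value types at runtime, which is the main thing a pydantic model would add on top of this.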

Daria-Oni commented 1 month ago

PROMPT:

Objective: Efficiently extract detailed information about ALL traditional medicinal species from a document and organize it into a structured JSON format.

Data to Extract:

Species Name: Record both the scientific and any common names.
Medicinal Uses: Include descriptions of uses and the specific parts utilized.
Location: Note the geographical location or habitat of each species.
Citations: List any publications or studies that reference the species.
Habitat Details: Provide information about the type of ecosystem where the species is found.
Additional Data: Capture extra details such as preparation methods and dosages.

Example JSON Entry:

[
  {
    "Species": {
      "Scientific": "Heliotropium indicum",
      "Common": ["Indian heliotrope", "Tilpushpi"]
    },
    "Use": "Used to treat wounds and ulcers; plant parts used: leaves and roots.",
    "Location": "Widespread in tropical regions",
    "Citation": "Journal of Ethnopharmacology, Vol 134",
    "Habitat": "Common in grasslands and open areas",
    "Additional Details": "Leaves are crushed and applied topically for skin ailments."
  }
]

Instructions:

Ensure each species entry in the document is converted into the JSON format.
Focus on accurately transferring data without interpretation. Where specific details are unavailable, note "None" for that field.
If no species data is present in the text, return an empty JSON array to indicate that no relevant information could be extracted.
Keep entries consistent and comprehensive for database integration.
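On the consuming side, the two conventions in these instructions (the "None" placeholder and the empty array) can be handled in one small parsing step. This is a rough sketch, assuming the model replies with JSON only, possibly wrapped in a Markdown code fence; `parse_extraction` is a hypothetical helper name.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_extraction(reply):
    """Turn the model's reply into a list of dicts, mapping the "None" placeholder to Python None."""
    cleaned = reply.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with any language tag) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    entries = json.loads(cleaned)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array")
    return [
        {key: (None if value == "None" else value) for key, value in entry.items()}
        for entry in entries
    ]


reply = '[{"Species": {"Scientific": "Heliotropium indicum"}, "Location": "None"}]'
print(parse_extraction(reply))
# [{'Species': {'Scientific': 'Heliotropium indicum'}, 'Location': None}]
print(parse_extraction("[]"))  # []
```

An empty list then means "no species found", which is easy to tell apart from a reply that failed to parse, since the latter raises an exception.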

Daria-Oni commented 1 month ago

A file for testing: a table manually extracted into a .txt file from "Amazonian Brazilian Medicinal Plants Described by C.F.P. von Martius in the 19th Century." https://pubmed.ncbi.nlm.nih.gov/23500885/

Amazonian Brazilian medicinal plants described by C.F.P. von Martius in the 19th century.txt
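With a manually extracted table as the reference, prompt performance can be scored as precision and recall over species names. The snippet below is a minimal sketch of that idea; it assumes names are compared case-insensitively as exact strings, the function name `precision_recall` is hypothetical, and the names in the example are placeholders rather than rows from the Martius table.

```python
def precision_recall(predicted, gold):
    """Compare predicted species names against a manually extracted reference list."""
    pred = {name.strip().lower() for name in predicted}
    ref = {name.strip().lower() for name in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Placeholder names for illustration only.
gold = ["Carapa guianensis", "Copaifera officinalis", "Bixa orellana", "Genipa americana"]
predicted = ["carapa guianensis", "Bixa orellana", "Uncaria tomentosa"]
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Low precision points to invented or misread species, while low recall points to species the prompt failed to pick up, so the two numbers map onto the two concerns raised at the top of this issue.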

Daria-Oni commented 1 month ago

prompt = f""" ROLE: You are a Data Extraction Specialist for Medicinal Species of South America. You have expertise in extracting detailed information about medicinal plants and animals from documents and organising it into JSON format. Your goal is to ensure accurate and comprehensive data for use in scientific research and database integration.

Respond to the following:

USER: Objective: Extract information about ALL medicinal species from a document and organize it into a structured JSON format. All text should be returned in English.

Data to Extract (ALWAYS maintain consistent variable names in the JSON output):

species_name: Record both the scientific and any common names.
medicinal_uses: Include descriptions of uses and the specific parts utilized.
location: Note the geographical location or habitat of each species.
citations: List any publications or studies that reference the species.
habitat_details: Provide information about the type of ecosystem where the species is found.
additional_data: Capture extra details such as preparation methods and dosages.

Instructions:
Keep entries consistent and comprehensive for database integration.
Ensure each species entry in the document is converted into the JSON format.
Focus on accurately transferring data without interpretation. Where specific details are unavailable, note "None" for that field.
If no species data is present in the text, return an empty JSON array to indicate that no relevant information could be extracted.

Text: \"\"\"
{text}
\"\"\"
"""