GuGuskyastro / Scientific-conference-title-disambiguation-system-based-on-GPT-and-Wikidata

This project uses LangChain to define GPT workflows and pre-builds a Weaviate vector store (VS) that holds Wikidata's information on scientific conferences. GPT parses the input text, the extracted conferences are compared against the VS query results, and the metadata for each matched conference is then retrieved from Wikidata.

Integrate prompt template text in yaml file #1

Closed by GuGuskyastro 1 year ago

GuGuskyastro commented 1 year ago

Currently, the prompt template text used to build the Agent is hard-coded in the code file. Move it into a YAML file so it is easier to maintain and modify, and load it from there when needed.

fact_extraction_prompt = PromptTemplate(
            input_variables=["text_input"],
            template='''The following text entry is the citations section of a paper, which may contain one or more citations. 
            Some of these citations may contain information for scientific conference, you need extract conference title and short name. give the answer in the following format for each citation:

            properties: {{
            Citation text: {{"type": "string"}},
            Conference title: {{"type": "string"}},
            Conference short name: {{"type": "string"}}}}

            When extracting, do not appear "Proceedings of" in the Conference title. A simple example:Saleem, M., Mehmood, Q., Ngonga Ngomo, A.C.: FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework. In: Proceedings of 14th International Semantic Web Conference. pp. 52–69. Springer (2015),
            the Conference title for this citation then should be like International Semantic Web Conference 2015. The short name then is ISWC 2015. The year must be added to the title.
            References that do not mention scientific conferences can just use empty characters in the title and short name.
            Only use the information in the text to extract. If all citations do not have scientific meetings, tell the user that your input does not have any scientific meetings.

            text:\n\n {text_input}'''
)
fact_extraction_chain = LLMChain(llm=self.llm, prompt=fact_extraction_prompt)

query_Input_prompt = PromptTemplate(
            input_variables=["text_input"],
            template=''' I will give you two texts, one is the conference information I extracted, and the other is the query result of my database. Analyze whether the conference I extracted is also in the query result according to the queried short name and title, and answer me in template below:
            The {{conference title}} is/is not stored in the Database.(If stored then add)Its QID is {{QID}}.

            Two text:{text_input}'''
)

The base template in the main file should be moved into the YAML file as well.
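
A minimal sketch of what this could look like, assuming PyYAML is installed. The key names (`fact_extraction`, `query_input`) and the shortened prompt text are hypothetical and not taken from the repository. The YAML is inlined as a string so the example runs on its own; in practice it would live in a separate file.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# Hypothetical contents of a prompts YAML file. Key names are illustrative
# and the prompt text is abbreviated from the templates quoted above.
PROMPTS_YAML = """
fact_extraction:
  input_variables: [text_input]
  template: |
    The following text entry is the citations section of a paper.

    properties: {{
    Citation text: {{"type": "string"}}}}

    text:

    {text_input}
query_input:
  input_variables: [text_input]
  template: |
    The {{conference title}} is/is not stored in the Database.

    Two text:{text_input}
"""


def load_prompts(yaml_text: str) -> dict:
    """Parse YAML text into a dict of prompt definitions."""
    return yaml.safe_load(yaml_text)


prompts = load_prompts(PROMPTS_YAML)
fact_cfg = prompts["fact_extraction"]

# With langchain, the loaded values could be passed on as
# PromptTemplate(input_variables=fact_cfg["input_variables"],
#                template=fact_cfg["template"]).
# Plain str.format is used here to show the brace handling without langchain:
# doubled braces stay literal and {text_input} is filled in.
rendered = fact_cfg["template"].format(text_input="[1] Some citation.")
print(rendered)
```

The `|` block scalar keeps line breaks and needs no quoting, so the doubled braces from the hard-coded templates can be copied over unchanged.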

GuGuskyastro commented 1 year ago

The template text in the utils file has now been moved into a YAML file: https://github.com/GuGuskyastro/Scientific-conference-title-disambiguation-system-based-on-GPT-and-Wikidata/commit/fc0d262d17d52864e7f2983e62e6cbb0921e358a

GuGuskyastro commented 1 year ago

The template in the run function has also been changed; it is now imported from the template YAML file: https://github.com/GuGuskyastro/Scientific-conference-title-disambiguation-system-based-on-GPT-and-Wikidata/commit/39214f74c2891386934c51072f5c00d964a93c0f

GuGuskyastro commented 1 year ago

Changed how the YAML file path is read: https://github.com/GuGuskyastro/Scientific-conference-title-disambiguation-system-based-on-GPT-and-Wikidata/commit/ec781780df1bfec7094a4203e154c35f9540ac7b https://github.com/GuGuskyastro/Scientific-conference-title-disambiguation-system-based-on-GPT-and-Wikidata/commit/e5aad0a2d612c35d52958d4bbc5ea5ed5fd496ea
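
The commits themselves are not reproduced in this thread, so the following is only a sketch of one common way to read a YAML path robustly: resolve it relative to the module that loads it instead of the current working directory. The helper name `template_path` and the file name `prompts.yaml` are hypothetical.

```python
from pathlib import Path
import tempfile


def template_path(module_file: str, filename: str = "prompts.yaml") -> Path:
    """Return the path of `filename` located next to `module_file`.

    In real code `module_file` would typically be `__file__`, so the lookup
    does not depend on the directory the program was started from.
    """
    return Path(module_file).resolve().parent / filename


# Demonstration with a temporary directory standing in for the package folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_module = Path(tmp) / "utils.py"
    fake_module.write_text("# placeholder module\n")
    (Path(tmp) / "prompts.yaml").write_text("fact_extraction: {}\n")

    resolved = template_path(str(fake_module))
    found = resolved.is_file()
    name = resolved.name

print(name, found)
```

Inside the package this would be called as `template_path(__file__)`, which keeps working when the script is launched from another directory.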