Grammar and slang normalization

nponeccop commented 1 year ago

Hello there,

I came across your project and I'm interested in contributing. Specifically, I need a tool to normalize IM chat transcripts and I think your project could be a good starting point.

My plan is to use ChatGPT's createChatCompletion() function to split the text into sentences, add punctuation, and fix domain-specific keywords. I'd like to know if you're open to accepting contributions like this and if you have any specific requirements or preferences regarding the implementation.

As a first step, I would like to:

1. Use one createChatCompletion() call to split the text into sentences (chat logs often lack sentence boundaries).
2. Feed the result into another createChatCompletion() call to add punctuation.
3. Feed that result into yet another createChatCompletion() call to fix domain-specific keywords ("by xxx we mean yyy").

Could you please let me know if this is something that you would like to see added to the project and if so, in what way? I'm happy to provide more details if needed.

Thank you for your time and consideration!

clean99 commented 1 year ago

Hi @nponeccop, glad to hear from you! So you want to create a text normalization feature that uses GPT to fix typos, grammar errors, and similar problems, right? I think that is a very good idea, and it can definitely fit in this project. If you want to contribute, you are more than welcome. If you don't have time to contribute, you can tell me more about your requirements and I can design an interface for you.

If you want to contribute, please provide:

1. A simple interface design, which we can discuss together.
2. Once we have an interface design, a simple technical proposal for it.
3. Once we are all OK with the design, let's get started on the implementation.

If you prefer a different way of working, feel free to work in whatever way you like.

You can use this issue to write down anything. Looking forward to hearing from you.

nponeccop commented 1 year ago

Hi. Here is my prototype code:

import { askChatGPT } from './gpt-api.js'
import { readPromptsYaml } from './prompts-yaml.js'

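// Send the instruction prompt and the text to correct as two separate user
// messages, then interpret the reply according to the status protocol.
async function askToCorrectParameter (prompt, parameter) {
  const response = await askChatGPT([
    { role: 'user', content: prompt },
    { role: 'user', content: parameter },
  ])

  // Expected reply shape: "<STATUS>: <message>". The `m` flag lets the status
  // line appear on any line of the reply; `.*` stops at the end of that line.
  const regex = /^(?<status>CORRECTED|WAS_GOOD|UNABLE):\s*(?<message>.*)$/m
  const match = response.match(regex)

  // The reply did not follow the protocol: keep the input unchanged.
  if (!match) {
    return parameter
  }

  const { groups: { status, message } } = match // extract RegEx capture groups

  // Only CORRECTED carries new text; WAS_GOOD and UNABLE keep the input.
  return status === 'CORRECTED' ? message : parameter
}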

async function main () {
  try {
    const prompts = await readPromptsYaml('prompts.yaml')

    const correctedSentence = await askToCorrectParameter(prompts.splitSentences, prompts.sampleRequest1)
    const sentenceWithPunctuation = await askToCorrectParameter(prompts.punctuation, correctedSentence)
    const sentenceWithSlangCorrected = await askToCorrectParameter(prompts.slang, sentenceWithPunctuation)

    console.log(`Done! ${sentenceWithSlangCorrected}`)
  } catch (error) {
    console.error(error)
  }
}

await main()

As you can see, this way of using ChatGPT relies on a few ad hoc conventions:

- The CORRECTED|WAS_GOOD|UNABLE protocol that the prompts ask ChatGPT to follow in its answers.
- The protocol for parametrizing a prompt by sending the parameter as an extra message.
- The YAML format used to store the prompts.
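
To make the first convention concrete, here is a minimal sketch of the reply protocol as a standalone parser. The names `parseReply` and `applyReply` are hypothetical and are not part of the prototype or of gpt-prompt-kit. Unlike the prototype's regex, this version assumes the status prefix starts the reply and lets the message span several lines, which may or may not match what the prompts actually produce.

```javascript
// Hypothetical helper: parse a reply that follows the
// CORRECTED|WAS_GOOD|UNABLE protocol.
function parseReply (reply) {
  // No `m` flag: the status must start the reply.
  // [\s\S]* lets the message continue over several lines.
  const match = /^(CORRECTED|WAS_GOOD|UNABLE):\s*([\s\S]*)$/.exec(reply.trim())
  if (!match) {
    // The model ignored the protocol; report that explicitly.
    return { status: 'MALFORMED', message: reply }
  }
  return { status: match[1], message: match[2].trim() }
}

// Keep the original text unless the model reports a correction.
function applyReply (original, reply) {
  const { status, message } = parseReply(reply)
  return status === 'CORRECTED' ? message : original
}

console.log(parseReply('CORRECTED: Hi. How are you?'))
console.log(applyReply('hi how r u', 'WAS_GOOD: nothing to fix'))
```

Returning an explicit MALFORMED status, instead of silently keeping the input, would let a caller count how often a prompt fails to keep the model on protocol.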

So, I would like to standardize low-level conventions like these, which are what make the model do something useful. I wonder whether anyone else has already done this. If not, we could try to build a framework that simplifies prompt engineering. Prompt engineering, the art of inventing prompts that ChatGPT follows reliably, is a huge area, and I would like to automate parts of it and/or provide useful building blocks, both for protocols and for chains of ChatGPT agents working on a task.
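
As a rough illustration of what one such building block could look like, here is a sketch of a chain runner. `makeChain` and `fakeAsk` are hypothetical names, and the prompt strings are placeholders. `ask` stands for any async function that takes a message list and returns the reply text, such as the prototype's askChatGPT. It is passed in as an argument so the chain can be exercised without network access.

```javascript
// Hypothetical building block: run a list of prompt steps in order,
// feeding each step's output into the next one.
function makeChain (ask, prompts) {
  return async function run (text) {
    let current = text
    for (const prompt of prompts) {
      const reply = await ask([
        { role: 'user', content: prompt },
        { role: 'user', content: current }
      ])
      const match = /^CORRECTED:\s*([\s\S]*)$/.exec(reply.trim())
      // Any other reply (WAS_GOOD, UNABLE, or malformed) keeps the text.
      if (match) current = match[1].trim()
    }
    return current
  }
}

// Stub standing in for the model, for illustration only.
async function fakeAsk (messages) {
  const [prompt, input] = messages.map(m => m.content)
  if (prompt === 'add punctuation') return `CORRECTED: ${input}.`
  return 'WAS_GOOD: no change'
}

const normalize = makeChain(fakeAsk, ['split sentences', 'add punctuation'])
normalize('hello there').then(result => console.log(result))
```

With a real `ask` function, the three steps from the prototype would become one call: makeChain(askChatGPT, [prompts.splitSentences, prompts.punctuation, prompts.slang]).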

What do you think?

clean99 / gpt-prompt-kit

Grammar and slang normalization #2