Open anuj-ti opened 1 year ago
Experiment: Scoring of individual answers with default map_rerank prompt
Description: When scoring individual answers with the default map_rerank prompt, several answers received a score of 100. Because so many answers share the top score, the scores do little to help rank or combine the answers.
Steps:
Expected Result: The scoring should provide more diverse and informative scores to assist in combining the answers effectively.
Actual Result:
Multiple answers received a score of 100, which does not contribute significantly to the process of combining the answers.
The following answers each had a score of 100:
{
  "intermediate_steps": [
    {
      "answer": "Fargate was considered as an alternative to Devflows because it allows AWS experts to build solutions using Lambda functions, Step Functions and/or Fargate, which require coding, whereas DevFlows is a visual no-code environment.",
      "score": "100"
    },
    {
      "answer": "Fargate was not considered as an alternative to Devflows.",
      "score": "100"
    },
    {
      "answer": "Fargate was not considered as an alternative to Devflows.",
      "score": "100"
    },
    {
      "answer": "Fargate was considered as an alternative to Devflows because it has the elasticity required, can scale to zero, and provides the relevant level of control for running actions as jobs.",
      "score": "100"
    },
    {
      "answer": "To avoid the limitations of Lambdas and to provide message passing, orchestration and scale-to-zero serverless compute.",
      "score": "100"
    },
    {
      "answer": "Fargate was considered as an alternative to Devflows because it allows an action to deliver its results by using additional infrastructure such as Fargate tasks, EventBridge and/or SNS notifications, or webhooks from external systems.",
      "score": "100"
    },
    {
      "answer": "Fargate was considered as an alternative to Devflows because it could simplify the system by running all short-running invocables via Lambda and all long-running invocables on Fargate for EKS.",
      "score": "100"
    },
    {
      "answer": "Fargate was not considered as an alternative to Devflows.",
      "score": "100"
    },
    {
      "answer": "Fargate was considered as an alternative to Devflows because it allows for the execution of long-running tasks without the need for the flow developer to estimate execution time.",
      "score": "100"
    }
  ],
  "output_text": "Fargate was considered as an alternative to Devflows because it allows AWS experts to build solutions using Lambda functions, Step Functions and/or Fargate, which require coding, whereas DevFlows is a visual no-code environment."
}
NOTE: Final answer in this case was acceptable, "Fargate was considered as an alternative to Devflows because it allows AWS experts to build solutions using Lambda functions, Step Functions and/or Fargate, which require coding, whereas DevFlows is a visual no-code environment."
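To make the tie problem concrete, here is a minimal sketch of how one might inspect intermediate_steps for ties at the top score. The sample list is an abbreviated, hypothetical stand-in for the output above, and top_scored is an illustrative helper, not part of LangChain. It assumes scores arrive as numeric strings, as they do in the JSON above.

```python
# Abbreviated, hypothetical stand-in for the intermediate_steps shown above.
steps = [
    {"answer": "Fargate was considered because it allows AWS experts to build solutions.", "score": "100"},
    {"answer": "Fargate was not considered as an alternative to Devflows.", "score": "100"},
    {"answer": "Fargate was not considered as an alternative to Devflows.", "score": "100"},
    {"answer": "Fargate was considered because it can scale to zero.", "score": "100"},
]


def top_scored(steps):
    """Return the best score, the answers tied at it, and the distinct ones among them."""
    scores = [int(step["score"]) for step in steps]  # scores arrive as strings
    best = max(scores)
    tied = [step["answer"] for step, score in zip(steps, scores) if score == best]
    distinct = list(dict.fromkeys(tied))  # drop exact duplicates, keep first-seen order
    return best, tied, distinct


best, tied, distinct = top_scored(steps)
print(f"best score: {best}, tied answers: {len(tied)}, distinct tied answers: {len(distinct)}")
```

When every answer is tied at the top score, the ranking carries no information. In the run above, output_text matches the first answer in intermediate_steps, so the result appears to depend on chunk order rather than on the score.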
Next Steps:
"answer": "Fargate was considered as an alternative to Devflows because it allows AWS experts to build solutions using Lambda functions, Step Functions and/or Fargate, which require coding, whereas DevFlows is a visual no-code environment; it has the elasticity required, can scale to zero, and provides the relevant level of control for running actions as jobs; it allows an action to deliver its results by using additional infrastructure such as Fargate tasks, EventBridge and/or SNS notifications, or webhooks from external systems; and it allows for the execution of long-running tasks without the need for the flow developer to estimate execution time."
Try a custom prompt for scoring in the map operation. Prompt used:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
In addition to giving an answer, also return a score of how correctly it answered the user's question, be critical while scoring answers and also use the context of the question. This should be in the following format:
Question: [question here]
Helpful Answer In English: [answer here]
Score: [score between 0 and 100]
Begin!
Context:
Question: {question}
Helpful Answer In English:
Result: the custom prompt didn't have much impact; only some map operations got lower scores. We might have to refine the prompt or think of other approaches.
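For reference, a reply in the format requested by the prompt above has to be split into an answer and a score before answers can be ranked. The following is a minimal sketch using only the standard library. The sample reply, the regular expression, and the parse_reply helper are assumptions for illustration, not the parser LangChain actually uses.

```python
import re

# Hypothetical model reply following the "Score: [score between 0 and 100]" format.
reply = "Fargate was not considered as an alternative to Devflows.\nScore: 40"

# Everything before the "Score:" label is the answer; the digits after it are the score.
SCORE_RE = re.compile(r"(?P<answer>.*?)\s*Score:\s*(?P<score>\d+)", re.DOTALL)


def parse_reply(text):
    """Split a reply into answer and integer score; a missing score counts as 0."""
    match = SCORE_RE.search(text)
    if match is None:
        return {"answer": text.strip(), "score": 0}
    score = max(0, min(100, int(match.group("score"))))  # clamp to the 0-100 range
    return {"answer": match.group("answer").strip(), "score": score}


parsed = parse_reply(reply)
print(parsed)
```

Treating a missing score as 0 is a design choice in this sketch: a reply that ignores the requested format never outranks one that follows it.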
Experiment with 16K window:
Why was fargate considered as an alternative tool for Devflows?
[ "1edVCFrXMV78D2wjLfUeWGKNdy_cCEkDk76H5aQ1qbuo_1", "1Z7QTnSidQyhZj8wpSwub3MVj5Ik1iNJY3dKwnYL06kA_5", "1Z7QTnSidQyhZj8wpSwub3MVj5Ik1iNJY3dKwnYL06kA_2", "1Z7QTnSidQyhZj8wpSwub3MVj5Ik1iNJY3dKwnYL06kA_25", "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_8", "1IJ4zHXQSNMJU6sAHgI7QpRnnIJNQBToyHTkFFc4KW0E_10", "1edVCFrXMV78D2wjLfUeWGKNdy_cCEkDk76H5aQ1qbuo_15", "1kNV2pxePiGPlBBNVBt__kWHA6sObZzdQM2vE76ajkLY_22", "1O15XpUOBmGbvE_WKP0-DrUt0QJLjkn43fVV1L00HPYE_6", "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_10", "1kNV2pxePiGPlBBNVBt__kWHA6sObZzdQM2vE76ajkLY_14", "1edVCFrXMV78D2wjLfUeWGKNdy_cCEkDk76H5aQ1qbuo_13", "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_3", "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_4", "1yRygapujyzQF_D5U9a-Y8KDt3aCmofT28a2moxnwOdk_5", "19kNkeF9-9HUoGUSXI29nW0vUk0zxkUQj29SmA9EiKwQ_13"]
ASCIIdoc format
Map reduce chain wasn't able to answer the following question even with all relevant chunks at hand:
Both the map and reduce steps did a poor job, and the final answer was terrible; see the intermediate steps below.
16-K window in query-pipeline
Why was fargate considered as an alternative tool for Devflows?
This question was fairly complex, as there was a lot of information available on Fargate being used in DevFlows or being considered (as an alternative) for DevFlows. The question gets a bad answer in both approaches.
Question: 'what aws technologies you know?' Answer: "I don't know."
Chunks:
Intermediate Steps:
In the chunks, there was a paragraph containing the following information:
Use the Kubernetes Control Plane for deployments.\n\n|\n|THE PROBLEM |How to deploy DevFlows flows?\n\n|OPTIONS CONSIDERED +\n(Decision in bold) a|\n[arabic]\n. {blank}\n+\n\nUse AWS services, including Lambda, SNS, etc.\n\n. {blank}\n+\n\nUse the Kubernetes Control Plane for deployments.\n\n\n|REASONING a|\nThe team initially tried to use the AWS APIs but had to deal with the\ncomplexities of API Gateway, SNS, SQS, Lambda and IAM rules. They kept\nrunning into problems, and the original DevFlows was a Proof of Concept\nand needed to be done quickly.
This information was not extracted. Loss of information occurs only at the intermediate steps.
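One rough way to confirm where the information is lost is to check which key terms appear in a retrieved chunk but in none of the intermediate answers. The sketch below is a crude substring check under stated assumptions. The missing_terms helper, the shortened chunk text, and the term list are all hypothetical and chosen only to mirror the example above.

```python
def missing_terms(chunk_text, answers, terms):
    """Return terms found in the chunk but absent from every intermediate answer."""
    chunk = chunk_text.lower()
    joined = " ".join(answers).lower()
    return [t for t in terms if t.lower() in chunk and t.lower() not in joined]


# Shortened, hypothetical version of the chunk quoted above.
chunk = (
    "Use AWS services, including Lambda, SNS, etc. The team initially tried to use "
    "the AWS APIs but had to deal with the complexities of API Gateway, SNS, SQS, "
    "Lambda and IAM rules."
)
answers = ["I don't know."]  # the map output reported for this question
terms = ["Lambda", "SNS", "SQS", "API Gateway", "IAM", "Fargate"]

lost = missing_terms(chunk, answers, terms)
print(lost)
```

Terms that the chunk never mentions (here "Fargate") are not reported, so the list only flags information that was retrieved and then dropped by the map step.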