Loss of info when asked for list with dates #4

Open anuj-ti opened 1 year ago

anuj-ti commented 1 year ago
id: 6c3f2935461e1ac4ba3016d21f7448e7

Question: "give me a numbered list with dates about decisions on devflows?"
Answer: There is no information available about decisions on devflows.

Chunks:

[
    "1iU0UTCEWlTTBsi19Xr2SJhMLRVUnOA0eI6KHljkpJ7s_23",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_7",
    "1csUA15zcccn2bnbuQ3FaPCPa1_gB2VT6719nAkS5OgY_19",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_0",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_11",
    "1yRygapujyzQF_D5U9a-Y8KDt3aCmofT28a2moxnwOdk_7",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_5",
    "1xqaJBe2vwFAOKOU7sirl1YWdlSXO0KWzz2MwkUz8yMg_12",
    "1shdmbPueAlbX4LTNF5ZGtxy4mpNBITssuimvzG-hcHY_4",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_13",
    "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_2",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_2",
    "116n7AS1mLoSjrIZg_Z481oUYvEv-UaVnHD8HN38CJsc_12",
    "1T2x0nQx_JIobD74rYGA6EkX-wSSTyH0D5JcAtzoq0ZE_1",
    "16BnhyrzMAQj2q9Eli-CnW5GNO5vsIYsyJ8DhOfP6QMI_7",
    "1VvY9xJATTjOlv5Asz8OmjmG07bpTsbqD3AM8hBzbIQw_8",
    "1O15XpUOBmGbvE_WKP0-DrUt0QJLjkn43fVV1L00HPYE_7",
    "12J_waJEbC1EEBL3MVy_RqyOWfh6ZslV45b-csN4B7Bs_3",
    "1Ef74MMXRscdiFakonFnx0ESc0YXLgUd214Ic13wAwzc_8",
    "1T2x0nQx_JIobD74rYGA6EkX-wSSTyH0D5JcAtzoq0ZE_16"
]

Intermediate Steps:

    " No relevant text.",
    " None.",
    " No relevant text.",
    " No relevant text.",
    "
    No relevant text.",
    " None",
    "
    * {blank}
    +
    ____
    YYYY.MM.DD First Last <first.last@trilogy.com>
    ____",
    " No relevant text.",
    " No relevant text.",
    " No relevant text.",
    " No relevant text.",
    " No relevant text.",
    " None.",
    " No relevant text.",
    " None.",
    " N/A",
    "
    No relevant text.",
    " No relevant text.",
    " None",
    " No relevant text."

The reduce prompt is not able to extract the information.
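To put a number on this, the intermediate steps listed above can be classified as empty or non-empty. This is only a rough sketch: the set of "empty" phrasings (`NULL_ANSWERS`) and the helper name `is_empty_step` are assumptions based on the outputs in this run, not part of the pipeline.

```python
# Intermediate step outputs copied from the run above (id 6c3f...).
steps = [
    " No relevant text.",
    " None.",
    " No relevant text.",
    " No relevant text.",
    "\n    No relevant text.",
    " None",
    "\n    * {blank}\n    +\n    ____\n"
    "    YYYY.MM.DD First Last <first.last@trilogy.com>\n    ____",
    " No relevant text.",
    " No relevant text.",
    " No relevant text.",
    " No relevant text.",
    " No relevant text.",
    " None.",
    " No relevant text.",
    " None.",
    " N/A",
    "\n    No relevant text.",
    " No relevant text.",
    " None",
    " No relevant text.",
]

# Assumption: these are the phrasings the model uses for "nothing found".
NULL_ANSWERS = {"no relevant text", "none", "n/a"}


def is_empty_step(text: str) -> bool:
    """Return True if an intermediate step carries no extracted content."""
    normalized = text.strip().rstrip(".").lower()
    return normalized in NULL_ANSWERS


empty_count = sum(is_empty_step(s) for s in steps)
print(f"{empty_count} of {len(steps)} intermediate steps carried no content")
# prints "19 of 20 intermediate steps carried no content"
```

The only non-empty step is a date/author template placeholder, so nothing usable reaches the final answer in this run.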

The same question, asked without dates:

id: 7f0bff80312defed4dee643ae2be634c

Question: "give me a list of decision on devflows?"
Answer:

Chunks:

[
    "1csUA15zcccn2bnbuQ3FaPCPa1_gB2VT6719nAkS5OgY_19",
    "1iU0UTCEWlTTBsi19Xr2SJhMLRVUnOA0eI6KHljkpJ7s_23",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_7",
    "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_2",
    "116n7AS1mLoSjrIZg_Z481oUYvEv-UaVnHD8HN38CJsc_12",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_5",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_11",
    "1yRygapujyzQF_D5U9a-Y8KDt3aCmofT28a2moxnwOdk_7",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_0",
    "1z68T6Wn9iwduH_RAvCtN5VWKuHANkJQW0ksOweXVtfI_4",
    "1Ef74MMXRscdiFakonFnx0ESc0YXLgUd214Ic13wAwzc_8",
    "1xqaJBe2vwFAOKOU7sirl1YWdlSXO0KWzz2MwkUz8yMg_12",
    "1shdmbPueAlbX4LTNF5ZGtxy4mpNBITssuimvzG-hcHY_4",
    "12J_waJEbC1EEBL3MVy_RqyOWfh6ZslV45b-csN4B7Bs_3",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_2",
    "1JMzgq_PmHXLuq_koO2-p2Xe1_daojWb9KRpOTni1PP8_3",
    "1bztAP-tkfhTkUmbQmL4c9cFIfIhsGZ3dB5v9aWQCbaQ_2",
    "1T2x0nQx_JIobD74rYGA6EkX-wSSTyH0D5JcAtzoq0ZE_17",
    "1edVCFrXMV78D2wjLfUeWGKNdy_cCEkDk76H5aQ1qbuo_4",
    "16BnhyrzMAQj2q9Eli-CnW5GNO5vsIYsyJ8DhOfP6QMI_7"
]

Intermediate steps:

"

    Flow Wizards will do nothing more than generate a flow consisting of a handful of connected Pattern Flows. These are DevFlow’s highest-level nodes and implement common patterns used to create AWS applications. For example, we may have a “Salesforce Data Sync” node that could be connected to “S3 data lake” or “Redshift data lake” nodes.",
    " 
    New entities for templates and template instances. Previously flows existed. Note that sub flows exist in the DevFlows back end but are currently missing from the new DevFlows v2 UX.
    New operation for template change publishing.
    Store in the DevFlows MySQL db.
    Templates will increase the number of active flows.",
    " 
    ITD 5 - Only 408, 429 and 503 error codes should result in retries.",
    " 
    No - global context only 
    Yes - Context with scope levels: node only, flow only, or global.
    No
    Yes - switch node",
    " None",
    " 
    *ITD 2 - Support data ingestion in batches using DevFlows batch capability*
    *OPTIONS CONSIDERED + (Decision in bold)* 
    *Support data ingestion in batches using DevFlows batch capability*
    *REASONING* 
    Option (2) will lead to one ElasticSearch update request for every document. This leads to poor performance and ElasticSearch recommends https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html#avoiding single-document index requests.
    Instead, we use the batching support provided in this https://docs.google.com/document/d/1hRMaz_J4dUfWQ7L84IZthrn5xaUjdpiv8nPlgcoR4AU/edit#heading=h.u16xk7bmvdsz#spec.
    Batch support in DevFlows is provided by placing a SQS queue in front of every node proxy and sending the events to the Lambda function implementing the action only once the batch is ready. The ElasticSearch adapter can then process all these events and ingest the data into ElasticSearch",
    " 
    A variable can be declared as a string, a number, a boolean, an object or an array 
    The JSON type in DevFlows can map to any valid DynamoDB type, including null 
    List is a common, everyday concept that everybody understands 
    Number is another common concept that can be understood by anyone 
    Strings and booleans do not need their own type, because there are no special operations required for these types, so they can be stored using the JSON type.",
    " 
    We should implement the *s3-receive-events* INPUT node in *M1* first because it will enable S3 event-based invocation of DevFlows workflows, which will likely be one of the most popular use cases for the S3 Adapter.

    Milestones *M2*, *M3*, and *M4* can be implemented in parallel.",
    " 
    *Provide a lesson for each assignment explaining the relevant concepts required to build the flow*",
    " 
    *Provide full access to the Workspace account via the Workspace Profile*
    Only support configurable UIs (field name, label, type, etc) for Connector Profiles",
    " 
    Previously sub-flows could be any flow that was published in the system. There was no explicit input/output schema enforced for sub-flows. Sub-flows worked more like macros, in that they were unraveled into the execution environment, than subroutines. Because a flow developer could change any configuration in a subflow, it was necessary for someone using a subflow to truly understand it. It’s analogous to the user of a function in C needing to know the entire implementation versus just the hardened interface.",
    " 
    Just rely on the call-subflow action.
    Deploy a differently named instance of call-subflow for each flow we need to call.
    Create symbolic links so subflows can be named like any other action.",
    " 
    Bet fully on AWS - Lambda for serverless compute and SNS or EventBridge (needs a deeper dive with AWS) for eventing. 
    Move the programming metaphor for Actions (invocables) to be functions rather than REST endpoints. 
    Abandon the PaaS Tech Diff of being able to run any REST micro-service inside of DevFlows. 
    Instead Lambda wrappers could be used to call a REST service hosted by the invocable developer, who then needs to own and operate this service.",
    " 
    Force the flow author to explicitly upgrade the action version
    Upgrade the flow to the latest compatible PATCH release but don’t redeploy
    Upgrade the flow to the latest compatible PATCH release and redeploy
    *Upgrade the flow to the latest compatible PATCH release, and redeploy, but give the flow developer control*",
    " We are betting on AWS and will base DevFlows there. Because users won’t care about deployment decisions, there is no rush to worry about other cloud providers. But companies use many popular SaaS products such as Salesforce, ZenDesk, Workday, GoogleDocs, etc. DevFlows will make it easy to stitch together these systems to provide powerful workflows. We will integrate with AWS AppFlow and EventBridge where connectors are available, consistent with the Cloud Integration Hub Roadmap. Where there are no AppFlow connectors, we will connect directly from DevFlows, with a goal to convert to AppFlow connectors later.",
    " 
    Don’t worry about current users
    Use feature flags
    Branch each repo
    Fork the codebase",
    " 
    DevFlows is designed for adapters and invocables to be easily defined and added. 
    There is no integration between QuickBooks and AppFlow. 
    There is a comprehensive webhooks feature in QuickBooks, but it has some problems that prevent it being useful in a DevFlows adapter. 
    However, there is also a Change Data Capture (CDC) mechanism accessible via the API, which allows us to very simply poll repeatedly for changes. 
    Altogether, this was a very straightforward adapter to design and it should be simple to build. 
    There is one part of this spec that seems unnecessary, which is provisioning storage to record the latest timestamp that the CDC operation was successfully executed. 
    DevFlows could provide actions with simple persistence, similar to window.localStorage in the web browser, via the context parameter.",
    " 
    Make the Connector Profile a first-class citizen in DevFlows.
    Separating the definition of Connector Profile Schemas from creating an Adapter.
    Creating a separate user interface for flow authors to configure Connector Profiles.
    Providing access to the Workspace AWS account by default.",
    " 
    DevFlows is Visual Basic for the cloud, allowing integrations to easily be built. Currently complex flows can be built, but there are common sections of flows, which repeat in different places. Just as regular programming languages have subroutines, DevFlows needs subflows.",
    " No relevant text."

Most of the retrieved chunks are the same in both runs (15 of 20):

intersection = [
    "1csUA15zcccn2bnbuQ3FaPCPa1_gB2VT6719nAkS5OgY_19",
    "1iU0UTCEWlTTBsi19Xr2SJhMLRVUnOA0eI6KHljkpJ7s_23",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_7",
    "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_2",
    "116n7AS1mLoSjrIZg_Z481oUYvEv-UaVnHD8HN38CJsc_12",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_5",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_11",
    "1yRygapujyzQF_D5U9a-Y8KDt3aCmofT28a2moxnwOdk_7",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_0",
    "1Ef74MMXRscdiFakonFnx0ESc0YXLgUd214Ic13wAwzc_8",
    "1xqaJBe2vwFAOKOU7sirl1YWdlSXO0KWzz2MwkUz8yMg_12",
    "1shdmbPueAlbX4LTNF5ZGtxy4mpNBITssuimvzG-hcHY_4",
    "12J_waJEbC1EEBL3MVy_RqyOWfh6ZslV45b-csN4B7Bs_3",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_2",
    "16BnhyrzMAQj2q9Eli-CnW5GNO5vsIYsyJ8DhOfP6QMI_7"
]
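For reference, the overlap can be recomputed directly from the two chunk lists posted above. A minimal sketch; the variable names (`run_with_dates`, `run_without_dates`, `shared`) are just illustrative.

```python
# Chunk IDs retrieved for the question that asks for dates (id 6c3f...).
run_with_dates = [
    "1iU0UTCEWlTTBsi19Xr2SJhMLRVUnOA0eI6KHljkpJ7s_23",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_7",
    "1csUA15zcccn2bnbuQ3FaPCPa1_gB2VT6719nAkS5OgY_19",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_0",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_11",
    "1yRygapujyzQF_D5U9a-Y8KDt3aCmofT28a2moxnwOdk_7",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_5",
    "1xqaJBe2vwFAOKOU7sirl1YWdlSXO0KWzz2MwkUz8yMg_12",
    "1shdmbPueAlbX4LTNF5ZGtxy4mpNBITssuimvzG-hcHY_4",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_13",
    "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_2",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_2",
    "116n7AS1mLoSjrIZg_Z481oUYvEv-UaVnHD8HN38CJsc_12",
    "1T2x0nQx_JIobD74rYGA6EkX-wSSTyH0D5JcAtzoq0ZE_1",
    "16BnhyrzMAQj2q9Eli-CnW5GNO5vsIYsyJ8DhOfP6QMI_7",
    "1VvY9xJATTjOlv5Asz8OmjmG07bpTsbqD3AM8hBzbIQw_8",
    "1O15XpUOBmGbvE_WKP0-DrUt0QJLjkn43fVV1L00HPYE_7",
    "12J_waJEbC1EEBL3MVy_RqyOWfh6ZslV45b-csN4B7Bs_3",
    "1Ef74MMXRscdiFakonFnx0ESc0YXLgUd214Ic13wAwzc_8",
    "1T2x0nQx_JIobD74rYGA6EkX-wSSTyH0D5JcAtzoq0ZE_16",
]

# Chunk IDs retrieved for the same question without dates (id 7f0b...).
run_without_dates = [
    "1csUA15zcccn2bnbuQ3FaPCPa1_gB2VT6719nAkS5OgY_19",
    "1iU0UTCEWlTTBsi19Xr2SJhMLRVUnOA0eI6KHljkpJ7s_23",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_7",
    "1zRJNrbjm70i0EhqutiSL_CYxJXhgUOpkKoMVCshn3io_2",
    "116n7AS1mLoSjrIZg_Z481oUYvEv-UaVnHD8HN38CJsc_12",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_5",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_11",
    "1yRygapujyzQF_D5U9a-Y8KDt3aCmofT28a2moxnwOdk_7",
    "1xiop33iVLZn8qXcgRcV37kkNHLRf6FNau3AY4gPuy-4_0",
    "1z68T6Wn9iwduH_RAvCtN5VWKuHANkJQW0ksOweXVtfI_4",
    "1Ef74MMXRscdiFakonFnx0ESc0YXLgUd214Ic13wAwzc_8",
    "1xqaJBe2vwFAOKOU7sirl1YWdlSXO0KWzz2MwkUz8yMg_12",
    "1shdmbPueAlbX4LTNF5ZGtxy4mpNBITssuimvzG-hcHY_4",
    "12J_waJEbC1EEBL3MVy_RqyOWfh6ZslV45b-csN4B7Bs_3",
    "1qlDAIBVpuxHgkYdxmOqwnawgvNIPNgCit6lJCRTVVCA_2",
    "1JMzgq_PmHXLuq_koO2-p2Xe1_daojWb9KRpOTni1PP8_3",
    "1bztAP-tkfhTkUmbQmL4c9cFIfIhsGZ3dB5v9aWQCbaQ_2",
    "1T2x0nQx_JIobD74rYGA6EkX-wSSTyH0D5JcAtzoq0ZE_17",
    "1edVCFrXMV78D2wjLfUeWGKNdy_cCEkDk76H5aQ1qbuo_4",
    "16BnhyrzMAQj2q9Eli-CnW5GNO5vsIYsyJ8DhOfP6QMI_7",
]

# Keep the order of the second run so the result is easy to compare by eye.
first_run = set(run_with_dates)
shared = [chunk for chunk in run_without_dates if chunk in first_run]
only_without_dates = [c for c in run_without_dates if c not in first_run]

print(f"{len(shared)} of {len(run_without_dates)} chunks are shared")
# prints "15 of 20 chunks are shared"
```

Since three quarters of the chunks are identical, retrieval alone does not explain why the dated question comes back empty.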

We are losing a lot of information in the reduce step.

anuj-ti commented 1 year ago

Similar problem with another execution.

question: Give me a bullet point list of all decisions to use Aws Lambda to build Devflows with the date on which the decision was made.
answer: There is no bullet point list of all decisions to use AWS Lambda to build Devflows with the date on which the decision was made.
id: f557e747218b445390ed428e3c7b5051

Comment: Retried the same question and got a proper answer.
answer: 
Decisions to use AWS Lambda to build Devflows were made on the following dates:
- On February 22, 2021, it was decided to move to Lambda and build DevFlows Actions so they can be deployed as Typescript functions in Lambda with the proxy functionality built into a Lambda Layer (source: 2).
- On the same day, it was decided to use AWS services, including Lambda, SNS, etc. (source: 3).
- On February 26, 2021, it was decided to use AWS Lambda global variables (source: 6).
- On February 22, 2021, it was decided to use Lambda and Fargate ECS (source: 9).
- On February 22, 2021, it was decided to change the programming model to be an API model in TypeScript, deploy in Lambda, and handle the DevFlows platform work in a Lambda layer (source: 11).
- On February 10, 2021, it was decided that moving off of Knative and onto Lambda would help (source: 20).
id: 66e1511b1e24783553a9a124e503ef3b

Again, chunks are there but the information was not extracted.

Tried the above question without asking for dates, and it did a good job:

question: Give me a bullet point list of all decisions to use Aws Lambda to build Devflows
answer: 
- IT developers, Growth Engineers (in Marketing), FP&A Analysts who use VBA and anyone else who develops scripts and code to solve problems can use DevFlows to quickly build applications by stitching together SaaS applications and AWS services in an easy, visual way.
- AWS experts could build solutions using Lambda functions, Step Functions and/or Fargate.
- Rather than try and optimize sidecars, I would build DevFlows Actions so they can be deployed as Typescript functions in Lambda with the proxy functionality built into a Lambda Layer.
- This goes away with the move to Lambda.
- The abstraction is a REST endpoint per Action (where each node can have an action).
- The design of actions being http servers means that to run on Lambda, the abstraction will need to change and every action will need to be updated to be compatible.
- This design works for any node which is a HTTP server. It does not fit with a mode where we want to deploy in Lambda.
- We need a higher level abstraction that can be applied either to deployment of containers or serverless Lambda functions.
- The CN rebuild of invocables to work on Lambda introduced the concept of “functions”.
- Use AWS Lambda global variables
- DB connection pooling using lambda global variables is easy to implement and, with sensible defaults, can scale to 100+ concurrent lambda invocations.
- ScaleArc provides connection pool support but also requires management overhead.
- A DevFlows deployment would manually require setting up a ScaleArc cluster and performance tuning.
- Additionally, a rough benchmark between a ScaleArc cluster (with 1 server and default settings) vs. using lambda global variables indicated that ScaleArc generated significantly higher error rates (i.e. connection timeouts).
- Using these libraries requires the Lambda code to be written using Node.js.
- This spec uses a lambda function to query an RDS RDBMS based on the provided connection settings.
- It provides data to the flow in an event stream of 1 row per event.
- The system uses Lambda global variables and open-source libraries to support connection pooling.
- It only supports MySQL and PostgreSQL based on availability of open-source libraries, but other databases can be supported by extending these libraries relatively easily.
- The change to run actions in Lambda in preference to Knative Serving
- Execution duration is longer than 15 minutes
- Memory requirement is greater than 10GB
- CPU requirement is greater than 6 vCPU
- I/O requirements are very high
- Support data ingestion in batches using DevFlows batch capability
- Batch_Size is set and controlled at the platform level but each action can set an upper limit on Batch_Size, based on the largest batch it can handle.
- The default value for Batch Window is 60 seconds (1 min), configurable for each workspace.
- Use __bulk_ in data ingestion requests for bulk update.
- Build a scheduled Lambda function, triggered by EventBridge rules, to poll feed destinations.
- Prioritize Lambda implementations.
- Make long running applications be separately deployed outside of the core of DevFlows, but allow communication with them by writing Lambda wrappers that can make the external HTTP requests.
- Have DevFlows actions call an API to generate events.
- The approach though the team is taking with the introduction of Lambda support is to make each proxy be a Lambda function, which then synchronously calls the action as a second lambda function.
- Change the programming model to be an API model in TypeScript, deploy in Lambda, and handle the DevFlows platform work in a Lambda layer.
- Using this model the DevFlows platform would be free to combine multiple Actions into a single Lambda function as an optimization.
- Bet fully on AWS - Lambda for serverless compute and SNS or EventBridge (needs a deeper dive with AWS) for eventing. This transformation is currently being executed by the DevFlows Code Nation team with a focus on Lambda and SNS, though the plan is to keep Kubernetes also.
- Move the programming metaphor for Actions (invocables) to be functions rather than REST endpoints. This would make it easier to run the proxy logic in the same instance as the Action, would allow us to optimize and combine different functions into a single runtime activity and would standardize our Action development on a single language (Typescript). The Lambda change currently occurring will add a function paradigm, but will preserve the existing approach in parallel.
- Re-platforming from Knative eventing to AWS SNS and from Knative serving to Lambda functions.
- Schedule time with AWS to get deep on whether SNS or EventBridge make the most sense here.
- Reportedly the EventBridge team said we should use their service over SNS, but the SNS team may say the exact opposite.
- Use existing 429 handling and exponential backoff to throttle resource consumption.
- Constrain AWS resource usage at certain levels.
- Batch events when they backup.
- Allow adapters to process batches.
- Use AWS services - AppFlow, Step Functions, EventBridge, Lambda, etc - to build their own data flows.
- DevFlows provides a simple UI and abstracts all the plumbing away from the user, automating provisioning and monitoring.
- Using AWS services directly requires a much more technical user who can handle these concerns.
- DevFlows has recently undergone a major change from Knative to Lambda functions and SNS topics to provide the runtime and inter-node communications.
- Automatically determine and use the best execution environment for each action in a flow (i.e. Lambda vs Kubernetes), to provide scalability while avoiding the execution time hard limit of Lambda.
- Actions are packaged as Docker containers, so that they are portable between both Kubernetes and Lambda.
- Use a Proxy Lambda function to provide a standard interface to actions via an SNS topic for passing events between nodes.
- Expose public HTTP endpoints using API Gateway, and use AWS services for other supporting functionality.
- Use INPUT nodes to receive pushed events into flows
- Call AWS APIs directly to create resources needed for an input node
- Include only a Lambda Listener
- Create SQL Actions, which will include drivers to the most common databases - such as Oracle, MySQL, SQL Server, and PostgreSQL
- DevFlows currently uses one single AWS account for all workspaces in a given installation.
- Our throttling approach will reduce concurrent Lambda invocations through batching and reducing the number of proxies.
- Our AWS Accounts approach will allow each workspace to have its own AWS Account, reducing the noisy neighbor problem of one flow interrupting others.
- In the event that concurrency limits are still reached under load, the primary concern is http requests from external sources that may not have retry logic.
- API limits are a normal way of handling these types of increases in API level access.
id: 75f927438ecad499aa605a3a45d7ba3b