Pin4sf commented 4 months ago

[DMP 2024] Generate a project.yaml file from a list of steps #619

TL;DR - This service generates a workflow definition from the provided steps and outputs the result in either JSON or YAML format.

An AI agent that is able to generate a workflow for a user, based on an instruction specifying the required steps and adaptors.

As specified in the issue (https://github.com/OpenFn/kit/issues/619)

I have added the gen_project service, which has a Python module that generates YAML/JSON files from workflow steps using NLP, rule-based parsing, and an NER model for adaptor identification.

Deliverables

[x] A new service in the apollo repository called gen_project
[x] The service should take a series of steps as an input, and return a workflow definition as an output
[x] The workflow can optionally be defined as a project.yaml or workflow.json file
[ ] A demo script which allows the project to be imported to lightning.

This PR fully covers the first 3 deliverables. I am still working on a demo script that lets users easily test various inputs and see them visualised in Lightning.

Demo Script

Current demo script for using the gen_project service.

Usage

Send a POST request to /services/gen_project with a JSON body containing the workflow steps and the desired output format.
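
As a minimal sketch of the request described above, the snippet below builds (but does not send) the POST request using only the standard library. The base URL `http://localhost:3000` and the body field names `steps` and `format` are assumptions for illustration; the service README is the authority on the actual input shape.

```python
import json
import urllib.request


def build_gen_project_request(steps, output_format="yaml",
                              base_url="http://localhost:3000"):
    """Build (but do not send) a POST request for the gen_project service."""
    # ASSUMPTION: "steps" and "format" are hypothetical field names,
    # not confirmed by this PR. Check the service README.
    payload = {"steps": steps, "format": output_format}
    return urllib.request.Request(
        url=f"{base_url}/services/gen_project",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_gen_project_request(
    ["Fetch visits from CommCare", "Create a FHIR Encounter in Satusehat"]
)

# To send it against a running server, something like:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before pointing it at a real server.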

CLI Call

   poetry run python services/entry.py gen_project tmp/input.json tmp/output.json

Make sure to replace tmp/input.json with the path to the input file and tmp/output.json with the path to the output file. Alternatively, create a tmp directory in the root of the project and run the command as is.

Details can be found in the README.md of this service.

Demo

A demo video showcasing the CLI call and service usage, for reference.

https://github.com/OpenFn/apollo/assets/83578233/ef9770bb-6077-4aa3-a770-c41d522cc932

Pin4sf commented 4 months ago

A demo script which allows the project to be imported to lightning.

@josephjclark can you help me understand exactly what the script needs to do for files to be imported into Lightning?

As mentioned in the issue ticket (https://github.com/OpenFn/kit/issues/619): The demo should be executable from a terminal. It could be a python or bash script. It should not be integrated with the OpenFn CLI.

The demo should do the following:

Read in a list of steps from a file, where each line of the file is one step
Call out to the apollo service (locally, by default, but can take a URL) to generate a project.yaml
Generate a Lightning project structure on the file system
Deploy the project to demo.openfn.org
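
The first of these steps, reading one step per line from a file, could be sketched as follows. The helper name `read_steps` and the file name `steps.txt` are hypothetical, and skipping blank lines is an assumption about what a forgiving demo would want.

```python
import tempfile
from pathlib import Path


def read_steps(path):
    """Return one workflow step per non-empty line of the file at `path`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # ASSUMPTION: blank lines are ignored rather than treated as empty steps.
    return [line.strip() for line in lines if line.strip()]


# Small self-contained demonstration using a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    steps_file = Path(tmp) / "steps.txt"
    steps_file.write_text(
        "Fetch visits from CommCare\n\nCreate a FHIR Encounter in Satusehat\n",
        encoding="utf-8",
    )
    steps = read_steps(steps_file)
```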

I want to know what exactly the Lightning project structure on the file system is, because as per my understanding it's the project.yaml file itself. I would also like to know how I can deploy the project to demo.openfn.org.

Additionally, do you have any suggestions or improvements for this?

josephjclark commented 4 months ago

Hi @Pin4sf

Your mentor should help you out with the review and moving the project forward from here on. We'll be giving high-level guidance from a distance but I am not directly involved in the project.

We're just preparing some documentation to help you with the lightning import stuff. The first step is for you to be able to import your generated yamls into Lightning.

Once you've got all that working, it's up to you to work out how to build a demo to allow a non-technical openfn contributor to generate and visualize a workflow (or maybe, if we have to, just visualize a pre-generated workflow). There are many ways to do this. A very simple CLI command or single script is acceptable.

Once you understand the deploy process, that should all make a lot more sense. And if you need more help at that point then we'll set up a call.

josephjclark commented 4 months ago

Instructions for setting up Lightning are coming soon - we've been very busy and struggling to find time to prepare material.

In the meantime, there are a LOT of files in this PR. Do we need them? Can they be bundled or compressed? What strategies can we employ to have less data in both the repo and the deployed docker image?

Pin4sf commented 4 months ago

Instructions for setting up Lightning are coming soon - we've been very busy and struggling to find time to prepare material.

Okay, no problem. Thanks for the update.

In the meantime, there are a LOT of files in this PR. Do we need them? Can they be bundled or compressed? What strategies can we employ to have less data in both the repo and the deployed docker image?

Yeah, those files are for the NER model, which is used for adaptor identification. I bundled the complete model with the AI agent so that it could also work offline, but I will look into it and try to find a way to reduce the amount of data while still getting good YAML/JSON generation.

josephjclark commented 4 months ago

@Pin4sf ok, let's try and focus this week on making the NER model files more manageable. And if there really isn't an answer, what are the alternatives?

This may not be easy but it's important to us to have a viable solution. So let's take the time to get it right!

josephjclark commented 4 months ago

Hi @Pin4sf, how are you getting on with the model files?

I've got some resources for you on the demo setup.

We run a public demo version of our app at demo.openfn.org. You can log in with username editor@openfn.org and password welcome123. The demo resets at midnight every night (uh that's midnight UTC I assume). Go ahead and click around a bit, run a workflow, get a sense of what's up.

We want you to create a local demo - as a python or bash script, whatever you like, but it has to be easy to run - which will take a prompt (a list of workflow steps), generate a workflow yaml to disk, and publish that yaml straight to demo.openfn.org.

That makes it easy for both us AND you to actually visualise the workflows you're generating with the service.

@elias-ba has prepared a tutorial video showing you details of how to use our CLI to "deploy" a project.yaml to the demo app: https://www.loom.com/share/76db8ec7e4ab4d81b73c65d0019078f4?sid=9845e902-1ee5-4e8e-9e2d-e9650bc89f98

Pin4sf commented 4 months ago

Hey @josephjclark,

Thank you very much for the resources and updates regarding lightning import. This is really helpful.

How are you getting on with the model files?

I have reviewed the method I am using for file generation and found that my YAML/JSON structure generation primarily relies on rule-based parsing and regular expressions to assemble the file structure. Therefore, there was not much need for a specialized NER model for adaptor identification. Instead, adaptors can be directly referenced from the list I had already updated in the codebase. We can use the existing NER model present in the root of the Apollo repository in the model directory.
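
To illustrate the idea of referencing adaptors from a known list rather than a dedicated model, here is a minimal sketch. It is not the PR's actual implementation: the `KNOWN_ADAPTORS` subset, the function name `identify_adaptor`, and the first-match rule are all assumptions. Only the `@openfn/language-<name>@latest` specifier format is taken from the outputs posted later in this thread.

```python
import re

# ASSUMPTION: a hypothetical subset of adaptor names; the real list
# lives in the PR's codebase.
KNOWN_ADAPTORS = ["commcare", "fhir", "satusehat", "redis", "http"]
DEFAULT_ADAPTOR = "common"


def identify_adaptor(step, known=KNOWN_ADAPTORS):
    """Return a specifier for the first known adaptor named in `step`."""
    for name in known:
        # Whole-word, case-insensitive match on the adaptor name.
        if re.search(rf"\b{re.escape(name)}\b", step, flags=re.IGNORECASE):
            return f"@openfn/language-{name}@latest"
    # Fall back to the common adaptor when nothing matches.
    return f"@openfn/language-{DEFAULT_ADAPTOR}@latest"


adaptor = identify_adaptor("Trigger a workflow to send the encounter ID back to Commcare")
```

A lookup like this is order-sensitive: a step that names two adaptors resolves to whichever comes first in the list, and a misspelled name falls through to the default.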

This approach allows the AI agent to function the same as it did with a dedicated model.

josephjclark commented 3 months ago

Hi @Pin4sf, how are you getting on with the demo setup? I expect that to be where you're spending your time.

If you need review or feedback you should first let your mentor know. If there's no other direction you can contact me directly.

I've put together a couple of sample workflow descriptions for you, based on some real work we're doing at the moment. I think these are all quite hard in different ways - I'm not expecting perfect results (there isn't enough information), but I am hoping for a good starting point.

For every patient intake visit created in CommCare, create a FHIR Encounter and Observation and send to Satusehat. Trigger a workflow to send the encounter ID back to Commcare. This should use a Webhook trigger. Technically the creation steps should be Satusehat (because it uses FHIR), but if it uses the FHIR adaptor that's OK. The "trigger a workflow" bit should use the HTTP adaptor (the openfn adaptor would be cool and I'd be happy if it chose that).
At midnight every day, fetch visits from commare. For each visitor with an IHS number, create a FHIR Encounter in Satusehat. For every visitor without an IHIS number, lookup the number in satusehat and THEN create an encounter in satusehat. This is quite a hard one - ideally it would include branches for the two sets of visitors, and both branches would call the same Satusehat (or fhir) step to create the encounter. This should use a cron trigger (bonus points if a good cron value is set).
Whenever fridge statistics are send to you, parse and aggregate the data and upload to a collection in redis. This is a simple one - a webhook adaptor. Ideally it should generate one step for data aggregation with the common adaptor, and one redis step for the redis upload.
Generate a daily report of the average fridge temperatures by pulling out all records in redis for that day and averaging them. Send the results to my server at https://fridges.com/report. Another cron-based trigger. It needs a redis step to pull data, maybe a step to aggregate data (I don't mind if this is done in the redis step), and an http step to send the data to this random endpoint.

You might want to think about whether and how we can feed adaptor specific knowledge into the service. For example it would be useful to know that Satusehat can send FHIR resources and webhooks are triggered with the openfn adaptor. How can we encode this information?
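
One possible way to encode this kind of knowledge is a small table of hints with priorities, sketched below. Everything here is hypothetical: the `AdaptorHint` structure, the phrases, and the priority values are illustrative assumptions, loosely based on the guidance in this comment (prefer Satusehat over plain FHIR, use HTTP for "trigger a workflow").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptorHint:
    adaptor: str       # adaptor to suggest
    phrases: tuple     # lowercase phrases that suggest this adaptor
    priority: int = 1  # higher wins when several hints match
    note: str = ""     # human-readable reason, useful in a prompt or log


# ASSUMPTION: illustrative hints only, not a real knowledge base.
HINTS = (
    AdaptorHint("commcare", ("commcare",)),
    AdaptorHint("fhir", ("fhir",)),
    AdaptorHint("redis", ("redis",)),
    AdaptorHint("satusehat", ("satusehat",), priority=2,
                note="Satusehat can send FHIR resources"),
    AdaptorHint("http", ("trigger a workflow",), priority=3,
                note="Triggering another workflow is a webhook call"),
)


def choose_adaptor(step, hints=HINTS, default="common"):
    """Pick the highest-priority adaptor whose phrases appear in `step`."""
    text = step.lower()
    matches = [h for h in hints if any(p in text for p in h.phrases)]
    if not matches:
        return default
    return max(matches, key=lambda h: h.priority).adaptor


choice = choose_adaptor("create a FHIR Encounter and Observation and send to Satusehat")
```

The `note` field is there so the same table could also be rendered into a prompt, which is one route to combining it with documentation pulled from the adaptor docs.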

Pin4sf commented 3 months ago

Hello @josephjclark,

I am currently working on setting up a demo to be imported on Lightning. However, I am encountering an issue where, after deploying the project.yaml and exporting my API credentials to the demo app endpoint (following the video instructions), I am getting an authorization error.

Could you help me identify why the authorization is failing?

I am attaching the current midpoint update here.

There are a few doubts I have:

Output JSON and YAML Generation: Currently, the output.json is generated with the desired format specified in input.json. To import this generated file onto Lightning, I am following the portability specifications for project.yaml, which must be properly linted in UTF format for successful import. I created a new lint_yaml.py file that takes output.json as input and produces a proper project.yaml file that is Lightning import-ready. However, this requires calling a separate Python script. Is this approach okay, or should I make changes to streamline the process?
Webhook Trigger: Currently, our gen_project.py is set to use a webhook trigger by default. I need to rework the main generation logic to allow for a specific trigger based on the input steps. This will help accommodate cases like using a cron trigger instead of a webhook.
Adaptor-Specific Knowledge: I will also look into how we can feed adaptor-specific knowledge into the service. Could you provide a dataset of various input steps used by OpenFn users on a daily basis? This will help me figure out the adaptor-specific usage required for the service.
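
For the trigger rework mentioned under Webhook Trigger, a rule-based starting point could look like the sketch below. The function name `infer_trigger`, the matched phrases, and the choice of `0 0 * * *` (midnight) as the default schedule for any "daily" wording are assumptions. The `cron_expression` key is also an assumption about the project.yaml trigger format and should be checked against the Lightning portability specification.

```python
import re


def infer_trigger(instruction):
    """Guess a trigger definition from the wording of the instruction."""
    text = instruction.lower()
    # Scheduling language suggests a cron trigger.
    if re.search(r"\b(at midnight|every day|daily)\b", text):
        # ASSUMPTION: default to midnight when no explicit time is given.
        return {"type": "cron", "cron_expression": "0 0 * * *", "enabled": True}
    # Otherwise keep the current default: a webhook trigger.
    return {"type": "webhook", "enabled": True}


trigger = infer_trigger("At midnight every day, fetch visits from commare.")
```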

Thank you for giving me these test cases to work on. I will keep you updated on my progress.

josephjclark commented 3 months ago

Ah it turns out the 403 is my fault - try and sign in with super@openfn.org (same password).

I'm afraid I don't understand your linter (why do you need to lint a file you've just generated?) and I won't have time to look into it this week. If your web service returns the { files: { ['project.yaml']: '' } } structure, then @openfn/cli can extract the yaml and write it to disk, which is useful for us. But your python interface doesn't need to do that - maybe if you call gen_project.py from python directly, it can return a yaml string directly (or even write it to disk). You should do what you need to do - but please don't over-complicate it.
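
As a rough sketch of the "write it to disk" option from Python, the helper below takes a response shaped like the `files` structure above and writes each entry to a directory. The name `write_files` is hypothetical, and the sample YAML content is a placeholder rather than real service output.

```python
import tempfile
from pathlib import Path


def write_files(response, out_dir):
    """Write every entry of response["files"] to `out_dir`; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in response["files"].items():
        target = out_dir / name
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


# Placeholder response; a real one would come from the gen_project service.
response = {"files": {"project.yaml": "name: open-project\n"}}
with tempfile.TemporaryDirectory() as tmp:
    paths = write_files(response, tmp)
    saved = paths[0].read_text(encoding="utf-8")
```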

I cannot give you a dataset of input steps. It is too broad. Better to focus on the principle. How would you like to encode knowledge about adaptors? What options do we have? We do have a separate RAG service in development which allows documentation to be pulled from the doc site and added to a prompt. Would that facility be useful?

Let me put it this way: the knowledge base about our adaptors is docs.openfn.org/adaptors. Can you use that knowledge base to enhance the abilities of the generator?

Pin4sf commented 2 months ago

Hey @josephjclark, hope you are doing well. Sorry for the late reply, I had some university work. These are the current outputs. Can you please verify that they are the output you are looking for?

At midnight every day, fetch visits from commare. For each visitor with an IHS number, create a FHIR Encounter in Satusehat. For every visitor without an IHIS number, lookup the number in satusehat and THEN create an encounter in satusehat

{"files": {"['project.yaml']": "workflow-1:\n name: Generated Workflow\n jobs:\n At-Midnight-Every-Day:\n name: At midnight every day\n adaptor: '@openfn/language-common@latest'\n body: '| // Add operations here'\n Fetch-Visits-Commare.-For-Each-Visitor-Ihs-Number:\n name: fetch visits from commare. For each visitor with an IHS number\n adaptor: '@openfn/language-common@latest'\n body: '| // Add operations here'\n Create-Fhir-Encounter-In-Satusehat.-For-Every-Visitor-Without-Ihis-Number:\n name: create a FHIR Encounter in Satusehat. For every visitor without an IHIS\n number\n adaptor: '@openfn/language-fhir@latest'\n body: '| // Add operations here'\n Lookup-Number-In-Satusehat-Then-Create-Encounter-In-Satusehat:\n name: lookup the number in satusehat and THEN create an encounter in satusehat\n adaptor: '@openfn/language-satusehat@latest'\n body: '| // Add operations here'\n triggers:\n webhook:\n type: webhook\n enabled: true\n edges:\n - source_trigger: webhook\n target_job: At-Midnight-Every-Day\n condition_type: always\n enabled: true\n - source_job: At-Midnight-Every-Day\n target_job: Fetch-Visits-Commare.-For-Each-Visitor-Ihs-Number\n condition_type: on_job_success\n enabled: true\n - source_job: Fetch-Visits-Commare.-For-Each-Visitor-Ihs-Number\n target_job: Create-Fhir-Encounter-In-Satusehat.-For-Every-Visitor-Without-Ihis-Number\n condition_type: on_job_success\n enabled: true\n - source_job: Create-Fhir-Encounter-In-Satusehat.-For-Every-Visitor-Without-Ihis-Number\n target_job: Lookup-Number-In-Satusehat-Then-Create-Encounter-In-Satusehat\n condition_type: on_job_success\n enabled: true\nname: open-project\ndescription: Auto-generated workflow based on provided steps.\n"}}

For every patient intake visit created in CommCare, create a FHIR Encounter and Observation and send to Satusehat. Trigger a workflow to send the encounter ID back to Commcare

{"files": {"project.yaml": "workflow-1:\n name: Generated Workflow\n jobs:\n For-Every-Patient-Intake-Visit-Created-In-Commcare:\n name: For every patient intake visit created in CommCare\n adaptor: '@openfn/language-commcare@latest'\n body: '| // Add operations here'\n Create-Fhir-Encounter-Observation-Send-Satusehat:\n name: create a FHIR Encounter and Observation and send to Satusehat\n adaptor: '@openfn/language-fhir@latest'\n body: '| // Add operations here'\n Trigger-Workflow-Send-Encounter-Id-Back-Commcare:\n name: Trigger a workflow to send the encounter ID back to Commcare\n adaptor: '@openfn/language-commcare@latest'\n body: '| // Add operations here'\n triggers:\n webhook:\n type: webhook\n enabled: true\n edges:\n - source_trigger: webhook\n target_job: For-Every-Patient-Intake-Visit-Created-In-Commcare\n condition_type: always\n enabled: true\n - source_job: For-Every-Patient-Intake-Visit-Created-In-Commcare\n target_job: Create-Fhir-Encounter-Observation-Send-Satusehat\n condition_type: on_job_success\n enabled: true\n - source_job: Create-Fhir-Encounter-Observation-Send-Satusehat\n target_job: Trigger-Workflow-Send-Encounter-Id-Back-Commcare\n condition_type: on_job_success\n enabled: true\nname: open-project\ndescription: Auto-generated workflow based on provided steps.\n"}}

Whenever fridge statistics are send to you, parse and aggregate the data and upload to a collection in redis

{"files": {"project.yaml": "workflow-1:\n name: Generated Workflow\n jobs:\n Whenever-Fridge-Statistics-Are-Send-You:\n name: Whenever fridge statistics are send to you\n adaptor: '@openfn/language-common@latest'\n body: '| // Add operations here'\n Parse-Aggregate-Data-Upload-Collection-In-Redis:\n name: parse and aggregate the data and upload to a collection in redis\n adaptor: '@openfn/language-common@latest'\n body: '| // Add operations here'\n triggers:\n webhook:\n type: webhook\n enabled: true\n edges:\n - source_trigger: webhook\n target_job: Whenever-Fridge-Statistics-Are-Send-You\n condition_type: always\n enabled: true\n - source_job: Whenever-Fridge-Statistics-Are-Send-You\n target_job: Parse-Aggregate-Data-Upload-Collection-In-Redis\n condition_type: on_job_success\n enabled: true\nname: open-project\ndescription: Auto-generated workflow based on provided steps.\n"}}

josephjclark commented 2 months ago

@Pin4sf We would like to verify the outputs by looking at them through Lightning. It is hard to debug the yaml directly.

How are you getting on with the demo? A good demo will allow you to prove to us that you've completed the task, and give us a reasonable chance at assessing the quality of your work. It is a vital part of the assignment, and should have been your focus on this project for the last six weeks.

If you are having difficulty (which wouldn't surprise me, it's not an easy assignment), who have you contacted for support?

Pin4sf commented 2 months ago

I am preparing the demo itself. I was discussing my queries with @SatyamMattoo and my mentor, googling the rest, and consulting here when required.

This was quite hard for me. As a college student who is still learning development, I faced various issues, but I managed to resolve them and learned new things in the process.

josephjclark commented 2 months ago

Hey @Pin4sf, learning when and where to ask for help is one of the most important dev skills of all. Just make sure to keep practicing it. If we can't answer, we won't! But usually we can at least give you a nudge in the right direction.

josephjclark commented 2 months ago

Thank you for your efforts here @Pin4sf. We'll take over the demo side of things and try and work out how to integrate this into our main app.

Pin4sf commented 2 months ago

Thank you very much to all the maintainers who helped me throughout my mentorship program. I've learned so much about open source development and feel much more confident in my abilities. It was a privilege to work on this issue, and I hope to contribute more to the project in the future!

OpenFn / apollo

Dmp/619 : Added gen_project service #83
