OoriData / BSW4ClimateCon

BSW for ClimateCon!
https://bsw-4-climate-con.vercel.app
Apache License 2.0

refactor for async and smarter LLM use #21

Open choccccy opened 2 months ago

choccccy commented 2 months ago

I'm cooking on the database stuff right now, and it's clear that there are a few things we can do to make the daily run much more efficient.

The searches, summaries, and action item generation are all done sequentially, which is slow (especially if we are working with more than just the 3 top search results like I am currently testing with).
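A minimal sketch of what the concurrent version could look like, assuming the search and LLM calls can be awaited. `search_news`, `summarize`, `generate_action_items` and `max_concurrent` are hypothetical stand-ins, not names from the repo, and the sleeps only simulate network and LLM latency.

```python
import asyncio

# Hypothetical stand-ins for the real search, summary, and action item calls.
async def search_news(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulates a network round trip
    return [f"{query} result {i}" for i in range(3)]

async def summarize(item: str) -> str:
    await asyncio.sleep(0.01)  # simulates an LLM call
    return f"summary of {item}"

async def generate_action_items(summary: str) -> str:
    await asyncio.sleep(0.01)  # simulates an LLM call
    return f"action items for {summary}"

async def process_item(item: str, limiter: asyncio.Semaphore) -> dict:
    # Within one item the summary must come before the action items,
    # but separate items do not depend on each other.
    async with limiter:
        summary = await summarize(item)
        actions = await generate_action_items(summary)
    return {"item": item, "summary": summary, "actions": actions}

async def daily_run(queries: list[str], max_concurrent: int = 5) -> list[dict]:
    # The semaphore caps how many items are in flight at once (assumed limit).
    limiter = asyncio.Semaphore(max_concurrent)
    # Run all searches concurrently, then flatten the per-query result lists.
    result_lists = await asyncio.gather(*(search_news(q) for q in queries))
    items = [item for results in result_lists for item in results]
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(process_item(i, limiter) for i in items))
```

Calling `asyncio.run(daily_run(["climate policy", "solar"]))` would process all six items with at most five in flight at a time. The cap matters more as the number of search results grows, since most LLM endpoints rate-limit.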

This is compounded by the fact that we run the whole summary and action item generation before the DB insert. We had discussed way back at the hackathon that we should probably insert the day's searches into the DB first, then retrieve them and do the LLM stuff. That would save a lot of undue processing. This one's not 100% straightforward, as I do think it is probably still useful to vectorize the news items by a single-sentence summary instead of just using titles (which get truncated), but that can be a task done by a small, stupid LLM.
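A rough sketch of the insert-first flow, under some loud assumptions: it uses the standard library's `sqlite3` purely for illustration (not the project's actual DB), the table layout and function names are made up, and `one_sentence_summary` is a placeholder for the small LLM call. It also assumes the saving comes from skipping items that are already stored. The embedding step is left out.

```python
import sqlite3

def one_sentence_summary(text: str) -> str:
    # Placeholder for a call to a small, cheap LLM; here it just keeps
    # the first sentence of the body.
    return text.split(". ")[0].rstrip(".") + "."

def insert_searches(conn: sqlite3.Connection, results: list[dict]) -> None:
    # Hypothetical schema: summary stays NULL until the LLM pass fills it in.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT, summary TEXT)"
    )
    # INSERT OR IGNORE skips URLs that are already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO news (url, title, body) VALUES (?, ?, ?)",
        [(r["url"], r["title"], r["body"]) for r in results],
    )
    conn.commit()

def summarize_pending(conn: sqlite3.Connection) -> int:
    # Only rows without a summary get sent to the LLM.
    pending = conn.execute(
        "SELECT url, body FROM news WHERE summary IS NULL"
    ).fetchall()
    for url, body in pending:
        conn.execute(
            "UPDATE news SET summary = ? WHERE url = ?",
            (one_sentence_summary(body), url),
        )
    conn.commit()
    return len(pending)  # number of rows summarized in this pass
```

With this split, a second run over the same search results inserts nothing new and `summarize_pending` returns 0, so no LLM calls are repeated. The stored one-sentence summary is also the text that would be vectorized instead of the truncated title.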

Despite all those complaints, we don't actually have much to worry about time-wise, since it can just run every day, slowly, without being a big problem.

uogbuji commented 2 months ago

Yes, rather than a data pipeline, what we have now is lumpy data porridge. That's a legacy of the 20hr sprint that we'll want to refactor, as you say, once we have the MMF properly launched (imminent!).

choccccy commented 2 months ago

After some reconsideration and code reformatting, the "smarter LLM use" part of this is done. Async is still a "nice to have", but this is a low-urgency task, since we have a long time to process between each "firing".