astronomer / ask-astro

An end-to-end LLM reference implementation providing a Q&A interface for Airflow and Astronomer
https://ask.astronomer.io/
Apache License 2.0
192 stars 47 forks

Add stack overflow ingest #181

Closed Lee-W closed 9 months ago

Lee-W commented 10 months ago

Refactor https://github.com/astronomer/ask-astro/commit/47933b52dc8da905fdbb4fe1627d35f1254a98a7 and make it consistent with the existing archive logic.

closes: #126

cloudflare-workers-and-pages[bot] commented 10 months ago

Deploying with Cloudflare Pages

Latest commit: 8e5e26d
Status: ✅  Deploy successful!
Preview URL: https://a2f2bd4e.ask-astro.pages.dev
Branch Preview URL: https://add-stack-overflow-ingest.ask-astro.pages.dev

Lee-W commented 10 months ago

Ingest the data from the Stack Overflow APIs, and make sure the data from the archive is ingested from the APIs as well (as it is a dump of the Stack Overflow data).

If that's the case, I'm unsure whether we should remove the archive. As we already have the archive data, shouldn't we just ingest that data, and use the API to parse data after that?
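To make the archive-then-API idea concrete, here is a minimal sketch of how it might look. It is not the code in this PR: the helper names, the record shape (`question_id`, `creation_date`), and the `airflow` tag are assumptions for illustration. The query parameters (`site`, `tagged`, `fromdate`, `order`, `sort`, `page`, `pagesize`) are the ones accepted by the Stack Exchange `/2.3/questions` endpoint. No network call is made here; the function only builds the parameters.

```python
# Hypothetical record shape: each question is a dict carrying "question_id"
# and a Unix-timestamp "creation_date", mirroring the Stack Exchange API fields.

def latest_archive_timestamp(archive_rows):
    """Return the newest creation_date found in the archive, or None if empty."""
    return max((row["creation_date"] for row in archive_rows), default=None)


def build_questions_query(tag, fromdate=None, page=1, pagesize=100):
    """Build query parameters for the Stack Exchange /2.3/questions endpoint."""
    params = {
        "site": "stackoverflow",
        "tagged": tag,
        "order": "asc",
        "sort": "creation",
        "page": page,
        "pagesize": pagesize,
    }
    if fromdate is not None:
        # Only ask the API for questions created at or after the archive cutoff.
        params["fromdate"] = fromdate
    return params


def merge_without_duplicates(archive_rows, api_rows):
    """Combine both sources, keeping one row per question_id (API rows win)."""
    merged = {row["question_id"]: row for row in archive_rows}
    merged.update({row["question_id"]: row for row in api_rows})
    return list(merged.values())
```

With this shape, the archive is ingested once, `latest_archive_timestamp` gives the cutoff, and any overlap at the cutoff is handled by `merge_without_duplicates` rather than by adjusting the timestamp.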

sunank200 commented 10 months ago

Ingest the data from the Stack Overflow APIs, and make sure the data from the archive is ingested from the APIs as well (as it is a dump of the Stack Overflow data).

If that's the case, I'm unsure whether we should remove the archive. As we already have the archive data, shouldn't we just ingest that data, and use the API to parse data after that?

We will create a new Weaviate class and do a fresh ingest. We don't need to remove the archive data in the old database.

Lee-W commented 9 months ago

@sunank200 As https://github.com/astronomer/ask-astro/issues/194 has been done, should we re-review this one?