airbytehq / PyAirbyte-Hackathon

Tasks for PyAirbyte Hackathon June 2024

RAG on Snowflake with Airbyte source data #2

Closed aaronsteers closed 5 months ago

aaronsteers commented 6 months ago

Summary

Build a RAG solution running on 100% Snowflake-managed infrastructure, using Airbyte.

Project Description

This broadly involves the following steps:

Deliverable

Resources to Assist

Harmaton commented 5 months ago

I would love to work on this topic. Thank you :)

marcosmarxm commented 5 months ago

Assigned to you!

Harmaton commented 5 months ago

@marcosmarxm I have completed my documentation, but I am using the fake embedding model, so I can't really get responses. Can you provide me with a temporary OpenAI API key, just for the purpose of getting the responses for the blog?

Harmaton commented 5 months ago

Here is my documentation. I am waiting for the API key to finalize it. Thank you!

marcosmarxm commented 5 months ago

Thanks @Harmaton, let's wait for Bindi to check the API keys.

bindipankhudi commented 5 months ago

@Harmaton were you able to get a key from @marcosmarxm?

Harmaton commented 5 months ago

No, not yet @bindipankhudi. However, I have completed the blog and am waiting for review. :)

bindipankhudi commented 5 months ago

Sent you a key in DM, @Harmaton. I will look at the blog post.

bindipankhudi commented 5 months ago

@Harmaton - actually, for this issue you can keep using the fake embedding and vectorize the data using a Cortex function. Basically, overwrite the embedding column in Snowflake by vectorizing the document_content column with Cortex's embed function.
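
A minimal sketch of what that overwrite could look like, written as a Python helper that builds the SQL statement. The column names `document_content` and `embedding` come from the comment above; the table name `my_docs`, the helper `build_reembed_sql`, and the choice of `SNOWFLAKE.CORTEX.EMBED_TEXT_768` with the `e5-base-v2` model are assumptions for illustration, not something confirmed in this thread. It also assumes the embedding column's vector dimension matches the model's 768-dimensional output; if the column was created for a different dimension, it would likely need to be recreated first.

```python
import re

# Plain (optionally dot-qualified) identifiers only, to avoid building unsafe SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*)*$")


def build_reembed_sql(
    table: str,
    text_column: str = "document_content",
    embedding_column: str = "embedding",
    model: str = "e5-base-v2",  # assumed model choice, not from the thread
) -> str:
    """Build an UPDATE that overwrites the embedding column with Cortex embeddings.

    Hypothetical helper: it only returns the SQL text and does not connect
    to Snowflake.
    """
    for name in (table, text_column, embedding_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if "'" in model:
        raise ValueError(f"unsafe model name: {model!r}")
    return (
        f"UPDATE {table} "
        f"SET {embedding_column} = "
        f"SNOWFLAKE.CORTEX.EMBED_TEXT_768('{model}', {text_column})"
    )


if __name__ == "__main__":
    # "my_docs" is a placeholder table name.
    print(build_reembed_sql("my_docs"))
```

The resulting statement could then be run in a Snowflake worksheet or through a connector cursor's `execute` method, so the fake vectors written during the load are replaced by vectors computed inside Snowflake.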

bindipankhudi commented 5 months ago

Also, @Harmaton, when it's complete, if you could share the blog in a Google Doc that would be great! :)

Harmaton commented 5 months ago

Okay, noted. Is it good practice to apply a vector embedding function directly to data I have already passed to Snowflake Cortex (where I had to embed using fake or OpenAI)?

Harmaton commented 5 months ago

Okay, noted. Is it good practice to apply vector embedding directly to data I have already passed to Snowflake Cortex (where I have to embed using fake or OpenAI)? Or should I use either?

Harmaton commented 5 months ago

Final Submission @bindipankhudi

marcosmarxm commented 5 months ago

Thanks @Harmaton, I'll ask Bindi to take a look!

bindipankhudi commented 5 months ago

Thank you @Harmaton. The blog post looks good! Thank you for making the updates. Could you share a picture and a quick bio for publishing on our side? Thank you! :)

Harmaton commented 5 months ago

BIO + IMAGE -> HARMATON @bindipankhudi