airbytehq / PyAirbyte-Hackathon

Tasks for PyAirbyte Hackathon June 2024

Blog post: Scraping web data from Apify source into Airbyte for LangChain #18

Closed: aaronsteers closed this 5 months ago

aaronsteers commented 6 months ago

Summary

Many Airbyte users want to scrape website data for use with their LLMs. The Apify source can help with this, but few user guides are available so far. The goal of this tutorial is to show users how to scrape data with Apify, how to set up the Apify source using PyAirbyte, and how to load the data into a vector store using LangChain.
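As a rough illustration of that flow, here is a minimal sketch, not the tutorial's actual code. It assumes the `source-apify-dataset` connector takes `token` and `dataset_id` config keys, that the crawler stream is named `item_collection_website_content_crawler`, and that each record carries `text` and `url` fields. The helper names (`read_apify_records`, `records_to_documents`, `PageDocument`) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Mapping


@dataclass
class PageDocument:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def records_to_documents(
    records: Iterable[Mapping[str, Any]],
    text_key: str = "text",  # assumed field name for the scraped page text
    url_key: str = "url",    # assumed field name for the page URL
) -> list[PageDocument]:
    """Turn scraped records into documents, skipping records with no text."""
    docs = []
    for record in records:
        text = (record.get(text_key) or "").strip()
        if not text:
            continue
        docs.append(
            PageDocument(page_content=text, metadata={"source": record.get(url_key)})
        )
    return docs


def read_apify_records(
    token: str,
    dataset_id: str,
    stream: str = "item_collection_website_content_crawler",  # assumed stream name
) -> list[dict[str, Any]]:
    """Read one stream from an Apify dataset via PyAirbyte.

    Needs `pip install airbyte` and valid Apify credentials; the config keys
    below are assumptions about the connector spec.
    """
    import airbyte as ab  # imported lazily so the helper above works without it

    source = ab.get_source(
        "source-apify-dataset",
        config={"token": token, "dataset_id": dataset_id},
        install_if_missing=True,
    )
    source.check()                   # verify the credentials and config
    source.select_streams([stream])  # read only the crawler output stream
    result = source.read()           # sync into PyAirbyte's local cache
    return [dict(record) for record in result[stream]]
```

The documents returned by `records_to_documents(read_apify_records(token, dataset_id))` could then be split into chunks and added to a LangChain vector store. That last step is left out here because it depends on which embedding model and store are chosen.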

Description

This task involves the following steps:

Definition of Done

Blog post / Python notebook. When providing a Python notebook, please add a "What / Why / How" blurb at the top to explain what the code is doing.

Resources to Assist

vspanxcode commented 5 months ago

Hello @aaronsteers, I am interested in this issue. Can you assign it to me? Should I post the blog on Medium or Hashnode?

marcosmarxm commented 5 months ago

It is yours! @Jeeesrw322 we're going to post it on the Airbyte blog, but you can also post it on other sites.

vspanxcode commented 5 months ago

@aaronsteers @bindipankhudi Do we need to use PyAirbyte and LangChain for this? I did not get the full picture. Can you explain this further?

bindipankhudi commented 5 months ago

@Jeeesrw322 I have added more details to the ticket. Hope it is clear now. Please do not hesitate to reach out if you have more questions.

avirajsingh7 commented 5 months ago

@Jeeesrw322 if you have not started working on this issue, can I take this one? I have already worked on a very similar issue.

vspanxcode commented 5 months ago

@avirajsingh7 Yes, you can take this issue

avirajsingh7 commented 5 months ago

@bindipankhudi please assign me this issue. I will be creating a PR soon.

bindipankhudi commented 5 months ago

@avirajsingh7 assigned to you.

bindipankhudi commented 5 months ago

Linking PR: https://github.com/airbytehq/quickstarts/pull/116