Closed by aaronsteers 4 months ago
I would like to take this if possible. This is a REST API type, right? Is the response dynamic?
Yes, it has a REST API interface and JSON output (ref: https://jina.ai/reader/#apiform). I would like to take this issue. CC: @marcosmarxm. There is no previous assignee, so I won't miss out this time :man_dancing:
@btkcodedev - It's yours! Thanks for jumping in. I'm excited about this one for sure! Let @marcosmarxm or me know if you have any questions along the way!
Linking PR: https://github.com/airbytehq/airbyte/pull/39515
CC: @marcosmarxm @bindipankhudi @aaronsteers :bow: Thanks!!
Thank you @btkcodedev! Assigning to @aaronsteers for review.
@btkcodedev - This is looking awesome! I added a few comments + suggestions to the PR. 🚀
https://jina.ai/reader/
Overview
The most popular web scraping source connector right now is Apify. However, this new API from Jina is focused specifically on LLM use cases, and it helpfully outputs Markdown, which is easy for both humans and LLMs to work with. It also doesn't (yet) require a paid account.
The goal is to create a connector which could be used by Airbyte users to leverage this API.
Technical spec
You would write a new source connector that connects to the Jina Reader API and retrieves the scraped content, allowing Airbyte users to send this data downstream to any Airbyte destination.
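For orientation, here is a minimal sketch of what a single read against the Reader API might look like, using only the Python standard library. It is not the connector implementation from the linked PR. The base URL `https://r.jina.ai/`, the `Accept: application/json` header, the optional bearer token, and the `data.url` / `data.title` / `data.content` response fields are assumptions based on the public Reader page and should be checked against Jina's documentation. The helper names `build_reader_request`, `to_record`, and `read_page` are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoint: the Reader page describes prefixing the target URL with this base.
READER_BASE_URL = "https://r.jina.ai/"


def build_reader_request(target_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request asking the Reader API for JSON output."""
    headers = {"Accept": "application/json"}  # assumed way to request JSON instead of raw Markdown
    if api_key:
        # Assumption: an optional bearer token; the issue notes no paid account is required yet.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(READER_BASE_URL + target_url, headers=headers)


def to_record(payload: dict) -> dict:
    """Flatten an (assumed) Reader response into one record for a downstream destination."""
    data = payload.get("data") or {}
    return {
        "url": data.get("url"),
        "title": data.get("title"),
        "content": data.get("content"),  # Markdown body of the scraped page
    }


def read_page(target_url: str, api_key: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Fetch one page through the Reader API and return it as a flat record."""
    request = build_reader_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return to_record(json.load(response))
```

Calling `read_page("https://example.com")` would perform a live request. In an actual Airbyte source, the same request shape would more likely be expressed through the connector framework than through hand-written HTTP code.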
Notes:
Definition of Done