StampyAI / alignment-research-dataset

Stampy's copy of Alignment Research Dataset scraper
https://huggingface.co/datasets/StampyAI/alignment-research-dataset
MIT License

Add parsers and blogs #169

Open ccstan99 opened 1 year ago

ccstan99 commented 1 year ago

To handle suggestions from agisf:

Add to scrape entire blog:

Implement parsers for special_docs/indices:

ccstan99 commented 1 year ago

Would it help to use LangChain's WebBaseLoader as a default fallback until the missing parsers are implemented? https://python.langchain.com/docs/integrations/document_loaders/web_base

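# Load a page with LangChain's generic web loader
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://epochai.org/blog/")
# load() fetches only the given URL and returns a list of Document objects
docs = loader.load()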
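
A rough sketch of how that fallback could be wired in, assuming a per-domain parser registry. `PARSERS`, `generic_loader`, and `fetch_text` are hypothetical names for illustration, not existing code in this repo. The LangChain import uses the same path as above and is done lazily so the dispatch logic can be tried without LangChain installed; newer LangChain releases may expose the loader under a different package.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

Parser = Callable[[str], str]

# Hypothetical registry mapping a domain to its dedicated parser.
# Domains without an entry fall through to the generic loader.
PARSERS: Dict[str, Parser] = {}


def generic_loader(url: str) -> str:
    """Fallback: fetch a single page with LangChain's WebBaseLoader."""
    # Imported lazily so the rest of the module works without LangChain.
    from langchain.document_loaders import WebBaseLoader

    docs = WebBaseLoader(url).load()
    return "\n".join(doc.page_content for doc in docs)


def fetch_text(url: str, fallback: Parser = generic_loader) -> str:
    """Use the dedicated parser for the URL's domain if one exists, else the fallback."""
    domain = urlparse(url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[len("www."):]
    parser = PARSERS.get(domain, fallback)
    return parser(url)
```

With this shape, implementing a real parser later only means adding an entry to `PARSERS`; callers of `fetch_text` would not change. Note that the fallback reads one page per URL, so scraping an entire blog would still need the list of post URLs from somewhere else.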