StarkWizard / cairo-llm

A set of notebooks for training LLMs on Cairo.

Feat doc scrap #11

Closed: woolimi closed this 10 months ago

woolimi commented 10 months ago

How to run

python 0-build\ datasets/cairo-scrapper/docScrap.py

What I've done

  1. Added a config file for docScrap (docScrap.config.json)
  2. Scrape sites -> HTML -> convert to Markdown -> split by heading tag -> save files
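The last two steps of that pipeline (split by heading tag, save files) can be sketched as follows. This is a minimal, standard-library-only sketch, not the code in docScrap.py. It assumes the HTML has already been converted to Markdown with ATX (`#`) headings. The names `split_by_heading`, `slugify`, `save_sections`, the `max_level` parameter, the "intro" title, and the `NNN-slug.md` file naming are all hypothetical. Fetching the sites and the HTML-to-Markdown conversion are out of scope here because the PR does not say which libraries it uses.

```python
import re
import tempfile
from pathlib import Path

# Three backticks, built indirectly so this snippet can sit inside a Markdown fence.
FENCE = "`" * 3

# ATX heading such as "## Storage"; an optional closing "##" sequence is dropped.
# Assumption: the HTML-to-Markdown step emits "#" headings, not setext underlines.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def split_by_heading(markdown: str, max_level: int = 2) -> list[tuple[str, str]]:
    """Split Markdown into (title, body) sections at headings of level <= max_level."""
    sections: list[tuple[str, str]] = []
    title = "intro"  # hypothetical title for text before the first heading
    lines: list[str] = []
    in_fence = False

    def flush() -> None:
        # Keep a section only if it holds some non-blank text.
        if any(ln.strip() for ln in lines):
            sections.append((title, "\n".join(lines).strip() + "\n"))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # "#" lines inside code blocks are not headings
        match = None if in_fence else HEADING_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            title = match.group(2)
            lines = [line]  # the heading line starts the new section
        else:
            lines.append(line)
    flush()
    return sections


def slugify(title: str) -> str:
    """Turn a heading into a lowercase, filesystem-safe name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def save_sections(sections: list[tuple[str, str]], out_dir: Path) -> list[Path]:
    """Write each section to its own numbered .md file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (title, body) in enumerate(sections):
        path = out_dir / f"{i:03d}-{slugify(title)}.md"
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    doc = (
        "Intro text.\n\n# Storage\nVariables live here.\n\n"
        "## Reading\nUse read().\n" + FENCE + "\n# not a heading\n" + FENCE + "\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_sections(split_by_heading(doc), Path(tmp)):
            print(path.name)  # 000-intro.md, 001-storage.md, 002-reading.md
```

Splitting only at levels 1 and 2 keeps deeper subsections attached to their parent, which tends to give chunks of a more even size; tracking code fences avoids cutting a section on a `#` line inside a code sample. Both choices are assumptions of this sketch, not behavior confirmed in the PR.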