amosproj / amos2024ss08-cloud-native-llm

Extract And Store Text Data From CNCF Project Webpages #56

Closed by grayJiaaoLi 2 weeks ago

grayJiaaoLi commented 1 month ago

User story

  1. As a data engineer
  2. I want to extract the text of each page of the CNCF project webpages and store it in our dataset
  3. So that we have enough training data for the LLM
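
As a rough illustration of the extraction step in the user story above, here is a minimal sketch that pulls the visible text out of an HTML page using only the Python standard library. It is not the project's implementation. The names `TextExtractor` and `extract_text` are hypothetical, and the choice to skip `<script>` and `<style>` contents is an assumption about what counts as useful training text. Fetching the pages and storing the results are left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping script/style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []     # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside skipped tags and not pure whitespace.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    # Inline sample page (made up for illustration; no network access needed).
    sample = (
        "<html><head><style>p{}</style></head>"
        "<body><h1>Hello</h1><script>var x=1;</script><p>World</p></body></html>"
    )
    print(extract_text(sample))
```

In practice, each extracted text would probably be stored together with its source URL so that dataset entries can be traced back to the page they came from; that record layout is a suggestion, not something this issue specifies.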

Acceptance criteria

Definition of done (DoD)

DoD general criteria