I want to include Stack Overflow questions related to CNCF in my Q&A dataset, as they are already close to a Q&A format,
so that my Q&A dataset contains more content.
Acceptance criteria
Stack Overflow entries about CNCF content are retrieved (e.g. from a Hugging Face dataset or from websites, whichever you prefer)
The data is stored in the Q&A dataset
Not every Stack Overflow entry is included in our dataset, only entries related to CNCF content
Less is more: it is better to leave out potentially helpful data than to bloat our dataset with content that provides no information for our use case
Either an existing CNCF-related dataset is found and used, or a heuristic identifies which content is relevant for us (e.g. an entry's title contains a word such as "CNCF" or "Kubernetes")
Optional: only include entries with accepted answers (as they are likely to be of better quality)
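The title heuristic and the optional accepted-answer filter could be sketched roughly as follows. This is only an illustration under assumptions: the entry fields (title, tags, accepted_answer) and the keyword list are hypothetical and would need to match whatever source is actually used.

```python
import re

# Illustrative keyword list (an assumption); extend it with the CNCF
# projects that are relevant for the use case.
CNCF_KEYWORDS = ("cncf", "kubernetes", "prometheus", "helm", "envoy")

# Match keywords as whole words only, so "helmet" does not count as "helm".
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CNCF_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def is_cncf_related(entry):
    """Return True if the title or any tag mentions a keyword."""
    if _KEYWORD_RE.search(entry.get("title", "")):
        return True
    return any(_KEYWORD_RE.search(tag) for tag in entry.get("tags", []))


def filter_entries(entries, require_accepted=True):
    """Keep CNCF-related entries, optionally only those with an accepted answer."""
    kept = []
    for entry in entries:
        if not is_cncf_related(entry):
            continue
        # Hypothetical field: empty or missing means no accepted answer.
        if require_accepted and not entry.get("accepted_answer"):
            continue
        kept.append(entry)
    return kept
```

In line with "less is more", the sketch errs on the side of dropping entries: anything without a keyword hit in the title or tags is discarded, even if the body might be relevant.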
Further information
The data does not need to be loaded into the raw dataset if it is already in the required Q&A shape (Q&A pairs) and can be added directly to the Q&A dataset. Adding it to the raw dataset as well is fine, but optional.
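Adding entries directly to the Q&A dataset might look like the sketch below. The storage format is not specified here, so a JSON Lines file with question and answer fields is assumed purely for illustration; the entry fields (title, body, accepted_answer) are hypothetical as well.

```python
import json


def to_qa_pair(entry):
    """Map one entry to a Q&A pair (assumed schema: question/answer)."""
    question = entry["title"].strip()
    body = entry.get("body", "").strip()
    if body:
        # Keep the title as the first line, followed by the question body.
        question = f"{question}\n\n{body}"
    return {"question": question, "answer": entry["accepted_answer"].strip()}


def append_qa_pairs(entries, path):
    """Append entries as JSON Lines to the dataset file; return the count written."""
    count = 0
    with open(path, "a", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(to_qa_pair(entry), ensure_ascii=False) + "\n")
            count += 1
    return count
```

The function expects entries that already have an accepted answer, so it fits best after a filtering step that drops unanswered questions.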