aicoe-aiops / fedora-mailing-list-analysis

This will be the repo for the Fedora mailing list sentiment analysis project

Investigate the correct format of the Fedora mailing list archive - HyperKitty vs Mailman #1

cdolfi closed 3 years ago

cdolfi commented 3 years ago

As a developer on the Fedora mailing list analysis project, I want to develop a web-scraping tool that is consistent with the available mailing list archive. The OSPO team provided me with the data in the Mailman format, and I built a web-scraping tool around it. As per the comment in operate-first/continuous-deployment#38-comment, I want to determine whether that is consistent with how the emails are currently archived.

UPDATE: Analysis of the data showed that the OSPO team's initial scrape of the archives into the originally provided CSV corrupted the format of the text. The resulting inconsistency in format made it very difficult to distinguish thread text from user text. Now, the correct way to scrape this data
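To illustrate the thread-text versus user-text distinction mentioned above, here is a minimal sketch using only the Python standard library. It is not the project's scraper. It assumes the raw archive text still carries conventional `>` quote markers and `On ..., ... wrote:` attribution lines, which this issue does not confirm. The function names `plain_text_body` and `split_user_and_quoted` are hypothetical.

```python
from email import message_from_string
from email.message import Message
from typing import Tuple


def plain_text_body(msg: Message) -> str:
    """Return the first text/plain part of a parsed message, or ''."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def split_user_and_quoted(body: str) -> Tuple[str, str]:
    """Split a body into newly written (user) text and quoted (thread) text.

    Assumption: quoted lines start with '>' and attribution lines look like
    'On <date>, <name> wrote:'. Real archive text may not follow this.
    """
    user_lines, quoted_lines = [], []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            # Drop the quote markers and keep the quoted content.
            quoted_lines.append(stripped.lstrip("> "))
        elif stripped.startswith("On ") and stripped.endswith("wrote:"):
            # Attribution line: belongs to neither side, so skip it.
            continue
        else:
            user_lines.append(line.rstrip())
    return "\n".join(user_lines).strip(), "\n".join(quoted_lines).strip()


# Made-up sample message, for illustration only.
RAW = """From: alice@example.org
Subject: Re: test

Sounds good to me.

On Mon, 1 Jan 2021, Bob wrote:
> Shall we ship it?
> Please reply.
"""

user_text, thread_text = split_user_and_quoted(plain_text_body(message_from_string(RAW)))
print(user_text)
print(thread_text)
```

If the quote markers were lost during the conversion to CSV, as the update describes, a line-prefix heuristic like this one would not work on that CSV. It would only apply to text scraped directly from the archive.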

Acceptance Criteria: