Closed by maoo 4 years ago
The Travis CI build will probably need support for running headless Chrome - https://docs.travis-ci.com/user/gui-and-headless-browsers/
The main code structure has been altered to allow Selenium to fetch the HTML body; the rest of the logic will be adapted accordingly, and some selectors will probably have to change.
See https://github.com/finos/metadata-tool/commit/5d83d42f72d9a86479749efb4d91027154befa74
Hi @pmonks, thanks for the feedback. Unfortunately, clj-http doesn't support JavaScript execution (AFAICS). Since I already had experience with Selenium (I helped on https://github.com/7bridges-eu/shelob), I used that.
Resolving this issue, as the fix was delivered in https://github.com/finos/metadata-tool/pull/48
Bug Report
The Confluence user API was deprecated in June 2019; since then, the FINOS meeting attendance tracking has been unable to crawl Confluence data, causing the build to fail constantly.
Steps to Reproduce:
lein run -- gen-meeting-roster-data -m ../metadata
Expected Result:
Generate a file finos-meetings.csv in the root folder containing FINOS meeting attendance crawled from https://finosfoundation.atlassian.net/wiki
Actual Result:
finos-meetings.csv is empty. The logic also fails with an HTTP 400 error when calling the Confluence user API, which leads to an NPE in the parse-string function.
Proposed fix
Use Selenium to crawl the public HTML, extract the full name of each attendee, and match it against the fullName field of the FINOS person metadata.
Note that this change applies only to the logic that parses a meeting page, not to the logic that browses the Confluence tree and identifies meeting pages.
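To make the proposed matching step concrete, here is a minimal sketch of the idea, not the code in metadata-tool (which is written in Clojure). It is in Python only to keep the example self-contained. It assumes the rendered page HTML has already been fetched (for instance by Selenium) and is passed in as a string, that attendees appear as links carrying a CSS class such as confluence-userlink (an assumption; the real selector may differ), and that person metadata is available as a list of records with a fullName field. The id key and all function names are hypothetical.

```python
from html.parser import HTMLParser


class AttendeeNameParser(HTMLParser):
    """Collects the text of <a> elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.names = []
        self._buffer = None  # None while we are outside a matching <a>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Collapse runs of whitespace inside the link text.
            name = " ".join("".join(self._buffer).split())
            if name:
                self.names.append(name)
            self._buffer = None


def normalize(name):
    """Case- and whitespace-insensitive key used for matching."""
    return " ".join(name.split()).casefold()


def match_attendees(html, people, css_class="confluence-userlink"):
    """Return (matched person records, names with no metadata match).

    css_class is an assumed selector, not one confirmed by the issue.
    """
    parser = AttendeeNameParser(css_class)
    parser.feed(html)
    parser.close()

    by_name = {normalize(p["fullName"]): p for p in people}
    matched, unmatched = [], []
    for name in parser.names:
        person = by_name.get(normalize(name))
        if person is not None:
            matched.append(person)
        else:
            unmatched.append(name)
    return matched, unmatched
```

Returning the unmatched names separately makes it easy to spot attendees whose Confluence display name differs from their fullName in the metadata, which is the main weakness of matching on names instead of user IDs.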
Current work
Added Selenium dependencies to https://github.com/finos/metadata-tool/tree/confluence-selenium-crawler; now setting up development of the new feature.