Work around Confluence user API deprecation

maoo commented 5 years ago

Bug Report

The Confluence user API was deprecated in June 2019. Since then, the FINOS meeting attendance tracking has been unable to crawl Confluence data, which causes the build to fail every time.

Steps to Reproduce:

Check out the metadata-tool project
Set up the project to run local builds
Check out the (private) metadata repository in a sibling folder
Run lein run -- gen-meeting-roster-data -m ../metadata

Expected Result:

A file finos-meetings.csv is generated in the root folder, containing FINOS meeting attendance crawled from https://finosfoundation.atlassian.net/wiki

Actual Result:

finos-meetings.csv is empty. The logic also fails with an HTTP 400 error when calling the Confluence user API, which leads to a NullPointerException (NPE) in the parse-string function.
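The failure mode described above (an error response flowing into the parser) can be illustrated with a small guard. This is only a sketch in Python, not the tool's actual Clojure code: the function name `parse_user_body` and the shape of the response are assumptions made for illustration.

```python
import json
from typing import Optional


def parse_user_body(status: int, body: Optional[str]) -> Optional[dict]:
    """Return the decoded JSON object, or None when the call did not succeed."""
    # Hypothetical guard: only parse when the request succeeded and returned a body.
    if status != 200 or not body:
        return None
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # An error page or truncated payload is treated like a failed call.
        return None
    # Only a JSON object is useful to the caller; anything else counts as no data.
    return parsed if isinstance(parsed, dict) else None
```

With a guard like this, a 400 response yields `None` that the caller can skip or log, rather than an exception deep inside the parsing step. It would not restore the missing data, which is why the issue proposes a different data source.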

Proposed fix

Use Selenium to crawl the public HTML, extract the full name of each attendee, and match it against the fullName field of the FINOS person metadata.

Note that this change applies only to the logic that parses a meeting page, not to the logic that browses the Confluence tree and identifies meeting pages.
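The matching step in the proposed fix could look roughly like the sketch below. It is written in Python for illustration and is not the metadata-tool implementation. Only the `fullName` field comes from this issue; the `id` key, the function names, and the normalization rules (case-insensitive, accent-insensitive, whitespace-collapsed) are assumptions.

```python
import unicodedata
from typing import Iterable, Optional


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so names compare reliably."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def build_index(people: Iterable[dict]) -> dict:
    """Map each normalized fullName to its person record (first record wins on duplicates)."""
    index = {}
    for person in people:
        full_name = person.get("fullName")
        if full_name:
            index.setdefault(normalize_name(full_name), person)
    return index


def match_attendee(crawled_name: str, index: dict) -> Optional[dict]:
    """Look up a name crawled from a meeting page; return None when nobody matches."""
    return index.get(normalize_name(crawled_name))


# Hypothetical person records; real metadata may carry different keys besides fullName.
people = [
    {"id": "p1", "fullName": "José  Álvarez"},
    {"id": "p2", "fullName": "Jane Doe"},
]
index = build_index(people)
```

Here `match_attendee(" jose ALVAREZ ", index)` resolves to the `p1` record, while an unknown name returns `None`. Exact matching on full names is fragile when two people share a name or when Confluence shows a nickname, so unmatched and duplicate names would need separate handling.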

Current work

Added Selenium dependencies to the https://github.com/finos/metadata-tool/tree/confluence-selenium-crawler branch and started setting up development of the new feature.

maoo commented 5 years ago

The Travis CI build will probably need to support running headless Chrome; see https://docs.travis-ci.com/user/gui-and-headless-browsers/

maoo commented 5 years ago

The main code structure has been altered to allow Selenium to fetch the HTML body; the rest of the logic will be altered accordingly, and some selectors will probably have to change.

See https://github.com/finos/metadata-tool/commit/5d83d42f72d9a86479749efb4d91027154befa74

pmonks commented 5 years ago

I would recommend using JSoup with an appropriate HTTP client (e.g. clj-http) rather than Selenium. This will be a vastly lighter weight solution, and therefore easier to maintain and enhance.
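To illustrate the browser-free approach suggested here, the sketch below extracts attendee names from already-fetched HTML using only a parser. It uses Python's standard `html.parser` in place of JSoup, so it shows the idea rather than the recommended library. The `user-mention` class and the function names are assumptions about the page markup, and the approach only works if the names are present in the served HTML without JavaScript execution.

```python
from html.parser import HTMLParser


class AttendeeNameParser(HTMLParser):
    """Collect the text of <a> elements carrying an assumed 'user-mention' class."""

    def __init__(self, css_class="user-mention"):
        super().__init__()
        self.css_class = css_class  # assumed marker for attendee links
        self.names = []
        self._buffer = None  # holds text fragments while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Collapse internal whitespace and line breaks in the link text.
            name = " ".join("".join(self._buffer).split())
            if name:
                self.names.append(name)
            self._buffer = None


def extract_attendee_names(html: str) -> list:
    """Return the attendee names found in the given HTML, in document order."""
    parser = AttendeeNameParser()
    parser.feed(html)
    parser.close()
    return parser.names
```

The function takes an HTML string, so the fetching step (plain HTTP client or browser automation) stays independent of the parsing step.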

maoo commented 5 years ago

Hi @pmonks, thanks for the feedback. Unfortunately, clj-http doesn't support JavaScript execution (AFAICS). Since I already had experience with Selenium (I helped on https://github.com/7bridges-eu/shelob), I used that.

maoo commented 4 years ago

Resolving this issue, as the fix was delivered in https://github.com/finos/metadata-tool/pull/48
