OpenTermsArchive / engine

Tracks contractual documents and exposes changes to the terms of online services.
https://opentermsarchive.org
European Union Public License 1.2

Snapshot always fails when using `executeClientScripts` and `startBefore` #906

Closed martinratinaud closed 2 years ago

martinratinaud commented 2 years ago

When creating a snapshot for a document using Puppeteer, we wait for the required selectors to be present on the page before recording the snapshot.

When a selector is given as an object instead of a string, for example

    {
      "startBefore": "[role=\"separator\"]",
      "endAfter": "body"
    }

the snapshot is not retrieved and fails with an error, because the code does not currently support this case.

Ndpnt commented 2 years ago

This is how it is supposed to work. You cannot pass range selectors to the fetch function; you have to extract the CSS selectors from the range selectors first, then pass an array of cssSelectors to the fetch function.
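For illustration, the extraction step described above might look roughly like the following. This is only a sketch and not the engine's actual code: the `RangeSelector` and `Selector` types and the `extractCssSelectors` helper are hypothetical names, and the `startAfter` and `endBefore` keys are assumed by symmetry with the `startBefore` and `endAfter` keys shown in this issue.

```typescript
// Hypothetical types for illustration; not taken from the engine's source.
// Only `startBefore` and `endAfter` appear in this issue; the other two keys are assumed.
type RangeSelector = {
  startBefore?: string;
  startAfter?: string;
  endBefore?: string;
  endAfter?: string;
};
type Selector = string | RangeSelector;

// Flatten a mix of string and range selectors into plain CSS selectors,
// so that a fetcher only ever has to wait for strings.
function extractCssSelectors(selectors: Selector | Selector[]): string[] {
  const list = Array.isArray(selectors) ? selectors : [selectors];
  const result: string[] = [];

  for (const selector of list) {
    if (typeof selector === "string") {
      result.push(selector);
      continue;
    }

    // A range selector contributes its start boundary and its end boundary.
    const start = selector.startBefore ?? selector.startAfter;
    const end = selector.endBefore ?? selector.endAfter;
    if (start) result.push(start);
    if (end) result.push(end);
  }

  // Drop duplicates while preserving first-seen order.
  return result.filter((css, index) => result.indexOf(css) === index);
}

const cssSelectors = extractCssSelectors([
  "main",
  { startBefore: '[role="separator"]', endAfter: "body" },
]);
// cssSelectors is ["main", '[role="separator"]', "body"]
```

The resulting array contains only strings, which is the kind of input the fetch function is said to expect in this thread.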

martinratinaud commented 2 years ago

You're right. We thus need to expose DocumentDeclaration: the transformation from an array of string and range selectors into an array of strings usable by the fetcher is a bit complex, and we do not want to duplicate this code across several projects.

Here again, I feel the need to harmonize the DocumentDeclaration type and the JSON.

martinratinaud commented 2 years ago

This is thus replaced by