fffunction / backstop-crawl

🕷 Crawls a site to generate a backstopjs config file with scenarios pre-populated

Crawled reference URLs missing slash #25

Closed ataylorme closed 6 years ago

ataylorme commented 6 years ago

Issue

Generated referenceUrl values are missing the slash between the domain and the rest of the URI.

Expected Behaviour

Valid URLs returned for referenceUrl in the generated Backstop configuration file.

Steps to reproduce

Run either of the following commands. The first passes the reference URL with a trailing slash:

    backstop-crawl 'https://scalewp.io/' --ignore-robots -o=backstop-crawl-test.json --reference-url='https://update-wp-wp-microsite.pantheonsite.io/'

The second passes it without the trailing slash:

    backstop-crawl 'https://scalewp.io/' --ignore-robots -o=backstop-crawl-test.json --reference-url='https://update-wp-wp-microsite.pantheonsite.io'

Example Output:

    {
      "label": "/object-caching/",
      "url": "https://scalewp.io/object-caching/",
      "referenceUrl": "https://update-wp-wp-microsite.pantheonsite.ioobject-caching/",
      "hideSelectors": [],
      "selectors": [
        "document"
      ],
      "readyEvent": null,
      "delay": 1500,
      "misMatchThreshold": 0.1
    }

Notes

Node version 8.8.1, backstop-crawl version 2.3.0, MacOS High Sierra 10.13.1

Possibly caused by index.js#L54; removing that line produces the correct output.

@danreeves I'm curious what you ran into in your testing that made you strip the trailing slash to begin with. There might be a difference with sites that enforce a trailing slash on the homepage and those that don't.
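For illustration, here is a minimal sketch of how stripping the trailing slash could produce the output above. This is an assumption about the mechanism, not the actual code at index.js#L54: the function name `buildReferenceUrl` and its parameters are hypothetical, and it assumes the reference URL is substituted for the crawled site's base URL by plain string replacement.

```javascript
// Hypothetical reconstruction of the suspected behaviour; this is NOT the
// actual backstop-crawl source, only a sketch of how the slash could be lost.
function buildReferenceUrl(crawledUrl, siteUrl, referenceUrl) {
  // Strip a single trailing slash from the reference URL (the suspected step).
  const trimmed = referenceUrl.replace(/\/$/, '');
  // Swap the crawled site's base URL for the trimmed reference base.
  return crawledUrl.replace(siteUrl, trimmed);
}

const broken = buildReferenceUrl(
  'https://scalewp.io/object-caching/',
  'https://scalewp.io/', // site URL includes a trailing slash
  'https://update-wp-wp-microsite.pantheonsite.io/'
);

console.log(broken);
// https://update-wp-wp-microsite.pantheonsite.ioobject-caching/
```

Under the same assumption, a site URL without a trailing slash would leave the path's leading slash in place, which would explain why the trim looked correct in some tests.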

danreeves commented 6 years ago

Ah, damn. In my testing I was ending up with two slashes if the reference url had a trailing slash. I think you might be right about it being to do with the crawled site enforcing trailing slashes.

I think the answer is to not trim the url and then use https://www.npmjs.com/package/url-join or something to remove duplicate slashes.
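A rough sketch of that idea, without pulling in a dependency: normalise both sides of the join so exactly one slash ends up between them. The helper name `joinUrl` is hypothetical; this is neither the url-join package's implementation nor necessarily what #26 does.

```javascript
// Hypothetical helper: joins a base URL and a path with exactly one slash
// between them, whether or not either side already has one.
function joinUrl(base, path) {
  // Drop any trailing slashes from the base.
  const trimmedBase = base.replace(/\/+$/, '');
  // Drop any leading slashes from the path.
  const trimmedPath = path.replace(/^\/+/, '');
  return `${trimmedBase}/${trimmedPath}`;
}

// Both forms of the reference URL from the issue give the same result.
const withSlash = joinUrl(
  'https://update-wp-wp-microsite.pantheonsite.io/',
  '/object-caching/'
);
const withoutSlash = joinUrl(
  'https://update-wp-wp-microsite.pantheonsite.io',
  'object-caching/'
);

console.log(withSlash);
console.log(withoutSlash);
// Both print: https://update-wp-wp-microsite.pantheonsite.io/object-caching/
```

The path's own trailing slash is left untouched, so sites that enforce trailing slashes keep them.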

Sorry!

ataylorme commented 6 years ago

@danreeves I think there is a simpler approach - check out #26

ataylorme commented 6 years ago

Closed by #26