adobe / helix-importer-ui

Apache License 2.0

[Importer] - Support regular expressions in filter pathname #381

Open bhellema opened 1 week ago

bhellema commented 1 week ago

The customer hubblehomes.com has a very well-known pattern for their inventory homes: all of those paths end in /999. It would be very useful to be able to crawl the site and collect all of these inventory pages by providing a regular expression in the Filter pathname field in the Crawl section.

Expected Behaviour

As a user, I would like the Filter pathname field to support a regular expression option.

Actual Behaviour

It seems that only a startsWith match against the path is supported, which is very limiting.

Reproduce Scenario (including but not limited to)

Steps to Reproduce

  1. Navigate to the Crawl page
  2. Enter https://www.hubblehomes.com as the host
  3. Uncheck Show Preview
  4. Enter the regular expression .*\/999 into the Filter pathname field

(No results)

Sample URL

https://www.hubblehomes.com/new-homes/idaho/boise-metro/caldwell/mason-creek/birch/19103-piaggio-ave/999
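To illustrate the difference, here is a minimal sketch comparing a prefix match with a regular-expression match against the sample URL. The function names `matchesPrefix` and `matchesRegex` are hypothetical and not taken from the importer source; the sketch only assumes the filter is compared against the URL's pathname.

```javascript
// Hypothetical helpers for illustration only; not the importer's actual code.
const sampleUrl = 'https://www.hubblehomes.com/new-homes/idaho/boise-metro/caldwell/mason-creek/birch/19103-piaggio-ave/999';
const pathname = new URL(sampleUrl).pathname;

// Prefix matching, which is how the filter appears to behave today.
function matchesPrefix(path, filter) {
  return path.startsWith(filter);
}

// Regular-expression matching, which is what this issue asks for.
function matchesRegex(path, pattern) {
  return new RegExp(pattern).test(path);
}

// The value entered in step 4 (backslash doubled for the JS string literal).
const filter = '.*\\/999';

console.log(matchesPrefix(pathname, filter)); // false: the path does not start with the literal text ".*\/999"
console.log(matchesRegex(pathname, filter));  // true: the path contains "/999"
```

Note that `.*\/999` is unanchored, so it would also match a path such as `/foo/9990`. Anchoring the pattern as `\/999$` would restrict matches to paths that actually end in `/999`.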

atopper commented 1 week ago

@kptdobe @catalan-adobe Small change for an issue just reported. I can't assign anyone (perhaps I need to do it before I create the PR). Could you have a look? Thanks.