4teamwork / ftw.linkchecker

Find all external links in objects #11

Closed busykoala closed 5 years ago

busykoala commented 5 years ago

To feed the external link checker script, we need to find all external links in our Plone site.

These links occur in multiline text fields as well as in single-line text fields. There may be more places than that, but no one has come up with other ideas yet.

After that, we need to find out whether the field values contain link-like structures, either with a regular expression pattern or with a library like lxml (XPath).
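The two approaches could be sketched roughly as follows. This is only an illustration, not the package's actual code: `URL_PATTERN`, `HrefCollector`, `links_in_text` and `links_in_html` are hypothetical names, and the standard library's `html.parser` stands in for lxml here so that the sketch has no dependencies. The pattern is deliberately simple and may pick up trailing punctuation.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately simple pattern for bare http(s) URLs.
# It stops at whitespace, angle brackets and quotes only.
URL_PATTERN = re.compile(r'https?://[^\s<>"\']+')


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                self.hrefs.append(value)


def links_in_text(text):
    """Return URL-like strings found in a plain text field."""
    return URL_PATTERN.findall(text)


def links_in_html(html):
    """Return href targets found in a rich text (HTML) field."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

The regular expression suits plain text fields, while the parser-based variant suits rich text fields, where it also returns relative links that still have to be filtered out.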

In the end, we need to collect the links in a list of dictionaries, each recording the URL of the object in which the link was found (origin) as well as the URL the link points to (destination), like:

urls = [
    {'origin': '<url of origin>', 'destination': '<url of destination>'},
]
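A minimal sketch of how such a list could be built, assuming the links have already been extracted per object. The names `is_external`, `collect_external_links`, `links_by_origin` and `site_url` are hypothetical, and "external" is assumed here to mean an absolute http(s) URL whose host differs from the site's host.

```python
from urllib.parse import urlparse


def is_external(link, site_url):
    """Return True for absolute http(s) links pointing to another host."""
    parsed = urlparse(link)
    if parsed.scheme not in ('http', 'https'):
        # Relative links, mailto:, etc. are not treated as external.
        return False
    return parsed.netloc.lower() != urlparse(site_url).netloc.lower()


def collect_external_links(links_by_origin, site_url):
    """Build the list of origin/destination dicts.

    links_by_origin maps the URL of an object to the links found in it.
    """
    urls = []
    for origin, links in links_by_origin.items():
        for link in links:
            if is_external(link, site_url):
                urls.append({'origin': origin, 'destination': link})
    return urls
```

For example, `collect_external_links({'https://mysite.example/page': ['https://other.example/doc', '/internal']}, 'https://mysite.example')` keeps only the first link, paired with the page it was found on.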