ScottMansfield closed this issue 9 years ago.
The `href` attribute of each `a` tag should be recorded as an outgoing link, but it should NOT be sent back to the fetch stage.
The `src` attribute of each `img` tag should be recorded as an image link.
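Here is a minimal sketch of the extraction step. It assumes a Python crawler and uses the standard-library `html.parser`. `LinkExtractor` and `extract_links` are hypothetical names, and `out_links`/`img_links` stand in for the issue's `outLinks`/`imgLinks`. Relative URLs are resolved against the page URL with `urljoin`. The function only returns the two lists. It does not push anything back to the fetch stage.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects <a href> values as outgoing links and <img src> values as image links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.out_links = []  # stands in for outLinks
        self.img_links = []  # stands in for imgLinks

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; a bare attribute
        # such as <a href> has the value None, which is skipped here.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.out_links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.img_links.append(urljoin(self.base_url, attrs["src"]))


def extract_links(base_url, page_html):
    """Return (out_links, img_links) for one page. The caller records them;
    out_links are deliberately not re-queued for fetching."""
    parser = LinkExtractor(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.out_links, parser.img_links


if __name__ == "__main__":
    page = '<a href="full/1.jpg"><img src="thumb/1.jpg"></a><a href="/about">About</a>'
    out, imgs = extract_links("http://example.com/gallery/", page)
    print(out)   # ['http://example.com/gallery/full/1.jpg', 'http://example.com/about']
    print(imgs)  # ['http://example.com/gallery/thumb/1.jpg']
```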
I can probably filter the outLinks collection by the imgLinks collection to exclude image URLs from the outgoing links.
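The URL-based filter proposed above amounts to a set difference. Here is a minimal sketch. The function name is hypothetical, and it assumes both collections hold absolute URL strings.

```python
def filter_out_images_by_url(out_links, img_links):
    """Drop any outgoing link whose exact URL also appears as an <img src>."""
    img_set = set(img_links)  # O(1) membership checks
    return [link for link in out_links if link not in img_set]


if __name__ == "__main__":
    # A thumbnail wrapped in a link to the full-size image: the URLs differ,
    # so the full-size image link is NOT removed by this filter.
    out_links = ["http://example.com/gallery/full/1.jpg", "http://example.com/about"]
    img_links = ["http://example.com/gallery/thumb/1.jpg"]
    print(filter_out_images_by_url(out_links, img_links))
```

The usage case shows the weakness of this approach. When the embedded image is a thumbnail and the link points to a different full-size file, an exact URL match never happens. That is the problem described in the next comment.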
This turned out to be incorrect: the linked images and the embedded images were different sizes, so their URLs did not match and the filter missed them. I solved this by sending a HEAD request for every extracted link and excluding any whose Content-Type is image/*.
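Here is a minimal sketch of the HEAD-request approach using only `urllib.request`. `head_content_type` and `exclude_images` are hypothetical names. The lookup function is injectable so the filter can be exercised without network access. Two behaviours are assumptions and are marked in the comments: a link whose HEAD request fails is kept, and a response without a Content-Type header is treated as `text/plain` (Python's default).

```python
import urllib.request


def head_content_type(url, timeout=10.0):
    """Send a HEAD request and return the media type (e.g. 'image/jpeg'),
    or None if the request fails."""
    try:
        # Request() raises ValueError for malformed URLs; urlopen raises
        # URLError/HTTPError (both OSError subclasses) for network or HTTP errors,
        # including servers that answer HEAD with 405 Method Not Allowed.
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # get_content_type() lowercases the type, strips parameters such as
            # '; charset=utf-8', and returns 'text/plain' if the header is absent.
            return resp.headers.get_content_type()
    except (OSError, ValueError):
        return None


def exclude_images(out_links, content_type_of=head_content_type):
    """Keep only links whose Content-Type is not image/*.
    Assumption: links whose HEAD request fails (None) are kept, not dropped."""
    kept = []
    for link in out_links:
        ctype = content_type_of(link)
        if ctype is not None and ctype.startswith("image/"):
            continue  # an image, even if its URL differs from every <img src>
        kept.append(link)
    return kept
```

A HEAD request keeps the check cheap because no response body is transferred. It still costs one round trip per link, so a real crawler would probably cache results per URL.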