diffbot / docs

Diffbot Documentation Suite
First draft of crawl efficiency explainer. #61

Closed diffbot-dan-urman closed 4 years ago

diffbot-dan-urman commented 4 years ago

To encourage users to create efficient and performant crawls, we want to have a page suggesting how to ensure that most crawled pages are processed. We want to be able to link to this page in a few places, so far including:

diffbot-dan-urman commented 4 years ago

I've taken a first stab at text here. I think the key points we want to hit are:

  1. Crawl efficiency is good because it speeds up the crawl.
  2. Crawl efficiency is achieved by not crawling pages you won't process, which means filtering with crawling patterns/regexes rather than only with processing patterns/regexes.
  3. An example of the above.
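
As a rough illustration of point 2, here is a toy sketch in Python. It is not Diffbot's API: the URL list, the `crawl_stats` helper, and the regexes are all hypothetical, and it ignores link discovery (in a real crawl, a tighter crawl pattern can also stop the crawler from reaching pages it would otherwise find). It only shows how the ratio of fetched to processed pages changes when the same filter is applied at crawl time instead of only at processing time.

```python
import re

# Hypothetical set of URLs a crawler might discover on a site.
# These are made up for illustration, not real crawl data.
DISCOVERED = [
    "https://example.com/products/1",
    "https://example.com/products/2",
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/forum/x",
    "https://example.com/forum/y",
]


def crawl_stats(urls, crawl_regex, process_regex):
    """Return (fetched, processed) page counts for a toy crawl.

    A URL is fetched if it matches crawl_regex, and a fetched URL is
    processed if it also matches process_regex.
    """
    crawl = re.compile(crawl_regex)
    process = re.compile(process_regex)
    fetched = [u for u in urls if crawl.search(u)]
    processed = [u for u in fetched if process.search(u)]
    return len(fetched), len(processed)


# Filtering only at processing time: every page is fetched,
# but most of the fetched pages are never processed.
loose = crawl_stats(DISCOVERED, r".*", r"/products/")

# Filtering at crawl time as well: only pages that will be
# processed are fetched in the first place.
tight = crawl_stats(DISCOVERED, r"/products/", r"/products/")

print("loose (fetched, processed):", loose)
print("tight (fetched, processed):", tight)
```

Both configurations process the same two product pages, but the second one fetches a third as many pages to get there.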

We may be able to come up with more examples or even some general guidelines, but it's difficult as sites can differ so widely. I'm not sure more than one illustrative example actually provides value here.

diffbot-dan-urman commented 4 years ago

Pages that we may want to modify to link to this one:

diffbot-dan-urman commented 4 years ago

@Swader @rick-diffbot @miketung Any thoughts on this draft? I'd certainly accept any corrections or polish to the text, but I'm mostly looking for feedback on whether this approach and structure make sense and accomplish our goal.

If we're happy with this approach I'll make whatever tweaks are appropriate to the related pages to reference this one, update the sidebar appropriately, and then I think we can push it out.

Swader commented 4 years ago

I think this is a great start and we can add examples onto this later, but it drives the point home nicely.

diffbot-dan-urman commented 4 years ago

Cool - thanks, Swader. I'm a little swamped today but I'll plan to make those other tweaks and push this out soon.

diffbot-dan-urman commented 4 years ago

Rebased and updated the related pages to link to the new efficiency guide.

The sidebars seem to be in a weird state; I couldn't figure out where to link this page in. @Swader - can you advise on that?

Swader commented 4 years ago

What's the weird state? Maybe you can put it at the end of this array: https://github.com/diffbot/docs/blob/master/website/sidebars.json#L155

diffbot-dan-urman commented 4 years ago

We chatted about the sidebar; @Swader will follow up with another issue there. Meanwhile I've added this explainer in the recommended location.