-
The goal of the code challenge is to best assess the technical skills of a developer. Using libraries/gems is expected. However, the expectation is also to have a fair amount of cu…
-
It would be great if the Enterprise Search web crawler could crawl dynamic pages out of the box. I mean the ones that are autogenerated from JavaScript executors and dynamic pages and then have those pages…
-
Does the Open Graph support imply that social media networks such as Facebook will scrape the correct information if the properties are set?
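
For reference, here is a minimal sketch of what any scraper that reads Open Graph data has to do: collect the `<meta property="og:…">` tags from the HTML it is served. It uses only Python's standard library. The names `OpenGraphParser` and `extract_open_graph` are hypothetical, and this is not a description of how Facebook's scraper is implemented.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:*" content="..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    """Return a dict of og:* properties found in the given HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties
```

Running `extract_open_graph` on the served HTML of a page is a quick way to check which properties are visible without executing any JavaScript.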
-
An emailing list Jim belongs to receives emails when a user signs up for a new feature.
Jim to send out example email.
Look at using regular expressions to match the feature the user uses, emails supp…
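
A minimal sketch of the regular-expression idea, in Python. The email wording ("signed up for the … feature") is an assumption, since the example email has not been sent yet; `FEATURE_PATTERN` and `extract_feature` are hypothetical names, and the pattern would need adjusting to the real message format.

```python
import re
from typing import Optional

# Assumed wording of the notification; adjust once a real example email is available.
FEATURE_PATTERN = re.compile(
    r"signed up for (?:the )?(?P<feature>[\w\- ]+?) feature",
    re.IGNORECASE,
)


def extract_feature(text: str) -> Optional[str]:
    """Return the feature name mentioned in the email text, or None if no match."""
    match = FEATURE_PATTERN.search(text)
    return match.group("feature").strip() if match else None
```

Emails for which `extract_feature` returns `None` can be left in the inbox untouched, so an overly strict pattern fails safely.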
-
I'm trying to set og-tags with Gatsby and Helmet. But the problem is that those tags first need to be fetched.
It's a vehicle detail page and I need to show vehicle make and model in those tags, but th…
-
The multi-step process, using both Parsehub and Scrapy, has some downsides:
- It adds a bunch more steps
- Makes the crawler less flexible
- Means rework needs to be done if there are any other p…
-
Crawler fails to extract CVEs from MendIO since it's only looking at bulletins (https://www.mend.io/vulnerability-database/full-listing/2011/3) and NOT CVE pages.
May need to just increase the dept…
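
To illustrate why depth matters here, a minimal sketch of a depth-limited breadth-first crawl in Python. It is not the crawler's actual code: `crawl` and `get_links` are hypothetical names, and `get_links` stands in for whatever fetches a page and returns its outgoing links.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start: str, get_links: Callable[[str], Iterable[str]], max_depth: int) -> List[str]:
    """Visit pages breadth-first, following links at most max_depth hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            # Pages at the depth limit are recorded but their links are not followed.
            continue
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order
```

With a listing page that links to bulletins, which in turn link to CVE pages, a `max_depth` of 1 stops at the bulletins, while 2 also reaches the CVE pages.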
-
Any interest in publishing this as a CLI tool on NPM?
I have a Travis CI build running on a static site that checks for various errors, and I'd like to be able to run the crawler during the buil…
-
Right now, the `file://` protocol is severely limited on Chrome, meaning it doesn't provide any support for JavaScript to perform requests, such as via XHR or the `fetch` API. There are other scenarios as…
-
The spider cannot crawl pages that rely heavily on JavaScript.
E.g., `amazon.jobs`, `jobs.google.com`, etc.
Scrapy cannot handle sites like these so we'll have to use something like Selenium or Splas…