-
```
What steps will reproduce the problem?
1. Create a web-page with a malformed URL (or a protocol like mailto:)
2. Run the crawler on said website.
3. Crash and burn at line 89 in WebURL.java - this…
-
I installed master/45995736 on OS X 10.8.5 today. I'm using ruby 1.9.3-p429 via rbenv, and when I try to run the crawler from the "rubycode" directory, I get a gem specification failure:
```
$ ruby m…
irons updated
10 years ago
-
This seems to happen randomly.
Today I searched for _alzheimer_ and got results for _ehlers-danlos syndrome_. Last week I made a chemical structure search and got results for _cadasil_ (look at the lo…
-
## Summary
We should make the Scrapy downloader middleware pause when the Internet connection is lost, and resume once the connection is back.
## Motivation
Currently, on a con…
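
The pause-and-resume idea in this request could be sketched roughly as follows. This is only an illustration under stated assumptions, not Scrapy's API: `is_online` and `wait_until_online` are hypothetical helper names, and the probe target (`1.1.1.1`, port 53) is an arbitrary choice for a reachability check.

```python
import socket
import time
from typing import Callable


def is_online(host: str = "1.1.1.1", port: int = 53, timeout: float = 3.0) -> bool:
    """Hypothetical probe: True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def wait_until_online(
    check: Callable[[], bool] = is_online,
    sleep: Callable[[float], None] = time.sleep,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
) -> int:
    """Block until check() returns True; return the number of failed checks.

    The delay between checks doubles after each failure, capped at max_delay.
    """
    delay = initial_delay
    failures = 0
    while not check():
        failures += 1
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return failures
```

Because Scrapy runs on the Twisted reactor, a blocking wait like this inside a middleware would stall all other requests, so a real implementation would probably need a non-blocking variant of the same loop.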
-
What happens when Google tries to crawl a site that uses a B3-based theme? What if a user's browser doesn't support JavaScript?
The theme should be able to detect these situations and serve the user…
-
Details on accessing web content behind paywalls...
http://www.ghacks.net/2016/02/26/read-articles-behind-paywalls-by-masquerading-as-googlebot/
It references two addons, RefControl and User Agent S…
-
What do I need to configure to jump to the local station?
At present, my link is like this
![my](https://user-images.githubusercontent.com/23733037/179547780-624b0d92-4224-44a7-8ca1-c04b34c8f6c9.…
-
Hi,
I am trying to set up a new Norconex connector for a page.
Here I am having trouble reading URLs in the page, like those in the index bar:
![image](https://user-images.githubusercontent.com/29…
-
Hi, I downloaded https://flibusta.is using your Docker examples from the README, around 90 GB. And I see that some links of the same type are not fetched - they have absolute URLs, open Firefox on cli…
-
**Version Used**:
8f02e04893
**Steps to Reproduce**:
1. Have a custom workspace of type "Foo"
2. Try to enable diagnostic tagger for the documents in the workspace
**Expected Behavior**:
…