Open ivo-dukov opened 9 years ago
Can you share the exact URL that is causing the problem? Under the hood, Wombat uses Mechanize to request the page, so it could be either a Mechanize bug or a misconfiguration.
So here is the full response:
/Users/IvoDukov/.rvm/gems/ruby-2.1.5/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:308:in `fetch': 400 => Net::HTTPBadRequest for *the_url* -- unhandled response (Mechanize::ResponseCodeError)
from /Users/IvoDukov/.rvm/gems/ruby-2.1.5/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:976:in `response_redirect'
from /Users/IvoDukov/.rvm/gems/ruby-2.1.5/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:300:in `fetch'
from /Users/IvoDukov/.rvm/gems/ruby-2.1.5/gems/mechanize-2.7.3/lib/mechanize.rb:440:in `get'
from /Users/IvoDukov/.rvm/gems/ruby-2.1.5/gems/wombat-2.3.0/lib/wombat/processing/parser.rb:47:in `parser_for'
from /Users/IvoDukov/.rvm/gems/ruby-2.1.5/gems/wombat-2.3.0/lib/wombat/processing/parser.rb:33:in `parse'
from /Users/IvoDukov/.rvm/gems/ruby-2.1.5/gems/wombat-2.3.0/lib/wombat/crawler.rb:30:in `crawl'
from websites/net-a-porter/link_crawler.rb:78:in `<main>'
And here is my code:
class LinksCrawler
  include Wombat::Crawler

  base_url website_base_url
  path category_path

  links({:xpath => '//div[@class="description"]/a[contains(@href, "product")]/@href'}, :list)
end

link_crawler = LinksCrawler.new
link_crawler.crawl
I don't want to share the exact URL for security reasons, but I can tell you that it definitely works if you paste it into the browser.
Hello, I noticed some strange behaviour in Wombat. I want to crawl two websites. At first I was using Typhoeus and regular expressions, but one website constantly gave me a 302, so I switched to Wombat. The interesting thing is that Wombat works perfectly for that website, but when I try it on the other website I get an error, which is
The URL is correct: I tried it in the browser and it worked. Can somebody help me with this one? Also, I don't have puts in front of Wombat.crawl do ..., because I saw that reported as a problem too. Thank you in advance, and sorry for my English!
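One way to narrow down whether the 400 comes from Mechanize or from the server itself is to request the same URL with Ruby's standard Net::HTTP and look at the raw status. The following is only a minimal sketch: the helper names `browser_like_request` and `probe`, the header values, and the example URL are all made up for illustration and are not part of Wombat or Mechanize.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request carrying browser-like headers.
# The default User-Agent string is a placeholder, not a real browser's value.
def browser_like_request(url, user_agent: 'Mozilla/5.0 (compatible; probe-script)')
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent
  request['Accept'] = 'text/html,application/xhtml+xml'
  request
end

# Hypothetical helper: send the request once, without following redirects,
# and return the status code plus any Location header the server sent back.
def probe(url)
  uri = URI.parse(url)
  request = browser_like_request(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(request)
    { code: response.code, location: response['location'] }
  end
end

# Usage (placeholder URL, replace with the page that fails in Wombat):
# p probe('https://example.com/some/category')
```

If this also returns 400, the server is probably rejecting something about the request itself (headers, cookies, or characters in the URL), so the issue is unlikely to be specific to Wombat. If it returns 200 or a 3xx with a Location header, the Mechanize configuration or its redirect handling is the more likely place to look.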