elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.
https://hexdocs.pm/crawly
Apache License 2.0

Crawly.fetch giving 301 response instead of 200 #214

Closed: shellking4 closed this issue 1 year ago

shellking4 commented 2 years ago

I'm having an issue using Crawly. It was working before, but now it returns a 301 response.

Why is Crawly.fetch returning a 301 response instead of 200?

oltarasenko commented 1 year ago

I can't reproduce this issue:


```
iex(1)> Crawly.fetch("https://books.toscrape.com/")
%HTTPoison.Response{
  status_code: 200,
  body: "<!DOCTYPE html>\n<!--[if lt IE 7]>      <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8 lt-ie7\"> <![endif]-->\n<!--[if IE 7]>         <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8\"> <![endif]-->\n<!--[if IE 8]>         <html lang=\"en-us\" class=\"no-js lt-ie9\"> <![endif]-->\n<!--[if gt IE 8]><!--> <html lang=\"en-us\" class=\"no-js\"> <!--<![endif]-->\n    <head>\n        <title>\n    All products | Books to Scrape - Sandbox\n</title>\n\n        <meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\" />\n        <meta name=\"created\" content=\"24th Jun 2016 09:29\" />\n        <meta name=\"description\" content=\"\" />\n        <meta name=\"viewport\" content=\"width=device-width\" />\n        <meta name=\"robots\" content=\"NOARCHIVE,NOCACHE\" />\n\n        <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->\n        <!--[if lt IE 9]>\n        <script src=\"//html5shim.googlecode.com/svn/trunk/html5.js\"></script>\n        <![endif]-->\n\n        \n            <link rel=\"shortcut icon\" href=\"static/oscar/favicon.ico\" />\n        \n\n        \n        \n    \n    \n        <link rel=\"stylesheet\" type=\"text/css\" href=\"static/oscar/css/styles.css\" />\n    \n    <link rel=\"stylesheet\" href=\"static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css\" />\n    <link rel=\"stylesheet\" type=\"text/css\" href=\"static/oscar/css/datetimepicker.css\" />\n\n\n        \n        \n\n        \n\n        \n            \n            \n\n        \n    </head>\n\n    <body id=\"default\" class=\"default\">\n        \n        \n    \n    \n    <header class=\"header container-fluid\">\n        <div class=\"page_inner\">\n            <div class=\"row\">\n                <div class=\"col-sm-8 h1\"><a href=\"index.html\">Books to Scrape</a><small> We love being scraped!</small>\n</div>\n\n                \n            </div>\n        </div>\n    </header>\n\n    \n    \n<div class=\"container-fluid page\">\n    <div 
class=\"page_inner\">\n        \n    <ul class=\"breadcrumb\">\n        <li>\n            <a href=\"index.html\">Home</a>\n        </li>\n        <li class=\"active\">All products</li>\n    </ul>\n\n        <div class=\"row\">\n\n            <aside class=\"sidebar col-sm-4 col-md-3\">\n                \n                <div id=\"promotions_left\">\n                    \n                </div>\n                \n    \n    \n        \n        <div class=\"side_categories\">\n            <ul class=\"nav nav-list\">\n                \n                    <li>\n                        <a href=\"catalogue/category/books_1/index.html\">\n                            \n                                Books\n
            \n                        </a>\n\n                        <ul>\n                        \n                \n                    <li>\n                        <a href=\"catalogue/category/books/travel_2/index.html\">\n                            \n                                Travel\n                            \n                        </a>\n\n                        </li>\n                        \n                \n                    <li>\n                        <a href=\"catalogue/category/books/mystery_3/index.html\">\n                            \n                                Mystery\n
                  \n                        </a>\n\n                        </li>\n                        \n                \n                    <li>\n                        <a href=\"catalogue/category/books/historical-fiction_4/index.html\">\n                            \n                                Historical Fiction\n                            \n                        </a>\n\n                        </li>\n                        \n                \n                    <li>\n                        <a href=\"catalogue/category/books/sequential-art_5/index.html\">\n                            \n
                   Sequential Art\n                            \n                        </a>\n\n                        </li>\n                        \n                \n                    <li>\n
             <a href=\"catalogue/category/books/classics_6/index.html\">\n                            \n                                Classics\n                          " <> ...,
  headers: [
    {"Date", "Wed, 14 Sep 2022 11:26:32 GMT"},
    {"Content-Type", "text/html"},
    {"Content-Length", "51294"},
    {"Connection", "keep-alive"},
    {"Last-Modified", "Thu, 26 May 2022 21:15:15 GMT"},
    {"ETag", "\"628fede3-c85e\""},
    {"Accept-Ranges", "bytes"},
    {"Strict-Transport-Security", "max-age=0; includeSubDomains; preload"}
  ],
  request_url: "https://books.toscrape.com/",
  request: %HTTPoison.Request{
    method: :get,
    url: "https://books.toscrape.com/",
    headers: [{"User-Agent", "Crawly Bot"}],
    body: "",
    params: %{},
    options: []
  }
}
iex(2)>
```
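
For anyone who does hit the 301: it is a permanent redirect, and HTTPoison (the client behind Crawly's default fetcher, as the `%HTTPoison.Response{}` struct in the transcript shows) does not follow redirects unless asked to. The fragment below is a minimal configuration sketch, not a confirmed fix for this issue. It assumes the affected URL now redirects (for example from `http://` to `https://`) and that the installed Crawly version ships `Crawly.Middlewares.RequestOptions`. The middleware list and option values are illustrative.

```elixir
# config/config.exs
import Config

config :crawly,
  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest,
    {Crawly.Middlewares.UserAgent, user_agents: ["Crawly Bot"]},
    # Assumption: these options are passed through to HTTPoison unchanged.
    # follow_redirect and max_redirect are HTTPoison (hackney) request options.
    {Crawly.Middlewares.RequestOptions, [follow_redirect: true, max_redirect: 5]}
  ]
```

Whether `Crawly.fetch/1` picks up these middlewares may depend on the Crawly version, so it is worth checking the `status_code` and `request_url` of the returned response after changing the configuration.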