Closed by ziyouchutuwenwu 3 years ago
But why do you need Splash for that page at all? For example, I can easily get all the data with:
response = Crawly.fetch("https://www.erlang-solutions.com/blog/")
%HTTPoison.Response{
body: "<!DOCTYPE html>\n<html lang=\"en-US\" class=\"no-js\">\n\n<head>\n <meta charset=\"UTF-8\">\n <meta content=\"width=device-width, initial-scale=1.0, maximum-scale=1\" name=\"viewport\">\n <link rel=\"apple-touch-icon\" sizes=\"180x180\" href=\"https://www.erlang-solutions.com/wp-content/themes/nopio_master_theme/assets/images/favicon/apple-touch-icon.png\">\n <link rel=\"icon\" type=\"image/png\" sizes=\"32x32\" href=\"https://www.erlang-solutions.com/wp-content/themes/nopio_master_theme/assets/images/favicon/favicon-32x32.png\">\n <link rel=\"icon\" type=\"image/png\" sizes=\"16x16\" href=\"https://www.erlang-solutions.com/wp-content/themes/nopio_master_theme/assets/images/favicon/favicon-16x16.png\">\n <meta name=\"msapplication-TileColor\" content=\"#ffffff\">\n <meta name=\"theme-color\" content=\"#ffffff\">\n\t\n \t<!-- Google Optimize --> \t\n\t<script src=\"https://www.googleoptimize.com/optimize.js?id=OPT-5TG4NK6\"></script>\n\t<!-- Google Optimize --> \t\n\t\n <!-- Google Tag Manager -->\n <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':\n new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],\n j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=\n 'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);\n })(window,document,'script','dataLayer','GTM-KTN9QLQ');</script>\n <!-- End Google Tag Manager -->\n <meta name='robots' content='index, follow, max-image-preview:large, max-snippet:-1, max-video-preview:-1' />\n\n\t<!-- This site is optimized with the Yoast SEO plugin v16.2 - https://yoast.com/wordpress/plugins/seo/ -->\n\t<link media=\"all\" href=\"https://www.erlang-solutions.com/wp-content/cache/autoptimize/css/autoptimize_56f2446d64420c1896236903a8d9bd90.css\" rel=\"stylesheet\" /><title>Erlang, Elixir & RabbitMQ resources - Erlang Solutions Blog</title>\n\t<meta name=\"description\" content=\"As supporters of open source tech and Erlang, Elixir, OTP on the BEAM, 
we share our knowledge and insights with the Community so that we can all grow.\" />\n\t<link rel=\"canonical\" href=\"https://www.erlang-solutions.com/blog/\" />\n\t<link rel=\"next\" href=\"https://www.erlang-solutions.com/blog/page/2/\" />\n\t<meta property=\"og:locale\" content=\"en_US\" />\n\t<meta property=\"og:type\" content=\"article\" />\n\t<meta property=\"og:title\" content=\"Erlang, Elixir & RabbitMQ resources - Erlang Solutions Blog\" />\n\t<meta property=\"og:description\" content=\"As supporters of open source tech and Erlang, Elixir, OTP on the BEAM, we share our knowledge and insights with the Community so that we can all grow.\" />\n\t<meta property=\"og:url\" content=\"https://www.erlang-solutions.com/blog/\" />\n\t<meta property=\"og:site_name\" content=\"Erlang Solutions\" />\n\t<meta property=\"og:image\" content=\"https://www.erlang-solutions.com/wp-content/uploads/2021/02/Transparent-Erlang-Solutions-Logo-Black.png\" />\n\t<meta property=\"og:image:width\" content=\"8001\" />\n\t<meta property=\"og:image:height\" content=\"4500\" />\n\t<meta name=\"twitter:card\" content=\"summary_large_image\" />\n\t<meta name=\"twitter:site\" content=\"@ErlangSolutions\" />\n\t<script type=\"application/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https://schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https://www.erlang-solutions.com/#organization\",\"name\":\"Erlang 
Solutions\",\"url\":\"https://www.erlang-solutions.com/\",\"sameAs\":[\"https://www.facebook.com/ErlangSolutions/\",\"https://www.linkedin.com/company/erlangsolutions\",\"https://www.youtube.com/c/ErlangSolutions\",\"https://twitter.com/ErlangSolutions\"],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https://www.erlang-solutions.com/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https://www.erlang-solutions.com/wp-content/uploads/2021/02/Transparent-Erlang-Solutions-Logo-Black.png\",\"contentUrl\":\"https://www.erlang-solutions.com/wp-content/uploads/2021/02/Transparent-Erlang-Solutions-Logo-Black.png\",\"width\":8001,\"height\":4500,\"caption\":\"Erlang Solutions\"},\"image\":{\"@id\":\"https://www.erlang-solutions.com/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https://www.erlang-solutions.com/#website\",\"url\":\"https://www.erlang-solutions.com/\",\"name\":\"Erlang Solutions\",\"description\":\"Buildin" <> ...,
{:ok, document} = Floki.parse_document(response.body)
{:ok,
[
{"html", [{"lang", "en-US"}, {"class", "no-js"}],
[
{"head", [],
[
{"meta", [{"charset", "UTF-8"}], []},
{"meta",
[
{"content",
"width=
iex(8)> hrefs = document |> Floki.find("a.btn-link") |> Floki.attribute("href")
["https://www.erlang-solutions.com/blog/the-future-for-erlang-solutions/",
"https://www.erlang-solutions.com/blog/fintech-matters-newsletter-1-july-2021/",
"https://www.erlang-solutions.com/blog/blockchain-fintech-and-the-beam/",
"https://www.erlang-solutions.com/blog/5-erlang-and-elixir-use-cases-in-fintech/",
"https://www.erlang-solutions.com/blog/lessons-fintech-can-learn-from-telecom-part-two/",
"https://www.erlang-solutions.com/blog/how-to-use-rabbitmq-in-service-integration/",
"https://www.erlang-solutions.com/blog/lessons-fintech-can-learn-from-telecom-part-one/",
"https://www.erlang-solutions.com/blog/erlang-solutions-partners-with-cockroach-labs/",
"https://www.erlang-solutions.com/blog/how-to-ensure-your-instant-messaging-solution-offers-users-privacy-and-security/",
"https://www.erlang-solutions.com/blog/fintech-client-case-studies-erlang-solutions-and-trifork/"]
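The parse-and-select steps above can be tried without any network access. The following is a minimal sketch, assuming Floki is available (here pulled in with Mix.install); the module name LinkSketch and the inline HTML are made up for illustration and are not taken from the real page.

```elixir
# Standalone script: fetches Floki as a dependency on first run.
Mix.install([{:floki, "~> 0.30"}])

defmodule LinkSketch do
  # Hypothetical helper: parse an HTML string and return the href values
  # of every element matching the given CSS selector.
  def hrefs(html, selector) do
    {:ok, document} = Floki.parse_document(html)

    document
    |> Floki.find(selector)
    |> Floki.attribute("href")
  end
end

# Made-up markup standing in for response.body.
html = """
<html>
  <body>
    <a class="btn-link" href="/blog/a/">Read more</a>
    <a class="other" href="/ignored/">Skip me</a>
    <a class="btn-link" href="/blog/b/">Read more</a>
  </body>
</html>
"""

IO.inspect(LinkSketch.hrefs(html, "a.btn-link"))
```

Only the two anchors with the btn-link class are selected, so the same call works on response.body once the page has been fetched.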
Thanks for your quick help. This was just an example; this URL does not need Splash, indeed.
I just tried Splash, but found that it is not working.
Sorry, I have not used Splash for some time, so it is a bit hard for me to advise here.
Thanks. So for the JS part, is there any new solution, or does it stay as it is for now?
I am looking towards headless Chrome, as it seems to be far more reliable. For example, I suggest looking at: https://oltarasenko.medium.com/building-a-chrome-based-fetcher-for-crawly-a779e9a8d9d0?sk=2dbb4d39cdf319f01d0fa7c05f9dc9ec
I have not had time to add it to Crawly yet, as I have had to work on another commercial product and have not had a chance to contribute :(
Thank you very much!
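To give a rough idea of the headless Chrome direction, here is a small sketch that shells out to Chrome's --dump-dom flag to get the rendered DOM. This is a simpler stand-in and not the approach from the linked article. The module name HeadlessChromeSketch, the default binary name "chromium", and the idea of wrapping it this way are all assumptions; nothing here is part of Crawly.

```elixir
defmodule HeadlessChromeSketch do
  # Hypothetical helper, not part of Crawly.

  # Command-line arguments asking headless Chrome to print the DOM
  # of the page after scripts have run.
  def args(url), do: ["--headless", "--dump-dom", url]

  # Returns {:ok, html} on success. The binary name is an assumption;
  # it may be "google-chrome" or "chromium-browser" on your system.
  def fetch_dom(url, binary \\ "chromium") do
    case System.find_executable(binary) do
      nil ->
        {:error, :chrome_not_found}

      path ->
        case System.cmd(path, args(url)) do
          {html, 0} -> {:ok, html}
          {_output, status} -> {:error, {:exit_status, status}}
        end
    end
  end
end
```

The returned HTML string could then be handed to Floki.parse_document/1 in the same way as response.body. Turning this into a real Crawly fetcher would additionally mean implementing Crawly's fetcher behaviour, which is not shown here.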
Hi, I made a sample as a demo. The code is like this:
config/config.exs:
I start Splash with this command:
The iex shell output is:
If I remove the Splash fetcher from the config, I can get the response, but a bit slowly.
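For readers following along, this is a minimal sketch of what a Splash fetcher entry in config/config.exs can look like, based on the Crawly documentation and not on the configuration used in this thread (which is not shown). The base_url assumes a Splash instance reachable on localhost port 8050.

```elixir
# config/config.exs (sketch; not the poster's actual file)
import Config

config :crawly,
  # Route requests through Splash so that JavaScript is rendered.
  # The URL is an assumption: a local Splash with port 8050 published.
  fetcher: {Crawly.Fetchers.Splash, [base_url: "http://localhost:8050/render.html"]}

# Removing the Splash fetcher means falling back to plain HTTP fetching,
# which corresponds to:
#   fetcher: {Crawly.Fetchers.HTTPoisonFetcher, []}
```

As far as I know, extra keyword options next to base_url (for example wait: 3) are forwarded to Splash as render arguments, but please check the Crawly.Fetchers.Splash docs for your version before relying on that.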
I ran
docker exec -it xxx bash
to get into the Splash container, then executed
curl --head https://www.erlang-solutions.com/blog
After about 15 seconds, I got the response. Then I modified config.exs:
I got:
Great, thanks.