elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.
https://hexdocs.pm/crawly
Apache License 2.0

management Web UI on localhost:4001 is not working #278

Closed: cmnstmntmn closed this issue 3 months ago

cmnstmntmn commented 8 months ago

Hey, great work first of all!

My spider works fine, but the management interface is not running. I tried wiring it up with Bandit. Am I missing something?

The API is working


Also /new is working


But the management interface (the index/list page) is not working.


Relevant code:

# lib/application.ex

defmodule MyApp.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Bandit, plug: Crawly.API.Router}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
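
If the UI should be reachable on localhost:4001 as in the issue title, the port can be passed to Bandit explicitly (otherwise Bandit falls back to its own default port); a minimal sketch of the child spec, assuming Bandit's port option:

# lib/application.ex -- child spec with an explicit port
children = [
  # Serve Crawly's API/UI router on 4001 rather than Bandit's default port
  {Bandit, plug: Crawly.API.Router, port: 4001}
]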
# config/config.exs

import Config

config :crawly,
  closespider_timeout: 10,
  concurrent_requests_per_domain: 8,
  closespider_itemcount: 100,
  log_dir: "./tmp/spider_logs",
  log_to_file: true,
  start_http_api: true,

  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest,
    {Crawly.Middlewares.UserAgent, user_agents: ["Crawly Bot", "Google"]}
  ],
  pipelines: [
    # An item is expected to have all fields defined in the fields list
    {Crawly.Pipelines.Validate, fields: [:url]},

    # Use the following field as an item uniq identifier (pipeline) drops
    # items with the same urls
    {Crawly.Pipelines.DuplicatesFilter, item_id: :url},
    Crawly.Pipelines.JSONEncoder,
    {Crawly.Pipelines.WriteToFile, folder: "./tmp", extension: "jl"}
  ]

ty

cmnstmntmn commented 8 months ago

I think I found the issue and created a PR for it: https://github.com/elixir-crawly/crawly/pull/279

However, even though the interface now loads, the list of spiders is empty.
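
For reference, a spider that should show up in that list is simply a module that uses Crawly.Spider; a minimal sketch with a placeholder module name and URLs:

defmodule MySpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url(), do: "https://example.com"

  @impl Crawly.Spider
  def init() do
    [start_urls: ["https://example.com/"]]
  end

  @impl Crawly.Spider
  def parse_item(response) do
    # Emit one item per page; no follow-up requests in this sketch
    %{items: [%{url: response.request_url}], requests: []}
  end
end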

cmnstmntmn commented 8 months ago

Fixed via this PR

JonasGruenwald commented 4 months ago

I was also a bit confused, because the docs state that the management interface is on by default:

https://hexdocs.pm/crawly/0.16.0/configuration.html#start_http_api-boolean

start_http_api? :: boolean() default: true

https://hexdocs.pm/crawly/0.16.0/readme.html#simple-management-ui-new-in-0-15-0-management-ui

NOTE: It's possible to disable the Simple management UI (and rest API) with the start_http_api?: false options of Crawly configuration.

But in reality it looks to me like it's off by default:

https://github.com/elixir-crawly/crawly/blob/f863b5b278b9c5e78f10c3a3d576b772a7397455/lib/crawly/application.ex#L36C1-L36C66
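
Until that changes, the API/UI can be enabled explicitly in config, using the start_http_api? key from the docs quoted above:

# config/config.exs
import Config

config :crawly,
  # turn on the REST API and management UI explicitly
  start_http_api?: true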

oltarasenko commented 3 months ago

I think I have addressed the last issue, and now the API is enabled by default.