Veerse / go_colly_and_kafka_POC

An example of the Go Colly scraper library in action

[Questions] dockerization / sitemap #1

Open ghost opened 4 years ago

ghost commented 4 years ago

Hello,

Hope you are all well!

I was looking for examples using gocolly/kafka and I found your repository.

So I pulled it in order to learn from your work, and I have some questions, if you do not mind.

My first question: why aren't you using the site's sitemap XML to get all the pages available on their website? Perhaps you simply have not found it. Ref. http://www.autoreflex.com/sitemap/index.xml

My second question is about the dockerization of this POC, as I would like to see it working as a stack with a simple docker-compose up. Do you think it is possible to make it work?

I tried the following docker-compose file, but it does not work:

---
version: '3.7'
services:

  mongodb:
    image: mongo:latest
    volumes:
    - mongo-data:/var/lib/mongodb/db
    - mongo-backup:/var/lib/backup
    ports:
    - 27017:27017
    networks:
    - internal
    - web
    command: mongod --replSet mongodb0

  zoo1:
    image: zookeeper:3.4.9
    hostname: zoo1
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zoo1:2888:3888
    volumes:
      - ./zk-single-kafka-single/zoo1/data:/data
      - ./zk-single-kafka-single/zoo1/datalog:/datalog

  kafka1:
    image: confluentinc/cp-kafka:5.5.0
    hostname: kafka1
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zoo1:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./zk-single-kafka-single/kafka1/data:/var/lib/kafka/data
    depends_on:
      - zoo1

networks:
  internal:
  web:
    external: true

volumes:
  mongo-data:
  mongo-backup:

Can you help me complete it?

Thanks in advance for any insights or inputs on these questions.

P.S. I also speak French.

Cheers, X

ghost commented 4 years ago

Hi,

Hope you are all well and you will reply :-)

I have started dockerizing your repository at https://github.com/x0rzkov/gocolly-kafka-docker, but the consumer does not receive any messages.

I checked that the Kafka server URL is correct on both sides and executed:

docker-compose exec kafka sh -c './bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic announces'

but it replies:

Error while executing topic command : Topic announces already exists
[2020-05-22 06:24:19,323] ERROR java.lang.IllegalArgumentException: Topic announces already exists
    at kafka.admin.TopicCommand$AdminClientTopicService.createTopic(TopicCommand.scala:247)
    at kafka.admin.TopicCommand$TopicService.createTopic(TopicCommand.scala:196)
    at kafka.admin.TopicCommand$TopicService.createTopic$(TopicCommand.scala:191)
    at kafka.admin.TopicCommand$AdminClientTopicService.createTopic(TopicCommand.scala:219)
    at kafka.admin.TopicCommand$.main(TopicCommand.scala:62)
    at kafka.admin.TopicCommand.main(TopicCommand.scala)
 (kafka.admin.TopicCommand$)

Can you help me sort it out? Maybe I missed something in the Kafka config.

Cheers, X