This program goes through the pages of a www.autoreflex.com announce listing and extracts for each announce :
Listing used for this POC : http://www.autoreflex.com/137.0.-1.-1.-1.0.999999.1900.999999.-1.99.0.1?fulltext=&geoban=M137R99
A) Create the announces Kafka topic :
./kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic announces
B) Use the following Docker commands to run a MongoDB container :
docker volume create announces-volume
docker run --name announces-mongodb -v announces-volume:/data/db -p 27017:27017 -d mongo
(For the sake of the POC we use no authentication. We could have set the environment variables MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD on the container to define root credentials.)
C) Download the Go Colly and MongoDB libraries :
go get -u github.com/gocolly/colly/...
go get -u go.mongodb.org/mongo-driver/mongo
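Once the driver is installed, the Go code needs a connection URI for the container started in step B. The following is a minimal sketch, using only the standard library, of how such a URI could be built both for the unauthenticated POC setup and for a container started with root credentials. The helper name `mongoURI` and the sample user and password are hypothetical and do not come from this project; the resulting string would typically be handed to the driver through `options.Client().ApplyURI(...)`.

```go
package main

import (
	"fmt"
	"net/url"
)

// mongoURI is a hypothetical helper (not part of this project) that builds
// a MongoDB connection string. With an empty user it returns the
// unauthenticated form that matches the POC container.
func mongoURI(host string, port int, user, password string) string {
	u := url.URL{
		Scheme: "mongodb",
		Host:   fmt.Sprintf("%s:%d", host, port),
	}
	if user != "" {
		// url.UserPassword percent-encodes reserved characters such as
		// '@' and ':' when the URL is rendered as a string.
		u.User = url.UserPassword(user, password)
	}
	return u.String()
}

func main() {
	// POC setup: no authentication, port 27017 published on localhost.
	fmt.Println(mongoURI("localhost", 27017, "", ""))

	// Assumed credentials, for illustration only.
	fmt.Println(mongoURI("localhost", 27017, "root", "p@ss:word"))
}
```

Building the URI through `net/url` rather than string concatenation keeps passwords containing `@`, `:` or `/` from corrupting the connection string.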
D) Run consumer and producer
Producer :
Consumer (log message hidden by default) :
MongoDB announces collection :