Status
Assigning to @ (unassigned). Please use this branch: https://github.com/N2ITN/are-you-fake-news/tree/develop-dockerize
Issue
This service will dockerize the collection and cleaning of new data for training. This issue must be completed before model training is possible. The code for scraping news articles with their labels, cleaning them, and storing them in mongodb already exists in ./docker/gather_data/
Create a second docker-compose file for data collection and model training.
Tasks
[x] Create a new docker-compose file under ./docker/ called docker-compose-train.yaml. It should contain the following: so that it will integrate with the existing mongo configuration.
[x] Create a dockerfile in ./docker/gather_data and reference it in the new docker-compose. This should start collecting data into mongo as its entrypoint.
[ ] Unit tests
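As a rough illustration of how the first two tasks could fit together, here is a minimal sketch of ./docker/docker-compose-train.yaml. It is not the snippet the issue originally referred to. The service name gather_data, the network name mongo_net, and the MONGO_HOST variable are all assumptions made for the example; the real values depend on how the existing mongo docker-compose file is set up.

```yaml
# Hypothetical sketch of ./docker/docker-compose-train.yaml.
# All names below (gather_data, mongo_net, MONGO_HOST) are assumptions.
version: "3"

services:
  gather_data:
    # Build from the dockerfile in ./docker/gather_data
    # (path is relative to this compose file).
    build:
      context: ./gather_data
    # The dockerfile's entrypoint is expected to start the data collection.
    environment:
      # Assumed hostname of the mongo service in the existing stack.
      MONGO_HOST: mongo
    networks:
      - mongo_net

networks:
  mongo_net:
    # Join a network created by the existing mongo stack
    # instead of defining a new one here.
    external: true
```

With a layout like this, the service would be started from ./docker/ with docker-compose -f docker-compose-train.yaml up --build, provided the external network already exists.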