REMLA24-Team-5 / Model-Training

The model-training repository contains all the code required to train and test a phishing URL detection machine learning model. It includes scripts for data preprocessing, model training, evaluation, and performance testing to ensure robust and accurate model development.

Download dataset as part of pipeline #19

Closed by SagaRut 2 months ago

SagaRut commented 2 months ago

The dataset is stored remotely, outside the project, and can be downloaded automatically as part of the pipeline. The feature encoding is a shared component.

blibliboe commented 2 months ago

If you run dvc repro --pull, all the missing files are pulled automatically, so in my opinion the dataset is already part of the pipeline.
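
For anyone who wants to trigger this from a script rather than the shell, here is a minimal sketch that wraps the dvc repro --pull command mentioned above. The helper names build_repro_command and reproduce_pipeline are hypothetical and not part of this repository; the sketch assumes the dvc executable is on the PATH and that a DVC remote is already configured for the project.

```python
import shutil
import subprocess
from typing import List, Optional


def build_repro_command(pull: bool = True) -> List[str]:
    """Build the DVC command line for reproducing the pipeline (hypothetical helper)."""
    command = ["dvc", "repro"]
    if pull:
        # --pull asks DVC to try pulling missing data before running the stages.
        command.append("--pull")
    return command


def reproduce_pipeline(pull: bool = True) -> Optional[int]:
    """Run the pipeline and return its exit code, or None if dvc is not installed."""
    if shutil.which("dvc") is None:
        # Assumption: dvc must be on the PATH; nothing is run otherwise.
        return None
    result = subprocess.run(build_repro_command(pull), check=False)
    return result.returncode


if __name__ == "__main__":
    # Only prints the command; call reproduce_pipeline() to actually run it.
    print(" ".join(build_repro_command()))
```

Running the file directly only prints the command it would execute. Calling reproduce_pipeline() from the project root would run it, which is one way to check that the dataset download really happens as part of the pipeline.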