REMLA24-Team-5 / Model-Training

The model-training repository contains all the code required to train and test a phishing URL detection machine learning model. It includes scripts for data preprocessing, model training, evaluation, and performance testing to ensure robust and accurate model development.

Setup dvc #14

Closed Timdnb closed 2 months ago

Timdnb commented 2 months ago

Also includes a refactor that moves process_data.py into a features folder.

Note: a fix has been added to model_definition.py to prevent a Dense layer error. The cause of that error still has to be investigated.

Timdnb commented 2 months ago

FYI, I have only run the pipeline for a single epoch so far.

blibliboe commented 2 months ago

When pulling from DVC, I got the error message "Failed to pull data from the cloud", followed by a message asking whether my cache is up to date. A quick search led me to this page: https://dvc.org/doc/user-guide/troubleshooting#missing-files

blibliboe commented 2 months ago

After a bit of fiddling around, I found that I first have to run dvc repro before I am able to pull.

The specific error I get is:

ERROR: failed to pull data from the cloud - Checkout failed for following targets:
output\raw_x_test.joblib
output\raw_y_test.joblib
output\model.joblib
output\y_train.joblib
output\y_test.joblib
output\raw_x_train.joblib
output\x_train.joblib
output\raw_y_val.joblib
output\x_val.joblib
output\raw_y_train.joblib
output\y_val.joblib
output\char_index.joblib
output\raw_x_val.joblib
output\x_test.joblib
Is your cache up to date?
https://error.dvc.org/missing-files

blibliboe commented 2 months ago

I still have a problem with the following files:

ERROR: failed to pull data from the cloud - Checkout failed for following targets:
output\y_test.joblib
output\y_val.joblib
output\model.joblib
Is your cache up to date?
https://error.dvc.org/missing-files
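
For anyone comparing the two error messages above, here is a rough Python sketch that extracts the failed targets from each message and shows which ones are still missing. The helper name failed_targets and the constants FIRST_ERROR and SECOND_ERROR are made up for illustration. The regular expression only assumes the message shape quoted in this thread, not any documented DVC output format.

```python
import re
from typing import List

# Error text copied from the two messages quoted in this thread.
# Raw strings keep the Windows-style backslashes intact.
FIRST_ERROR = (
    r"ERROR: failed to pull data from the cloud - Checkout failed for following targets: "
    r"output\raw_x_test.joblib output\raw_y_test.joblib output\model.joblib "
    r"output\y_train.joblib output\y_test.joblib output\raw_x_train.joblib "
    r"output\x_train.joblib output\raw_y_val.joblib output\x_val.joblib "
    r"output\raw_y_train.joblib output\y_val.joblib output\char_index.joblib "
    r"output\raw_x_val.joblib output\x_test.joblib "
    r"Is your cache up to date? https://error.dvc.org/missing-files"
)
SECOND_ERROR = (
    r"ERROR: failed to pull data from the cloud - Checkout failed for following targets: "
    r"output\y_test.joblib output\y_val.joblib output\model.joblib "
    r"Is your cache up to date? https://error.dvc.org/missing-files"
)


def failed_targets(message: str) -> List[str]:
    """Return the paths listed between 'following targets:' and the cache hint.

    Hypothetical helper: it relies only on the wording seen in this thread.
    """
    match = re.search(
        r"following targets:(.*?)Is your cache up to date\?",
        message,
        re.DOTALL,  # also works when the targets are spread over several lines
    )
    if match is None:
        return []
    # Targets are separated by whitespace (spaces or newlines).
    return match.group(1).split()


first = failed_targets(FIRST_ERROR)
second = failed_targets(SECOND_ERROR)

# Targets that failed the first time but no longer appear in the second message.
resolved = sorted(set(first) - set(second))
# Targets that failed both times, in the order of the first message.
still_missing = [target for target in first if target in second]

print("resolved:", resolved)
print("still missing:", still_missing)
```

With the two messages from this thread, the still-missing list contains only the model, y_test and y_val files, which are the three that need a closer look.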