alan-turing-institute / environmental-ds-book

A computational notebook community for open environmental data science 🌎
https://edsbook.org

[NBI] Livestock detection (DeepForest) #257

camappel opened this issue 2 months ago

camappel commented 2 months ago

What is the notebook about?

This notebook will explore the capabilities of the DeepForest package. In particular, it will demonstrate how to fine-tune the prebuilt livestock detection model on local data and compare its performance against the baseline.

The prebuilt livestock model was trained on a limited dataset. According to the package's documentation, "the prebuilt models will always be improved by adding data from the target area". As such, this notebook will explore the improvement in the model's livestock detection performance gained from fine-tuning on local data.
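
A minimal sketch of what loading the prebuilt model and running a prediction could look like, assuming the 1.4.x-style loading call (the model name, image path, and exact API are placeholders to be checked against the DeepForest docs):

```python
from deepforest import main

# Initialise a DeepForest model and load the prebuilt livestock checkpoint.
# NOTE: the loading call below is an assumption based on the 1.4.x release,
# which fetches prebuilt models by name; adjust if the docs say otherwise.
model = main.deepforest()
model.load_model("weecology/deepforest-livestock")

# Predict bounding boxes on a single image (the file name is a placeholder)
boxes = model.predict_image(path="sample_livestock_image.png")
print(boxes)
```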

Data Science Component

Submission type

Programming language

Checklist:

Additional information

An EDS notebook on tree crown detection with DeepForest already exists. This notebook is different because it focuses on the latest version of DeepForest (1.4.0), which includes a new prebuilt livestock model, and it also demonstrates how to fine-tune that model.

acocac commented 2 months ago

@camappel, thanks for opening the notebook idea. It sounds great to test the novel prebuilt models from the DeepForest package.

The notebook incorporates suggestions from the closed issue #251, so I'm happy to support the submission of the notebook.

Please move to the preparation stage and comment here if you experience any issues.

camappel commented 2 months ago

Great, thanks! I'm going to use the same PR as the last one, since the environment will be similar.

acocac commented 2 months ago

@camappel - LGTM. Please feel free to explore the pooch library (see docs) for fetching the suggested labelled dataset.
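
For instance, fetching and caching an archived copy could be as short as the sketch below (the URL and hash are placeholders until the subset is archived somewhere citable):

```python
import pooch

# Download (and cache) the labelled dataset archive; the URL and hash are
# placeholders until the subset has a permanent, citable home.
files = pooch.retrieve(
    url="https://zenodo.org/record/XXXXXXX/files/livestock_labels.zip",
    known_hash=None,  # replace with the real "sha256:..." once known
    processor=pooch.Unzip(),  # unpack the archive and return the member paths
)
print(files)
```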

camappel commented 1 month ago

Hi @acocac, I have a couple of questions about the capacity of the Binder environment and how to structure the notebook accordingly.

Currently in the notebook, I fetch and process a large dataset from Dataverse, partition it into train/validation/test sets, then fine-tune the livestock detection model on the train/validation sets. I then evaluate the baseline and fine-tuned models on the test set.
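
For the partition step I'm doing something along these lines (the annotation file name, the image_path column, and the 80/10/10 proportions are just what I'm assuming here):

```python
import numpy as np
import pandas as pd

# Split the annotation file by image so boxes from one image never end up in
# two different splits (file/column names and proportions are assumptions)
annotations = pd.read_csv("livestock_annotations.csv")

rng = np.random.default_rng(42)
images = annotations["image_path"].unique()
rng.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.8 * n)],
    "validation": images[int(0.8 * n) : int(0.9 * n)],
    "test": images[int(0.9 * n) :],
}
for name, subset in splits.items():
    annotations[annotations["image_path"].isin(subset)].to_csv(f"{name}.csv", index=False)
```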

My questions are:

  • Can I download the entire dataset in the notebook? I believe you previously said to just download a couple of sample images for visualisation, but the evaluation step requires a whole dataset to get the relevant metrics. Maybe instead, I could just download the test set (10%) in the notebook for the evaluation section?
  • Can I train the model in the notebook? One previous notebook included the training step, but another just downloaded the weights. The problem with this, however, is that it does not demonstrate the training process, and I would like to show how to configure the model and create the trainer (only a few lines of code).

So I think the notebook structure could be:

  1. Context
  2. Setup environment
  3. Download test data, baseline model, and fine-tuned model (include model configuration and training code here, but commented out?)
  4. Evaluation (baseline vs fine-tuned; see the sketch after this list)
  5. Visualisation
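
A rough sketch of what step 4 could look like (the checkpoint and file names are placeholders, the model-loading call is assumed from the 1.4.x docs, and the metric keys may differ between DeepForest versions):

```python
from deepforest import main

# Baseline: the prebuilt livestock model; fine-tuned: the checkpoint saved after training
# (loading call and file names are placeholders/assumptions)
baseline = main.deepforest()
baseline.load_model("weecology/deepforest-livestock")
finetuned = main.deepforest.load_from_checkpoint("finetuned_livestock.ckpt")

# Evaluate both models on the held-out test set
for name, model in [("baseline", baseline), ("fine-tuned", finetuned)]:
    results = model.evaluate(csv_file="test.csv", root_dir="images/", iou_threshold=0.4)
    print(name, results["box_precision"], results["box_recall"])
```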

Please let me know your thoughts! Thanks

acocac commented 1 month ago

Hi @camappel - thanks for sharing updates!

  • Can I download the entire dataset in the notebook? I believe you previously said to just download a couple of sample images for visualisation, but the evaluation step requires a whole dataset to get the relevant metrics. Maybe instead, I could just download the test set (10%) in the notebook for the evaluation section?

The main notebook should only download the test set. You could archive it in Zenodo, and refer to the original dataset (and its respective license) within the metadata/description of the Zenodo record (see, for instance, the subset dataset used in the COSMOS-UK notebook).

  • Can I train the model in the notebook? One previous notebook included the training step, but another just downloaded the weights. The problem with this, however, is that it does not demonstrate the training process, and I would like to show how to configure the model and create the trainer (only a few lines of code).

I suggest adding a markdown cell where you highlight the training process (see an example here).
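
The commented-out (or markdown) training cell could be as short as the sketch below (paths, epoch count, and the model-loading call are placeholders, not tested here):

```python
from deepforest import main

# Start from the prebuilt livestock model (loading call assumed for 1.4.x)
model = main.deepforest()
model.load_model("weecology/deepforest-livestock")

# Point the config at the local training/validation annotations (placeholder paths)
model.config["train"]["csv_file"] = "train.csv"
model.config["train"]["root_dir"] = "images/"
model.config["train"]["epochs"] = 10
model.config["validation"]["csv_file"] = "validation.csv"
model.config["validation"]["root_dir"] = "images/"

# Create the PyTorch Lightning trainer, fine-tune, and save the weights
model.create_trainer()
model.trainer.fit(model)
model.trainer.save_checkpoint("finetuned_livestock.ckpt")
```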

So I think the notebook structure could be:

The structure looks good to me. Thanks for your effort in validating and sharing the progress of your submission.

acocac commented 6 days ago

@camappel we have started the PRE-REVIEW phase. Fingers crossed for constructive feedback on your submission!