Tests & Refactor (incl. dependencies, CICD workflow, Documentation workflow) & Doc.

IGNF / myria3d

Myria3D: Aerial Lidar HD Semantic Segmentation with Deep Learning

https://ignf.github.io/myria3d/

BSD 3-Clause "New" or "Revised" License


Tests & Refactor (incl. dependencies, CICD workflow, Documentation workflow) & Doc. #14

Closed CharlesGaydon closed 2 years ago

CharlesGaydon commented 2 years ago

A test suite covers the typical use cases: training and prediction from the CLI, successive train+test runs, a dry run on RandLaNet, and overfitting tests with RandLaNet and PointNet to ensure that the models are trainable.

The torch-points-kernels dependency is removed and replaced with pyg, which adds some complexity to the code but simplifies installation of the virtual environment. The resulting code is backward compatible with previous models and fully tested for regressions (IoU is unchanged on a 15 km² test set).

Corrections to the Dockerfile are also implemented; in particular, CUDA images were broken by a CUDA update and needed to be adjusted.

Workflows make good use of caching functionalities, both from Docker and from the GitHub environment.

Requirements files are simplified. Dependencies are installed without redundant command lines. The torchmetrics version is pinned, because pytorch-lightning would otherwise pull in a newer version that is not backward compatible.

CharlesGaydon commented 2 years ago

TODO: prevent the environment setup script from crashing because of pip messages.

CharlesGaydon commented 2 years ago

TODO: https://github.com/IGNF/lidar-deep-segmentation/blob/72fe9e6e3d4f51cf49da5974bb51496399ca9664/lidar_multiclass/data/transforms.py#L283 -> add a comment explaining why the conversion to a Python dictionary is needed.

CharlesGaydon commented 2 years ago

Raise a more explicit error message when a class error occurs in https://github.com/IGNF/lidar-deep-segmentation/blob/72fe9e6e3d4f51cf49da5974bb51496399ca9664/lidar_multiclass/data/transforms.py#L278. The current error is:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
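
One possible shape for a more explicit check is sketched below. It assumes that the TypeError comes from looking up a classification code that is missing from a code-to-index dictionary, so that the lookup yields None; the thread does not confirm this. The function name `map_class_codes` and its parameters are hypothetical and are not taken from transforms.py.

```python
from typing import Dict, Iterable, List


def map_class_codes(codes: Iterable[int], code_to_index: Dict[int, int]) -> List[int]:
    """Map raw classification codes to contiguous training indices.

    Hypothetical helper: raises a ValueError that names the unknown codes,
    instead of letting int(None) fail with an uninformative TypeError.
    """
    codes = list(codes)
    # Collect every code that the mapping does not know about.
    unknown = sorted({code for code in codes if code not in code_to_index})
    if unknown:
        raise ValueError(
            f"Classification codes {unknown} are not in the configured mapping; "
            f"known codes are {sorted(code_to_index)}."
        )
    return [code_to_index[code] for code in codes]
```

Validating the codes before the conversion keeps the failure close to its cause: the message lists the offending codes and the codes the configuration does know.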

CharlesGaydon commented 2 years ago

In production, to use this new version (ping @gliegard, we can discuss it):

start from an up-to-date lidar_deep_im:main image, which is built and archived automatically on each push to main.
use predict.ckpt_path instead of predict.resume_from_checkpoint for inference -> this must be reflected in the configuration file, or passed directly on the CLI.
add the --ipc=host option to the docker run call at inference time to enable parallelization. See the PyTorch docs:

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
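
The three production steps above could be combined as in the following sketch, which only assembles the argument list for docker run and does not execute it. The image name, the --ipc=host option, and the predict.ckpt_path key come from this thread; the function name `build_docker_predict_command` and the `predict_args` parameter (the command run inside the container) are hypothetical, and GPU-related flags are left out.

```python
from typing import List, Sequence


def build_docker_predict_command(
    predict_args: Sequence[str],
    ckpt_path: str,
    image: str = "lidar_deep_im:main",
) -> List[str]:
    """Assemble a `docker run` argument list for inference (hypothetical helper)."""
    return [
        "docker",
        "run",
        "--rm",
        # Share the host IPC namespace so that multi-worker data loaders
        # are not limited by the container's default shared memory size.
        "--ipc=host",
        image,
        # Command executed inside the container, supplied by the caller.
        *predict_args,
        # New configuration key, used in place of predict.resume_from_checkpoint.
        f"predict.ckpt_path={ckpt_path}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing --shm-size instead of --ipc=host is the alternative mentioned in the PyTorch note quoted above.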