daita-technologies / ai-tools

AI-based tools for the DAITA platform.
http://app.daita.tech
GNU Affero General Public License v3.0

Test-time augmentation #3

Open pcaversaccio opened 2 years ago

pcaversaccio commented 2 years ago

Based on a Slack discussion between @ttattl and me, I now better understand the reasoning behind applying data augmentation to the validation and test datasets. We should definitely consider some augmentation techniques for the test dataset as well in the future.

One way could be to use Test-Time Augmentation (TTA): similar to what data augmentation does for the training set, TTA applies random modifications to the test images. Thus, instead of showing the regular, "clean" images to the trained model only once, we show it augmented versions of each image several times. We then average the predictions for each image and take that as our final guess. An example PyTorch implementation can be found here: https://github.com/qubvel/ttach
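To make the idea concrete, here is a minimal, framework-free sketch of the averaging step described above. It is not the ttach API and not DAITA code: the names `tta_predict`, `toy_model`, `hflip`, `vflip` and `identity` are hypothetical, the "image" is just a nested list of pixel values, and the "model" is a stand-in function that returns two class probabilities.

```python
import random
from typing import Callable, List, Sequence

Image = List[List[float]]                # toy grayscale image: rows of pixel values
Model = Callable[[Image], List[float]]   # maps an image to class probabilities


def identity(img: Image) -> Image:
    """Return the image unchanged (the "clean" view)."""
    return img


def hflip(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top-to-bottom."""
    return img[::-1]


def toy_model(img: Image) -> List[float]:
    """Hypothetical stand-in for a trained classifier.

    Scores class 0 by the brightness of the left half and class 1 by the
    brightness of the right half, then normalises the two scores.
    """
    half = len(img[0]) // 2
    left = sum(sum(row[:half]) for row in img)
    right = sum(sum(row[half:]) for row in img)
    return [left / (left + right), right / (left + right)]


def tta_predict(
    model: Model,
    img: Image,
    transforms: Sequence[Callable[[Image], Image]],
    n_rounds: int = 8,
    seed: int = 0,
) -> List[float]:
    """Average the model's predictions over randomly augmented copies of img."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    total = None
    for _ in range(n_rounds):
        augmented = rng.choice(transforms)(img)  # one random modification
        probs = model(augmented)                 # one forward pass per copy
        total = probs if total is None else [t + p for t, p in zip(total, probs)]
    return [t / n_rounds for t in total]


if __name__ == "__main__":
    image = [[0.9, 0.1], [0.8, 0.2]]
    print("single pass:", toy_model(image))
    print("TTA average:", tta_predict(toy_model, image, [identity, hflip, vflip]))
```

Note that the model is called `n_rounds` times per image, so the prediction cost grows linearly with the number of augmented copies. A real setup would replace `toy_model` with the trained network and pick transforms that make sense for the data.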

AI-Daita commented 2 years ago

Here is a very good survey on augmentation that we should pay attention to after the MVP. It also has a section on test-time augmentation: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0

About test-time augmentation: it is now an obligatory step in medical image processing (to publish a paper :)). My recent Nature Scientific Reports paper used this technique (see the screenshot); here, RTS stands for random transformation sampling.

[Image: Screenshot 2021-12-10 at 15 45 01]

Sorry @pcaversaccio, I only discussed it with @ttattl and not with you, to avoid over-thinking things for the MVP, because in the end test-time augmentation is not really different from training-time augmentation.

There is also a trade-off for real-time applications, because prediction takes longer (each augmented copy needs its own forward pass).

pcaversaccio commented 2 years ago

Thx for sharing this survey @giaoNguyen70. Ah, that's really interesting (I'm not so much into the academic world anymore). When I've talked to practitioners, they usually tell me they apply data augmentation only to the training dataset. I need to better understand why this is the case (my understanding so far is that prediction speed is really important for, e.g., self-driving cars, and therefore such an augmentation technique is not feasible there). Nonetheless, as pointed out by @ttattl in our Slack discussion, one of the USPs here could be that DAITA provides top-notch augmented test cases that our competitors do not provide. However, very importantly, there must be people/businesses who are demanding such a solution. What I mean by this is that we need to check whether there is product/market fit.