DKRZ-AIM / HAI-HI-unconference-2023

Topic collection for the Unconference at the Helmholtz AI/Helmholtz Imaging conference joint day, June 14, 2023, Hamburg

Image analysis validation: How can we guarantee that our algorithms perform as intended? #8

Open AReinke opened 1 year ago

AReinke commented 1 year ago

Title

Image analysis validation: How can we guarantee that our algorithms perform as intended?

Description

The importance of automatic image analysis based on artificial intelligence (AI) is growing rapidly. However, only a small number of algorithms have been successfully applied in real-world settings. One possible cause is that validation is frequently undervalued. Reliable algorithm validation is essential for accurately tracking scientific progress and for closing the current gap between method research and the translation of methods into practice.

We will discuss shortcomings in current AI validation, identify aspects that should be improved, and brainstorm strategies for improving common practice.

Organizational

Organizer(s)

Annika Reinke: a.reinke@dkfz.de

Speakers

Annika Reinke: a.reinke@dkfz.de (short introductory talk)

Format

Introductory talk followed by open discussions on a) shortcomings, b) identification of the most pressing issues, and c) strategies for addressing these shortcomings. Depending on the number of participants, the discussions can be organized in small groups or world cafés.

Timeframe

~1h-1.5h

Number of participants

3-99

SusanneWenzel commented 1 year ago

@AReinke Would you need a screen? We have a few available, but possibly not for each session. A flipchart will be available.

AReinke commented 1 year ago

We don't need a screen, but a flipchart would be awesome!

SusanneWenzel commented 1 year ago

Thank you, noted.

Old-Shatterhand commented 1 year ago

That sounds very interesting. When and where are you planning to have this session?

SusanneWenzel commented 1 year ago

The schedule will be set during the lunch break

SusanneWenzel commented 1 year ago

@AReinke if possible, please make a note here on the (rough) number of participants. Also don't forget to make a note here about the outcome of the session and, if applicable, future plans that came out of this session.

AReinke commented 1 year ago

We are outside at the old salon

AReinke commented 1 year ago

- Expected cost metric: different weights for the error terms
- Normalized expected cost: compare the result to a random/naive classifier

https://arxiv.org/abs/2209.05355
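
For reference, here is a minimal numpy sketch of how these two quantities could be computed for a hard classifier, roughly following the idea in the linked paper; the cost matrix and toy labels are invented for illustration:

```python
import numpy as np

def expected_cost(y_true, y_pred, cost_matrix, priors=None):
    """Expected cost: error rates weighted by a per-error-type cost.

    cost_matrix[i, j] is the cost of predicting class j when the true class is i.
    If priors is None, the empirical class frequencies of y_true are used.
    """
    classes = np.arange(cost_matrix.shape[0])
    if priors is None:
        priors = np.array([(y_true == c).mean() for c in classes])
    ec = 0.0
    for i in classes:
        mask = y_true == i
        if not mask.any():
            continue
        for j in classes:
            r_ij = (y_pred[mask] == j).mean()  # fraction of class-i samples predicted as j
            ec += priors[i] * r_ij * cost_matrix[i, j]
    return ec

def normalized_expected_cost(y_true, y_pred, cost_matrix, priors=None):
    """Expected cost divided by the cost of the best naive (constant) classifier."""
    classes = np.arange(cost_matrix.shape[0])
    if priors is None:
        priors = np.array([(y_true == c).mean() for c in classes])
    # Cost of always predicting class j; the naive baseline is the cheapest of these.
    naive_costs = [sum(priors[i] * cost_matrix[i, j] for i in classes) for j in classes]
    return expected_cost(y_true, y_pred, cost_matrix, priors) / min(naive_costs)

# Toy example: missing class 1 (e.g. a lesion) is five times as costly as a false alarm.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 0, 1, 0])
costs = np.array([[0.0, 1.0],
                  [5.0, 0.0]])
print(expected_cost(y_true, y_pred, costs))
print(normalized_expected_cost(y_true, y_pred, costs))  # values < 1 beat the best naive baseline
```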

AReinke commented 1 year ago

Data splits should reflect the real-life situation. Random splits may yield misleading results.
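
As an illustration, a minimal scikit-learn sketch of a group-aware split that keeps all images of one patient on the same side; the patient IDs, features and labels below are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical setup: several images per patient. A purely random split would put
# images from the same patient into both train and test (data leakage).
rng = np.random.default_rng(0)
n_images = 100
patient_ids = rng.integers(0, 20, size=n_images)  # 20 patients, ~5 images each
X = rng.normal(size=(n_images, 16))               # placeholder features
y = rng.integers(0, 2, size=n_images)             # placeholder labels

# Group-aware split: all images of a given patient end up on the same side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```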

AReinke commented 1 year ago

If you come up with a problem and want to raise awareness, try to have a group behind you supporting your points -> this will increase outreach and trust.

It's also good to have people from other domains on board. They often use similar metrics/strategies you could learn from.

AReinke commented 1 year ago

How to form a consortium? How do you find out about people working on these things?

-> Keep an eye on Twitter, social media, and conference workshops or tutorials. Or just contact people directly.

AReinke commented 1 year ago

Look at other domains: you can learn from them! E.g. check out computer vision, speech recognition, NLP, explainable AI (e.g. relevance maps), and other applications...

AReinke commented 1 year ago

It's a good idea to ask domain experts (e.g. radiologists) to rate the output of an algorithm and then check whether there is a correlation, or which metric best reflects the radiologists' opinion.
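
As a rough sketch of how such a comparison could be set up (the Dice scores and Likert-scale ratings below are invented for illustration), a rank correlation between metric values and expert ratings can be computed with SciPy:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Hypothetical data: for each case, a metric value (e.g. Dice) for the algorithm's
# prediction and a radiologist's rating of the same prediction on a 1-5 Likert scale.
dice_scores    = np.array([0.91, 0.85, 0.62, 0.95, 0.40, 0.78, 0.88, 0.55])
expert_ratings = np.array([5,    4,    2,    5,    1,    3,    4,    2])

rho, p_rho = spearmanr(dice_scores, expert_ratings)
tau, p_tau = kendalltau(dice_scores, expert_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
print(f"Kendall tau  = {tau:.2f} (p = {p_tau:.3f})")

# Repeating this for several candidate metrics shows which one best
# reflects the experts' judgement of the predictions.
```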

AReinke commented 1 year ago

~10 participants

devesh1611singh commented 1 year ago

Hey everyone, Devesh here. I had a great time today. Thanks everyone for a lively discussion.

Following is the link to my recent conference paper, where I show that, beyond numeric metrics (accuracy, AUC, etc.), explainable DL models with visual feature-importance methods (saliency maps/relevance maps) can be more helpful in understanding a model's actual performance. Link: https://rdcu.be/ddSbo
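
As a minimal illustration of the kind of method meant here (not taken from the paper), a vanilla-gradient saliency map in PyTorch with a placeholder model and input:

```python
import torch
import torch.nn as nn

# Placeholder model and input; any differentiable classifier works the same way.
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
model.eval()

image = torch.randn(1, 1, 64, 64, requires_grad=True)  # placeholder input image
score = model(image)[0].max()                           # score of the predicted class
score.backward()

# Vanilla gradient saliency: the magnitude of the gradient of the class score
# with respect to each input pixel highlights influential regions.
saliency = image.grad.abs().squeeze()
print(saliency.shape)  # (64, 64)
```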

Feel free to reach out if you would like to discuss anything or collaborate.

Bye!

TobiasWeigel commented 1 year ago

Indeed, thanks for the discussion. I'd like to keep following this topic, as I see it as a possible roadblock for the widespread adoption of ML in Earth & Environmental modelling. In particular, extending the metrics catalog to regression problems would be really great.