TimRaedschDKFZ commented 1 year ago

Image Annotation in AI - How can we create HelmholtzNet: A large-scale benchmark for major real-world challenges?

Description

Simply speaking, a combination of annotated data sets and algorithms can generate impact in supervised Deep Learning (DL). While most researchers focus on algorithms, data-centric AI topics are often underrepresented.

With over 50k citations, ImageNet has been instrumental in advancing computer vision and deep learning research.

Researchers tend to work mostly on problems for which data sets have been published, so AI algorithms are mostly developed for specific domains. In contrast, Helmholtz is actively working on the major challenges of our time (energy, climate, healthcare ...). While newer architectures claim that their solutions generalize, there is no benchmark that represents the broad diversity of tasks.

We want to build HelmholtzNet: A large-scale benchmark for major real-world challenges, thus ensuring that the most important questions of our time get answered.

Our workshop is structured in three key phases:

Interactive presentation: We commence with an in-depth introduction to data-centric AI, focusing on the crucial role that annotated datasets play in building and validating DL algorithms. We will illustrate how annotation errors distort model selection and how seemingly mundane parts of an annotation pipeline can lead to erroneous annotations in your dataset.
Presentation: Pitch for HelmholtzNet: A large-scale benchmark for major real-world challenges of our time.
Discussion: What is needed for HelmholtzNet? Post-presentation, we transition into a dynamic discussion involving domain scientists. This interactive dialogue is designed to
- Explore participants' perspectives on pressing challenges and their importance.
- Discuss limitations and challenges faced by existing AI solutions in addressing major challenges.
- Gather insights on desired AI solutions, datasets, and benchmarks for HelmholtzNet.
- Identify dimensions to ensure robustness for HelmholtzNet.
- Discuss strategies for continuous community feedback and improvement of HelmholtzNet.
- ...

Join us on this Helmholtz Imaging journey to tackle the major challenges of our time with generalizing solutions.
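
The point above about annotation errors distorting model selection can be illustrated with a small, deliberately contrived sketch. Everything in it is an assumption made for illustration: the two models, their error patterns, and the 10% of wrongly annotated samples are hypothetical and not taken from any real benchmark. It only shows that when annotation errors coincide with one model's mistakes, the ranking measured on the annotated labels can differ from the ranking on the true labels.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def flip(labels, indices):
    """Return a copy of binary labels with the given positions flipped."""
    flipped = list(labels)
    for i in indices:
        flipped[i] = 1 - flipped[i]
    return flipped


n = 1000
true_labels = [i % 2 for i in range(n)]  # hypothetical ground truth

# Hypothetical model A: wrong on samples 0..99 (true accuracy 0.90).
preds_a = flip(true_labels, range(0, 100))

# Hypothetical model B: wrong on samples 100..219 (true accuracy 0.88).
preds_b = flip(true_labels, range(100, 220))

# Assumed systematic annotation errors on samples 100..199 (10% of the set).
# They coincide with most of model B's mistakes, e.g. because B was trained
# on data produced by the same flawed annotation pipeline.
annotated_labels = flip(true_labels, range(100, 200))

true_acc = {
    "A": accuracy(preds_a, true_labels),
    "B": accuracy(preds_b, true_labels),
}
measured_acc = {
    "A": accuracy(preds_a, annotated_labels),
    "B": accuracy(preds_b, annotated_labels),
}

print("accuracy on true labels:     ", true_acc)
print("accuracy on annotated labels:", measured_acc)
```

In this constructed case, model A is better on the true labels, but model B scores higher on the annotated labels, so a selection based on the benchmark alone would pick the weaker model.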

Organizational

Organizer(s)

Tim Rädsch tim.raedsch@dkfz-heidelberg.de

Speakers

Tim Rädsch tim.raedsch@dkfz-heidelberg.de

Format

Talks followed by open discussion

Timeframe

~1h to 1.5h

Number of participants

min: 3, max: 99

HeidiSeibold commented 1 year ago

Such a cool idea @TimRaedschDKFZ 👏

What is the state of HelmholtzNet? Is it "just" an idea so far or have you started working on this already?

Btw. when I search for HelmholtzNet, I find this: https://www.helmholtznet.de/ (so naming might be an issue 😬)

TimRaedschDKFZ commented 1 year ago

> Such a cool idea @TimRaedschDKFZ 👏
>
> What is the state of HelmholtzNet? Is it "just" an idea so far or have you started working on this already?

Thank you for the kind words! We are currently at a rather early stage and will use the Unconference Session as a starting point to identify the needs of the community and move on from there.

> Btw. when I search for HelmholtzNet, I find this: https://www.helmholtznet.de/ (so naming might be an issue 😬)

Good catch. We were aware of that and got the naming covered ;).

JojoDevel commented 1 year ago

I think that's a really awesome project :rocket: Based on my experience with microbial microscope images, annotated data often accounts for 50% of a successful DL application. Still, datasets are scarce in many domains, and where they exist, their scale is often insufficient :see_no_evil:

This also limits the ability to compare existing methods effectively, leading to a large number of DL models with slight variations that are all validated on different, custom datasets. As a result, advances in methods are very difficult to distinguish from differing challenges in the data itself. More standardized and large-scale benchmarks from the diverse Helmholtz Imaging domains could fit in wonderfully here :heart_eyes:

Noting these challenges, we have already started annotating & publishing benchmark data in the microbial imaging domain with over a million cell masks (large-scale dataset for segmentation and tracking: https://doi.org/10.5281/zenodo.7260136). Although this is just a first step, we could contribute to the session and explain the challenges we experienced while collecting, annotating and managing such benchmark data :muscle:

TimRaedschDKFZ commented 1 year ago

Thank you for the kind words Johannes! Couldn't agree more, data is where the magic happens.

Since this is an unconference session, we are open as to how we conduct the session. We would be happy to see you there and to hear about your experiences.

Challenges in data annotation and the scope of HelmholtzNet are hot topics for this session. We will make sure to accommodate the needs of the participants.

Looking forward to seeing you in Hamburg, Tim

SusanneWenzel commented 1 year ago

@TimRaedschDKFZ I guess you would need a screen, right? We have a few available, but possibly not for each session. A flipchart will be available.

TimRaedschDKFZ commented 1 year ago

Appreciate you reaching out. A flipchart should be sufficient.


SusanneWenzel commented 1 year ago

thank you, noted

TimRaedschDKFZ commented 1 year ago

On second thought, if we could get a screen, that would be awesome (only if it is not too much of a hassle on your side). Then I could show some initial slides on data-centric AI and get every participant up to speed. This would help the unconference session run more smoothly.

Best, Tim


SusanneWenzel commented 1 year ago

@TimRaedschDKFZ if possible, please make a note here on the (rough) number of participants. Also don't forget to make a note here about the outcome of the session and, if applicable, future plans that came out of this session.

DKRZ-AIM / HAI-HI-unconference-2023