Multi-Modal AI: Opening the Doors to General Artificial Intelligence
What your talk is about:
My talk will be a deep-dive introduction to Multi-Modal AI.
By Multi-Modal AI I mean going beyond basic LLMs and moving toward systems that fuse multiple modalities of data into a shared representation. ChatGPT and transformer models have brought us to a certain point in the AI revolution, but independently they can only take us so far.
True learning in human beings, and living things in general, doesn't occur through text alone, but rather through the incorporation of several different forms of external input into a cross-correlated stream of data. This is the true path forward for how AI can and will bound past the current limitations of text-based models.
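To make the shared-representation idea concrete, here is the kind of minimal sketch I might show, assuming the open-source CLIP model via Hugging Face's transformers library (the image path is a hypothetical placeholder):

```python
# Minimal sketch: one image and two captions embedded into the SAME
# space by CLIP, so cross-modal similarity is a simple dot product.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog.jpg")  # hypothetical local image file
texts = ["a photo of a dog", "a photo of a cat"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Match the image against each caption inside the shared space.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)  # higher probability for the caption that fits the image
```

The same pattern extends to audio or video encoders feeding that shared space, which is the fusion idea the talk builds on.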
Example:
A baby doesn't learn to walk by simply reading about it. It experiences the world, drawing on primordial training ingrained deep in its biology to actuate its sensors (eyes, nose, ears) and learn a complex representation of the world. Machines must do the same to achieve a similar level of understanding.
Apple, through Vision Pro and multimedia image editing,
Google, through Gemini,
OpenAI, through GPT-4 Vision,
and many more players entering the AI space are already moving in this direction.
Components of my talk:
An introduction
A basic primer on ML
An understanding of what multimodal models are
A dive into cross-modal embeddings and the numbers behind the multimodal AI space (see the sketch after this list)
Real-world examples of multi-modal AI applications
Demos of my own work in the Multi-Modal space
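For the cross-modal embeddings item above, here is a toy sketch of the contrastive (CLIP-style) objective I may walk through; the linear encoders are placeholders standing in for real vision and text towers, not any production model:

```python
# Toy contrastive alignment of two modalities into one embedding space.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, img_dim, txt_dim, shared_dim = 8, 512, 256, 64

image_encoder = torch.nn.Linear(img_dim, shared_dim)  # stand-in vision tower
text_encoder = torch.nn.Linear(txt_dim, shared_dim)   # stand-in text tower

# Unit-normalize so similarity is cosine similarity.
image_feats = F.normalize(image_encoder(torch.randn(batch, img_dim)), dim=-1)
text_feats = F.normalize(text_encoder(torch.randn(batch, txt_dim)), dim=-1)

# Similarity of every image with every text in the batch (0.07 = temperature).
logits = image_feats @ text_feats.t() / 0.07

# Matching pairs sit on the diagonal; the symmetric cross-entropy pulls
# them together and pushes mismatched pairs apart.
targets = torch.arange(batch)
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
print(loss.item())
```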
Notes:
This talk will be quite involved, but not inaccessible.
I will present complex topics in a way that everyone can understand. I am not only an engineer in the space but also a researcher with a body of academic work devoted to this topic.
I will show simple live demos that will appeal to the audience, as well as some potential coding demos.
I slated my time for 60 minutes, but I'm okay with reducing it to a 45-minute talk if need be.
How long will your talk be?
[ ] 20-30 minutes
[ ] 30-45 minutes
[x] 60 minutes or more
Meetup event Copy - Optional
This will show up on your Meetup.com Event page for your talk. Example event page.
Event Title:
Multi-Modal AI: Opening the Doors to General Artificial Intelligence
Event Description:
Shafik's talk will be a deep-dive introduction to Multi-Modal AI. By Multi-Modal AI, we mean going beyond basic LLMs and moving toward systems that fuse multiple modalities of data into a shared representation. ChatGPT and transformer models have brought us to a certain point in the AI revolution, but independently they can only take us so far.
Speaker Bio:
Shafik Quoraishee is a Senior Android/ML Engineer at the New York Times, working on the NYT Games team on the integration of games such as Crosswords, Wordle, Connections, and Sudoku. He also works on ML research and algorithms, particularly in the visual AI space. He has several academic ML publications in prominent IEEE and SPIE engineering journals from his time as an in-field signals processing researcher, and he also teaches data science.
About You
Your Name:
Shafik Quoraishee
Twitter or Linkedin handle (optional):
https://www.linkedin.com/in/shafik-quoraishee/
The best way to reach out to you: