Open lkcao opened 2 years ago
At what point should you choose to build a multimodal model instead of using separate models and seeing which performs best, or combining their results? For example, for classification with text and image data, when is it worth fitting a more computationally expensive joint model versus fitting two models and classifying where the text and image predictions agree?
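A minimal sketch of the two-model combination the question describes: accept the label where the unimodal classifiers agree, and fall back to averaging their probabilities where they disagree. The function name, inputs, and fallback rule are all hypothetical, not from the readings.

```python
import numpy as np

def agreement_fusion(text_probs, image_probs):
    """Combine two unimodal classifiers: keep the predicted label where
    they agree; otherwise fall back to averaging their probabilities."""
    text_pred = text_probs.argmax(axis=1)
    image_pred = image_probs.argmax(axis=1)
    agree = text_pred == image_pred
    fused = text_pred.copy()
    avg = (text_probs + image_probs) / 2
    fused[~agree] = avg[~agree].argmax(axis=1)  # tie-break by averaged score
    return fused, agree

# Toy probabilities for 3 examples, 2 classes
text_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
image_probs = np.array([[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]])
preds, agree = agreement_fusion(text_probs, image_probs)
```

This "late fusion" baseline costs only two unimodal models; a joint multimodal model mainly pays off when the interaction between modalities carries signal that neither model sees alone.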
In the homework I found that the multimodal (image and text) model outperformed the image-only model but not the text-only model. How can we identify the contribution of each modality to predictive performance?
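One common way to estimate each modality's contribution is an ablation: fit the same classifier on text features only, image features only, and both concatenated, and compare held-out scores. A minimal scikit-learn sketch on synthetic "embeddings" (all data here is fabricated for illustration; the text features are built to be more informative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)
# Synthetic embeddings: text carries a strong signal, image a weak one
text_emb = y[:, None] + rng.normal(0, 1.0, (n, 5))
image_emb = 0.2 * y[:, None] + rng.normal(0, 1.0, (n, 5))

ablations = {
    "text_only": text_emb,
    "image_only": image_emb,
    "multimodal": np.hstack([text_emb, image_emb]),
}
scores = {
    name: cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    for name, X in ablations.items()
}
```

Comparing `multimodal` against the best single modality tells you whether the extra modality adds signal or just cost, which speaks to the homework pattern described above.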
I am interested in hearing more about multimodal knowledge for images and the embeddings extracted from them using other models.
I wonder whether we can combine individual models with different types of dependent variables, for example a binary classification model with a multi-class classification model?
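One standard way to combine a binary and a multi-class objective in a single model is a shared encoder with two task-specific heads, training on the sum of the two losses. A minimal numpy forward-pass sketch (all shapes and weight names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))          # batch of 4 inputs, 16 features
W_shared = rng.normal(size=(16, 8))   # shared encoder weights
W_binary = rng.normal(size=(8, 1))    # head for the binary outcome
W_multi = rng.normal(size=(8, 5))     # head for the 5-class outcome

h = np.maximum(x @ W_shared, 0)       # shared representation (ReLU)
p_binary = 1 / (1 + np.exp(-(h @ W_binary)))              # sigmoid output
logits = h @ W_multi
p_multi = np.exp(logits - logits.max(axis=1, keepdims=True))
p_multi /= p_multi.sum(axis=1, keepdims=True)             # softmax output
```

In training, the binary head would use a binary cross-entropy loss and the multi-class head a categorical cross-entropy loss; the shared encoder receives gradients from both.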
I am also wondering about the value added by moving to a multimodal model, and how to identify how its various components contribute to performance. When running our model we found that the multimodal version was much slower - how do we weigh this cost/benefit?
I'm really interested in the value of going for a multimodal approach and incorporating it into my project.
I wonder if there is a faint possibility of a 'universal' neural network structure design that could adaptively learn to encode different types of data efficiently through a reinforcement learning process (with minimal human interference in structure design), and then build a multimodal representation from the representations learned for each modality?
I am interested in whether we can combine models trained on different sample sizes (for example, when the data sample for one model is much smaller than for the others)?
It's amazing what an ensemble of multimodal models can bring. However, I wonder whether there are any limitations to this approach. And another question: how can we align different data representations?
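On the alignment question: when you have paired embeddings from two modalities, one classical approach is orthogonal Procrustes, which finds the rotation that best maps one embedding space onto the other. A minimal numpy sketch on synthetic data (the "image" embeddings are constructed as a rotation of the "text" embeddings, so perfect alignment is recoverable; everything here is illustrative, not a method from the readings):

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal W minimizing ||X @ W - Y|| for paired rows."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(1)
Y = rng.normal(size=(100, 8))                    # "text" embeddings
R_true = np.linalg.qr(rng.normal(size=(8, 8)))[0]  # a random rotation
X = Y @ R_true.T                                 # "image" embeddings: rotated copy
W = procrustes_align(X, Y)
aligned = X @ W                                  # X mapped into Y's space
```

Real embeddings are not exact rotations of each other, so in practice alignment is approximate, and learned (often contrastive) projections are used instead; Procrustes is the simplest closed-form version of the idea.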
Most of today's models are single-task experts. Feeding models multiple types of information from multiple pipelines has great potential to further improve AI's single-task performance and multi-task capability. I want to know more about current efforts on multimodal, multi-task AI.
Post your question here about the orienting readings: “Integrative Learning - Multi-modal, Complex and Complete Models”, “Digital Doubles for Simulation, Prediction and Insight” & “Epilogue: You in the Loop”, Thinking with Deep Learning, Chapters 17, 18 & Epilogue.