@AMarinic92 discussed the method with a PhD student contact from Waterloo, who shared additional resources, advice from their own experience with ML, and some literature to read:
"you'll probably have to introduce a new input/output layer that matches your data" - Zach
"for choosing a network you want something that fits your data problem and is appropriately deep vs how large your dataset is, so if your dataset is sparse maybe you don't want a very deep network. Alternatively, what you COULD do is just take half of a pre-trained network. Basically, there are options" - Zach
@RozenNoureev proposed following the TensorFlow image captioning tutorial: https://www.tensorflow.org/text/tutorials/image_captioning#try_it_on_your_own_images The group tends to agree on this approach.