Open eletroswing opened 1 day ago
Thank you for your attention to our work! The current Janus model validates the feasibility of using different encoding methods for various tasks (pure text understanding, multimodal understanding, visual generation), and then processing them with a single transformer. However, this version of Janus has not been trained on audio data.
I'm looking at the README and only see mentions of image generation and understanding. Can it do anything with audio input?