deepseek-ai / Janus

MIT License
378 stars 16 forks source link

this model can do audio-to-any too? #3

Open eletroswing opened 1 day ago

eletroswing commented 1 day ago

im looking to readme and only viewing messages about image generation and view. it can do something with the audio input?

charlesCXK commented 2 hours ago

Thank you for your attention to our work! The current Janus model validates the feasibility of using different encoding methods for various tasks (pure text understanding, multimodal understanding, visual generation), and then processing them with a single transformer. However, this version of Janus has not been trained on audio data.