HanwenXuTHU / BioTranslatorProject


Text generation #5

Closed: ys-zong closed this issue 1 year ago

ys-zong commented 1 year ago

Hi Hanwen,

Thanks for sharing the code! I was wondering what the process/model is for the text generation, as in Fig. 2(g). Maybe I missed something, but I couldn't find a description of it in the paper or the code. And to clarify further, does "generation" here mean autoregressive generation (producing text that didn't exist before) or retrieval from the existing training data? Many thanks!

HanwenXuTHU commented 1 year ago

Hi Yongshuo, thank you for reaching out! Yes, the generation part is a decoder trained with an autoregressive loss. You can build such a model directly, mapping from the shared text embedding space to the textual descriptions. In addition, to generate descriptions given a gene set, we adopted the Textomics model as our decoder; you can follow that codebase to reproduce the results in our paper. We follow its procedure of first retrieving a set of GO descriptions and then outputting novel descriptions. Since more powerful LLMs are emerging quickly, we are upgrading the decoder part, which is why we haven't released it yet. We expect to release a more convenient and powerful piece of software for text generation in the next few months and will definitely let you know once it is out.
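
In the meantime, here is a minimal sketch of the idea described above: an autoregressive transformer decoder conditioned on a shared text embedding (supplied as a prefix token) and trained with a next-token loss. This is an illustration assuming PyTorch, not the released BioTranslator or Textomics code; all names (`EmbeddingConditionedDecoder`, `emb_dim`, the tokenizer sizes, etc.) are placeholders, and the retrieval step over GO descriptions is only indicated in comments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingConditionedDecoder(nn.Module):
    """Autoregressive decoder conditioned on a shared text embedding (sketch)."""

    def __init__(self, vocab_size=30522, emb_dim=768, d_model=512,
                 n_heads=8, n_layers=4, max_len=128):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Project the shared embedding into a "prefix" token that every
        # generated position can attend to.
        self.prefix_proj = nn.Linear(emb_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, shared_emb, token_ids):
        # shared_emb: (B, emb_dim); token_ids: (B, T)
        T = token_ids.size(1)
        prefix = self.prefix_proj(shared_emb).unsqueeze(1)           # (B, 1, D)
        pos = self.pos_emb(torch.arange(T, device=token_ids.device))
        x = torch.cat([prefix, self.token_emb(token_ids) + pos], 1)  # (B, T+1, D)
        # Causal mask: each position sees only the prefix and earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(T + 1).to(x.device)
        return self.lm_head(self.backbone(x, mask=mask))             # (B, T+1, V)

# Training step with the autoregressive (next-token) loss.
model = EmbeddingConditionedDecoder()
emb = torch.randn(4, 768)                  # stand-in for embeddings of the
                                           # retrieved GO descriptions / gene set
tokens = torch.randint(0, 30522, (4, 16))  # stand-in tokenized target descriptions
logits = model(emb, tokens)
# Position 0 (the prefix) predicts token 0; position i predicts token i.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                       tokens.reshape(-1))
loss.backward()

@torch.no_grad()
def greedy_decode(model, shared_emb, bos_id=0, max_new_tokens=32):
    """Greedy autoregressive generation from a shared embedding."""
    ids = torch.full((shared_emb.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_new_tokens):
        next_id = model(shared_emb, ids)[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], 1)
    return ids
```

In the retrieve-then-generate setup the paper's reply describes, the conditioning vector would be built from the retrieved GO descriptions (e.g., a pooled embedding of the nearest neighbors in the shared space), so the decoder rewrites retrieved material into a novel description rather than copying it verbatim.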