Closed: harshsikka closed this 1 year ago
In-progress doc, feel free to leave comments: https://docs.google.com/document/d/1XHuLaVpfMUZCwTKMiCo-rtK15ylKTdWqz9UGZOiaIkQ/edit?usp=sharing
Added notes & some issues to the doc on the tokenization & embedding scheme
Broken out into issues (ManifoldRG/NEKO_Archive#23 & ManifoldRG/NEKO#42), closing.
https://github.com/ManifoldRG/NEKO/issues/2#issuecomment-1574605632 introduces the motivation for this document
In exploring the data, we have uncovered and raised numerous issues and opportunities, but our overall picture of the project is still fragmented. Spending some time designing our approach introduces overhead, but it gives us a venue to pool internal decisions and feedback, and to solicit feedback from the community & other AI researchers. It should also speed us up later in the project.
Output: a preliminary planning doc outlining the high-level architecture and modules, the training process and infrastructure, and emerging issues related to all of the above, for buy-in from the team.
This document will serve as a strawman to surface any design issues. We can then build a more technical design document in which to coordinate granular decisions.