Efficient Model - Githubissues

When training our model to "see" screenshots, it's essential that it sees them at full resolution rather than massively downscaled, as the information gets lost much too quickly. For example, text written on a screen quickly becomes unreadable after just one 2x reduction. Therefore it's critical that our model can efficiently process these large states.

The difficulty in this problem comes from the fact that a "full screenshot" is composed of ~9 million pixels (in the case of a 4k monitor), each having three features. In total, that'd be 27 million features per frame, which would let us fit only about 18 frames (500 / 27 ≈ 18.5) at a theoretical maximum of 500 million features per sample. As seeing only such a short window of past frames is not feasible, we need to improve the memory efficiency of our model by feeding it something other than pure frames.
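To make the budget arithmetic concrete, here is a minimal sketch. It assumes a 3840x2160 (4K UHD) screen with three channels per pixel; the exact resolution is an assumption, since the text only says "4k monitor" and rounds the pixel count to ~9 million. The function names are hypothetical helpers, not part of any existing codebase.

```python
# Assumed screen size: 4K UHD. The issue text rounds this up to ~9 million pixels.
WIDTH, HEIGHT, CHANNELS = 3840, 2160, 3

# Theoretical per-sample budget quoted in the text.
FEATURE_BUDGET = 500_000_000


def features_per_frame(width: int, height: int, channels: int = CHANNELS) -> int:
    """Number of raw input features in one uncompressed frame."""
    return width * height * channels


def max_frames(budget: int, per_frame: int) -> int:
    """Number of whole frames that fit into the feature budget."""
    return budget // per_frame


exact = features_per_frame(WIDTH, HEIGHT)  # 24,883,200 features
rounded = 27_000_000                       # the rounded figure used in the text

print(max_frames(FEATURE_BUDGET, exact))    # 20
print(max_frames(FEATURE_BUDGET, rounded))  # 18
```

Either way, the budget covers only around 18 to 20 full-resolution frames, which is why raw frames alone are not a workable input.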

HomebrewNLP / AGI-at-Home

Efficient Model #2