Like `gptj_eval`, this function appears to rebuild the computational graph on every invocation. Is it possible to store the computational graph once it is built, and simply feed input tensors into it for each computation? In other words, does rebuilding the graph for every prediction add noticeable overhead, especially when I need to perform batch computations? Thanks in advance.