Hi.
This is a great library, but it takes a long time to run. Once we have tested inference with the pipeline, can we disable the outputs of the attention layers for prediction? I think this could speed up inference by a huge amount.
I do not need to inspect the outputs for every single example passed into the pipeline, so could the extraction and saving of these outputs be optimized?