balikasg / tf-exporter

Exports a sentence-transformer model as a single tensorflow graph
Apache License 2.0

After converting to a single tf graph, the prediction time becomes longer. #4

Open · birdmu opened this issue 1 month ago

birdmu commented 1 month ago

Hello. After converting the model ( https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2 ) with tf_exporter, I deployed the converted model on TensorFlow Serving. However, there is an issue: with the original model, completing one predict request from input to output takes around 10 ms, but with the converted model the same request takes around 100 ms, even when calling TensorFlow Serving locally so that network latency can be ignored. Is this 100 ms latency normal, and what changes would bring the latency down to roughly that of the original model?

thanks a lot.
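
For anyone trying to reproduce the comparison, a minimal local benchmark might look like the sketch below. The TF Serving model name (`distiluse`), the REST port and the request payload are assumptions, not something the exporter guarantees; the actual input key depends on the exported signature, which you can inspect with `saved_model_cli show --all --dir <export_dir>`.

```python
# Rough local latency comparison: in-process sentence-transformers vs. the
# converted graph behind TensorFlow Serving's REST API. Names are placeholders.
import time

import requests
from sentence_transformers import SentenceTransformer

SENTENCE = "This is a test sentence."
N = 100

# Baseline: the original sentence-transformers model, called in-process.
model = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v2")
model.encode([SENTENCE])  # warm-up
start = time.perf_counter()
for _ in range(N):
    model.encode([SENTENCE])
print(f"original model:  {(time.perf_counter() - start) / N * 1000:.1f} ms/request")

# Converted graph behind TensorFlow Serving, via the local REST API.
URL = "http://localhost:8501/v1/models/distiluse:predict"  # hypothetical model name
payload = {"instances": [SENTENCE]}  # key/shape depends on the exported signature
requests.post(URL, json=payload).raise_for_status()  # warm-up + sanity check
start = time.perf_counter()
for _ in range(N):
    requests.post(URL, json=payload)
print(f"converted model: {(time.perf_counter() - start) / N * 1000:.1f} ms/request")
```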

balikasg commented 1 month ago

Hello,

I am not actively working on this project at the moment. I will try to reproduce this later, though I am not sure when.

In the meantime, I would suggest adding timing/debugging statements around the individual steps (tokenisation, forward pass, normalization, …) to see where the time is spent, and optimising from there. I hope this helps! I would be happy to review any fixes!
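
As a rough illustration of that kind of step-by-step timing, the sketch below uses the Hugging Face `transformers` TF classes in eager mode rather than the exporter's own graph, so it only mirrors the tokenisation / forward pass / normalization split; `from_pt=True` is there to convert the PyTorch checkpoint in case no TF weights are published for this model.

```python
# Sketch: time each stage of the pipeline separately in eager TensorFlow.
# This is not the exact set of ops tf_exporter emits; treat it as a rough guide
# to where the time goes, not as a measurement of the exported graph itself.
import time

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

NAME = "sentence-transformers/distiluse-base-multilingual-cased-v2"
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = TFAutoModel.from_pretrained(NAME, from_pt=True)

def timed(label, fn):
    start = time.perf_counter()
    out = fn()
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
    return out

sentences = ["This is a test sentence."]
inputs = timed("tokenisation", lambda: tokenizer(sentences, return_tensors="tf", padding=True))
outputs = timed("forward pass", lambda: model(**inputs))
# The real model mean-pools and then applies a dense projection to 512 dims;
# a plain mean-pool + L2 normalization is used here as a stand-in for that step.
timed("pooling + normalization",
      lambda: tf.math.l2_normalize(tf.reduce_mean(outputs.last_hidden_state, axis=1), axis=1))
```

If one stage clearly dominates, that is the place to focus the optimisation effort.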


birdmu commented 1 month ago

Thanks for replying. I am a beginner when it comes to transformers and the details of TensorFlow Serving, so for now I can hardly act on the advice about debugging the individual steps. Anyway, thanks a lot.