chenxwh / cog-whisper

MIT License

Significant latency regression in latest release #11

Open philkuz opened 1 year ago

philkuz commented 1 year ago

Hi @chenxwh and replicate,

The latest available version seems to have a significant latency regression relative to the version I have been using for some time. Running the same input through the large-v1 model (New) and the large model (Old), on what I believe are warm models, shows drastically different performance characteristics.

From the Replicate runs page, New is ~10x slower than Old on equivalent data:

| Version | ID | Model | Source | Status | Run time | Created |
|---------|----|-------|--------|--------|----------|---------|
| New | imiwp7wkk… | openai/whisper | API | Succeeded | 57.6 seconds | a minute ago |
| New | hhj3ijrde… | openai/whisper | API | Succeeded | 44.4 seconds | 2 minutes ago |
| Old | fdhdfyvmf… | openai/whisper | API | Succeeded | 3.0 seconds | 6 minutes ago |

My own metrics also show a latency shift in the ~10x range.

New:

image

Old:

image

New version sha: `23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2`
Old version sha: `b6e7ea7aef18444c29d974fee51ffc1e47e1699cfaf4e5cde0ba47a8db74f3b6`

Looking deeper, I decided to "bisect" versions with the following test:

  1. Warm up the model with one request
  2. When the warm up request returns, send another request and use that as a measure of performance
  3. Mark the version as bad if transcription time is >30s, otherwise mark it good

Bad: `23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2`
Good: `089ea17a12d0b9fc2f81d620cc6e686de7a156007830789bf186392728ac25e8`
Good: `30414ee7c4fffc37e260fcab7842b5be470b9b840f2b608f5baa9bbef9a259ed`
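For anyone who wants to reproduce this, the three-step test above can be sketched with the Replicate Python client. This is a rough sketch, not philkuz's actual harness: the audio URL is a placeholder, and the `audio` input name is an assumption based on the public openai/whisper model.

```python
import time

BAD_THRESHOLD_S = 30.0  # step 3: transcriptions slower than this mark a version "bad"


def classify(elapsed_s: float) -> str:
    """Label a warm-request latency per the bisect criterion above."""
    return "bad" if elapsed_s > BAD_THRESHOLD_S else "good"


def timed_run(version_sha: str, audio_url: str) -> float:
    """Send one prediction to a specific model version and return wall-clock latency.

    Requires `pip install replicate` and REPLICATE_API_TOKEN in the environment.
    """
    import replicate  # imported lazily so classify() works without the SDK installed

    start = time.monotonic()
    replicate.run(f"openai/whisper:{version_sha}", input={"audio": audio_url})
    return time.monotonic() - start


# Usage (makes real API calls; the audio URL is a placeholder):
#   sha = "23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2"
#   timed_run(sha, "https://example.com/sample.wav")            # 1. warm the model
#   elapsed = timed_run(sha, "https://example.com/sample.wav")  # 2. measure a warm request
#   print(classify(elapsed))                                    # 3. "bad" or "good"
```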

So it really looks like the latest change introduced a regression. I'm going to pin my version to an earlier release, but thought I would let the team know.

philkuz commented 1 year ago

I scanned the code and don't see anything obvious. Could this be caused by something that changed in a new cog release?

maccman commented 1 year ago

I've noticed a significant slow-down too.

zeke commented 1 year ago

Hey y'all. Thanks for reporting this issue and sharing your analysis. I've added this to our internal board to discuss when the team gets back from the holiday break next week.

maccman commented 1 year ago

@zeke did this get prioritized? :)

zeke commented 1 year ago

Not sure. Let me check with the team!

zeke commented 1 year ago

Sounds like @andreasjansson was planning to look into this. I'll defer to him.

zeke commented 1 year ago

Also @daanelson and @evilstreak :)

daanelson commented 1 year ago

hey! just a quick heads up for those interested that we're working on this. Think we have a fix to get large-v2 out without a regression, needs some testing to confirm. will keep you posted.

maccman commented 1 year ago

> hey! just a quick heads up for those interested that we're working on this. Think we have a fix to get large-v2 out without a regression, needs some testing to confirm. will keep you posted.

Sweet!!

maccman commented 1 year ago

@daanelson this out yet?

daanelson commented 1 year ago

@maccman not yet, unfortunately. Should have time to dig in some more next week.

maccman commented 1 year ago

@daanelson how about now? :)

daanelson commented 1 year ago

@maccman I've set up a whisper version which hosts only large-v2 here: https://replicate.com/daanelson/whisper-sandbox

Feel free to give it a spin and let me know how it goes; haven't seen any latency spikes in testing so far.