Wordcab / wordcab-transcribe

💬 ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization.
https://wordcab.github.io/wordcab-transcribe/
MIT License
197 stars, 29 forks

Is there a plan to release a Python package? #120

Closed: jissagn closed this issue 1 year ago

jissagn commented 1 year ago

Hi,

First of all: tremendous work here! Speed is definitely amazing for both transcription and diarization, even on CPU with int8. It outperforms whisperx on my side, with roughly the same quality. I'm currently trying to build an open-source stateful webapp for keeping track of transcriptions (modify them, share them, tune them, etc.) that would make use of wordcab-transcribe (and optionally any other transcriber, depending on user choice) for the CPU/GPU-intensive ASR tasks.

So my question is: is there a plan to release a package that I could include in my project with pip install wordcab-transcribe, for example? Or is this going to be "just" an HTTP API?

Thanks 🙂

chainyo commented 1 year ago

Hi @jissagn, thanks for your message. I'm glad you find this project helpful!

There is no plan to make a Python package out of this, but we are working on some utility functions that could be easily added to any project to query the API.

By the way, IMHO, it's always great to separate the backend API from the frontend application, even if both are written in the same language, so you could develop your app in a separate Docker image and let them communicate together by attaching both containers to the same Docker network.

Let me know if I can help.

jissagn commented 1 year ago

Thanks @chainyo for your quick answer!

> it's always great to separate the backend API from the frontend application, even if both are written in the same language

I agree. But what I'm building is not just a frontend, it's also a backend that handles the state (i.e. it adds a layer on top of the ASR engine to read/write results, and more, in a DB).

> you could develop your app in a separate Docker image and let them communicate together

That is what is done currently 🙂 So what I have is the following:

① frontend (React/TypeScript) ⇔ ② stateful backend (FastAPI/Python) ⇔ ③ ASR engine (Celery for CPU/GPU-bound tasks)

The Celery task processor ③ is in charge of running whisper, whisperx, you-name-it. This is where I want to use wordcab-transcribe too. If I cannot install it as a package, then I guess I should run a fourth service, "④ wordcab-transcribe API"? It feels a bit too convoluted to me overall.

What do you think? Any input is greatly appreciated!

chainyo commented 1 year ago

With poetry, you can still make it installable in your project by cloning the repo, installing it, and using different services like TranscribeService in your app.

Otherwise, you can also build your ASR engine around the actual API, even if it makes things less easily upgradable in the future for you.
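As a rough illustration of the second option, a worker could talk to the running API with nothing more than the standard library. This is only a sketch under assumptions: the host name, port, endpoint path, and payload keys below are placeholders, not the project's documented routes, so check the API docs for the real ones.

```python
import json
import urllib.request

# Assumption: the API container is reachable under this name on a shared
# Docker network. Host and port are placeholders, not documented values.
DEFAULT_BASE_URL = "http://wordcab-transcribe:5001"


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it (easy to unit-test)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response body.

    The long default timeout is a guess to accommodate slow ASR jobs.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A Celery task would then call something like post_json(build_request(DEFAULT_BASE_URL, "/some/endpoint", {...})), where the path and payload are whatever the deployed API actually expects. Keeping the request builder separate from the network call means only one small function needs updating if the API changes.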

aleksandr-smechov commented 1 year ago

Thanks for the kind words @jissagn. If it's helpful, you can continue modifying this PR https://github.com/Wordcab/wordcab-transcribe/pull/94 - it's a bit closer to what you're talking about. We'd definitely welcome any help in making the API as accessible as possible ☺️

jissagn commented 1 year ago

> With poetry, you can still make it installable in your project by cloning the repo, installing it, and using different services like TranscribeService in your app.

That is an interesting approach, and I think I will give it a shot.

> Otherwise, you can also build your ASR engine around the actual API, even if it makes things less easily upgradable in the future for you.

I guess this is what https://github.com/Wordcab/wordcab-transcribe/pull/94 aims to do?