Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

How can I use the python-sdk inside docker? #174

Closed stoney95 closed 5 years ago

stoney95 commented 5 years ago

Hi,

I'm working with the python azure-cognitiveservices-speech-package.

I'm building a service that uses the speech_recognizer to transcribe speech; it's basically the same as in the examples. Running the service works fine, but I want to use it inside a docker container. Usually I get some RECOGNIZED-events, and after the whole text is processed I receive a CLOSING-event. When I'm running inside docker I don't receive any RECOGNIZED-events.

This looks like the sdk can't reach the azure backend; maybe there are some ports that need to be forwarded. But I have no clue which ports those would be. So, does the SDK use REST-calls or does it establish some kind of socket-connection? If so, which ports does it use? Or is there some kind of best-practice for using the sdk inside docker?

wolfma61 commented 5 years ago

We use websockets and currently don't make any REST calls. The only port we need / access is the standard https port 443.
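
Since the SDK only needs an outbound connection on port 443, one quick check from inside a container is a plain TCP connect. The following is a minimal sketch using only the Python standard library; the helper name `tcp_port_open` is made up for illustration and is not part of the Speech SDK. Note that outbound connections normally need no port publishing in Docker, because `-p` and `EXPOSE` concern inbound traffic.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `tcp_port_open("westeurope.stt.speech.microsoft.com")` on the host and inside the container should give the same answer; a difference would point at the container's network setup rather than at the SDK. A successful TCP connect does not prove that the TLS handshake or the WebSocket upgrade succeeds.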

stoney95 commented 5 years ago

I'm forwarding that port already, but it's still not working. Is there a possibility to run the SpeechRecognizer or start_continuous_recognition() in verbose or debug mode? Or is there any logging I could access? Right now I get no results and no error-msg.

I also tried running the service outside a docker-container without any internet-connection and then it behaves differently from running inside docker, so I think that there must be some kind of connection when running inside docker.

EDIT: As port-forwarding is not the problem, I've been digging a little bit deeper. The Docker image I'm using is ubuntu:18.04. My Dockerfile looks like the following:

FROM ubuntu:18.04

ADD . /code
WORKDIR /code

RUN apt-get update
RUN apt-get install -y build-essential libasound2 wget libssl1.0.0
RUN apt-get install -y python3.6 python3-pip
RUN pip3 install -r requirements.txt
CMD ["python3", "service.py"]

So setup is done like it's described here: https://docs.microsoft.com/de-de/azure/cognitive-services/speech-service/quickstart-python

Via tcpdump I found that the sdk is sending its requests to westeurope.stt.speech.microsoft.com. When I run nslookup westeurope.stt.speech.microsoft.com on my system I get the following response:

Server: ...
Address:    ...

Non-authoritative answer:
westeurope.stt.speech.microsoft.com canonical name = crisfrontendweu.trafficmanager.net.
crisfrontendweu.trafficmanager.net  canonical name = fe-prod4-weu.cris.ai.
Name:   fe-prod4-weu.cris.ai
Address: 40.119.156.135

Doing this inside the docker container leads to:

Server: ...
Address:    ...

Non-authoritative answer:
westeurope.stt.speech.microsoft.com canonical name = crisfrontendweu.trafficmanager.net.
crisfrontendweu.trafficmanager.net  canonical name = fe-prod4-weu.cris.ai.
Name:   fe-prod4-weu.cris.ai
Address: 40.119.156.135
** server can't find fe-prod4-weu.cris.ai: NXDOMAIN

wget https://westeurope.stt.speech.microsoft.com (inside docker and outside):

--2019-03-07 12:26:12--  https://westeurope.stt.speech.microsoft.com/
Resolving westeurope.stt.speech.microsoft.com (westeurope.stt.speech.microsoft.com)... 40.119.156.135
Connecting to westeurope.stt.speech.microsoft.com (westeurope.stt.speech.microsoft.com)|40.119.156.135|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-03-07 12:26:12 ERROR 404: Not Found.

ping -c 2 westeurope.stt.speech.microsoft.com (inside docker):

PING fe-prod4-weu.cris.ai (40.119.156.135) 56(84) bytes of data.

--- fe-prod4-weu.cris.ai ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1052ms

ping -c 2 westeurope.stt.speech.microsoft.com (outside docker):

PING fe-prod4-weu.cris.ai (40.119.156.135): 56 data bytes
Request timeout for icmp_seq 0

--- fe-prod4-weu.cris.ai ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss

Actually I'm not sure if this problem is related to the sdk or my docker-setup.
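
To compare name resolution between host and container from Python itself, a small sketch like the following may help. It uses `socket.getaddrinfo`, which goes through the system resolver (the same path most applications take), whereas nslookup queries the DNS server directly, so the two can disagree. The function name `resolve_addresses` is an illustrative assumption.

```python
import socket
from typing import List


def resolve_addresses(host: str, port: int = 443) -> List[str]:
    """Return the sorted, de-duplicated IP addresses the system resolver gives for host."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed (for example NXDOMAIN or no reachable DNS server).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Running `resolve_addresses("westeurope.stt.speech.microsoft.com")` in both environments and comparing the lists shows whether the application-level lookup differs, independent of what nslookup prints.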

mahilleb-msft commented 5 years ago

Is it a DNS thing in the container? For example, Docker may add 8.8.8.8 as DNS, but you could be operating in a networking environment where this is blocked. https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/
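
One way to see which DNS servers a container actually uses is to read its `/etc/resolv.conf`. The sketch below parses the `nameserver` lines from that file's contents; `parse_nameservers` is a hypothetical helper name, and the file path is the conventional Linux location.

```python
from typing import List


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Extract the nameserver addresses from the contents of a resolv.conf file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        # Relevant lines look like "nameserver 192.168.65.1"; other lines are skipped.
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers
```

Inside the container this could be called as `parse_nameservers(open("/etc/resolv.conf").read())`.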

stoney95 commented 5 years ago

I'm not very familiar with networks but I don't think so.

When you try to resolve e.g. google.com inside the container it's working:

root@linuxkit-025000000001:/# nslookup google.com
Server:     192.168.65.1
Address:    192.168.65.1#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.16.142
Name:   google.com
Address: 2a00:1450:4001:808::200e

Also, resolving westeurope.stt.speech.microsoft.com does yield an IP address, but then you run into some kind of error (see my other comment).

EDIT: Another sign that it might not be a DNS thing is that I tried using recognize_once() instead of start_continuous_recognition(), and this works fine inside docker.

What's the difference between these two methods? Maybe that gives a clue about what's not working.

mahilleb-msft commented 5 years ago

There's no difference between the two in terms of connection. recognize_once() is however meant only for single-shot reco, up to 15 s. So .. potentially you have silence longer than 15 seconds at the beginning.

Could you clarify whether recognize_once() works as expected on the host? I.e., is there any difference between host and Docker setup?

For cases that don't work as expected, can you provide more details, e.g., cancellation event details or return values, as well as the speech region and session IDs? You can pick up session IDs from the session_started event (hopefully), similar to here:

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/88dcad0c96e765da7fe546bcbf0975974d39e1bf/samples/python/console/speech_sample.py#L205
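
A minimal sketch of collecting these details during continuous recognition is shown below. It assumes `recognizer` is an `azure.cognitiveservices.speech.SpeechRecognizer` that has already been created; the function name `run_continuous` and the shape of the returned dictionary are illustrative choices, not part of the SDK. The cancellation details are read from `result.cancellation_details`, as mentioned elsewhere in this thread.

```python
import threading


def run_continuous(recognizer, timeout_s: float = 60.0):
    """Run continuous recognition until the session stops or is canceled.

    Returns (texts, diagnostics) where diagnostics holds the session ID and,
    if a canceled event fired, its reason and error details.
    """
    done = threading.Event()
    texts = []
    diagnostics = {"session_id": None, "cancel_reason": None, "error_details": None}

    def on_session_started(evt):
        diagnostics["session_id"] = evt.session_id

    def on_recognized(evt):
        texts.append(evt.result.text)

    def on_canceled(evt):
        details = evt.result.cancellation_details
        diagnostics["cancel_reason"] = details.reason
        diagnostics["error_details"] = details.error_details
        done.set()

    def on_session_stopped(evt):
        done.set()

    # Wire the callbacks before starting so no early event is missed.
    recognizer.session_started.connect(on_session_started)
    recognizer.recognized.connect(on_recognized)
    recognizer.canceled.connect(on_canceled)
    recognizer.session_stopped.connect(on_session_stopped)

    recognizer.start_continuous_recognition()
    done.wait(timeout_s)  # Give up waiting after timeout_s seconds.
    recognizer.stop_continuous_recognition()
    return texts, diagnostics
```

A canceled event is not always a failure: the reason distinguishes an error from a normal end of the input stream, so the reason and error details are worth logging together with the session ID.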

mahilleb-msft commented 5 years ago

(If you still suspect networking difference within/outside the Docker container, maybe you could also go through these steps to validate your subscription in both environments: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/troubleshooting#validate-your-subscription-key)

stoney95 commented 5 years ago

recognize_once() runs identically on the host and inside docker. My host system is MacOS and docker is ubuntu:18.04.

The region I'm working on is westeurope. I will provide logs from the service using start_continuous_recognition().

Outside docker:

INFO - Start recognition
INFO - SESSION STARTED: SessionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549)
INFO - Read 626624 bytes
INFO - Write to azure
INFO - Read 0 bytes
INFO - Closed streams
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=B06FE79DF53140F0A31FF580BAB83773, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=F2A7414BFBC245A0884D9568447FF755, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=36690B7E7F224125B3B980AC61D5AA10, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=47BD7E6A71ED4948937AB76B4AA42611, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - CLOSING on SpeechRecognitionCanceledEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=4B3B2F5BFCB3441E8A48AFF7F85761FA, text="", reason=ResultReason.Canceled))
INFO - STOPPED: SessionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549)

There are also RECOGNIZING-Events but I removed them to make the logs clearer.

Inside docker:

INFO - Start recognition
INFO - SESSION STARTED: SessionEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e)
INFO - CLOSING on SpeechRecognitionCanceledEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e, result=SpeechRecognitionResult(result_id=d82b81a920ce467a86dce11dd292a12e, text="", reason=ResultReason.Canceled))
INFO - Read 628672 bytes
INFO - Write to azure
INFO - Read 0 bytes
INFO - Closed streams

So the SpeechRecognizer fires a CANCELED-Event even before I start to write something to the Input-Stream. So I started debugging inside docker. And speech_recognizer.start_continuous_recognition() results in:

INFO - SESSION STARTED: SessionEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e)
INFO - CLOSING on SpeechRecognitionCanceledEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e, result=SpeechRecognitionResult(result_id=d82b81a920ce467a86dce11dd292a12e, text="", reason=ResultReason.Canceled))

Running the curl-command from your second comment is working inside docker:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 13.93.122.1...
* TCP_NODELAY set
* Connected to westeurope.api.cognitive.microsoft.com (13.93.122.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [236 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [89 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3275 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [365 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [102 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.cognitive.microsoft.com
*  start date: Nov 29 11:22:55 2017 GMT
*  expire date: Nov 29 11:22:55 2019 GMT
*  subjectAltName: host "westeurope.api.cognitive.microsoft.com" matched cert's "*.api.cognitive.microsoft.com"
*  issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation; OU=Microsoft IT; CN=Microsoft IT TLS CA 2
*  SSL certificate verify ok.
} [5 bytes data]
> POST /sts/v1.0/issueToken HTTP/1.1
> Host: westeurope.api.cognitive.microsoft.com
> User-Agent: curl/7.58.0
> Accept: */*
> Ocp-Apim-Subscription-Key: <MY-SUBSCRIPTION-KEY>
> Content-type: application/x-www-form-urlencoded
> Content-Length: 0
> 
{ [5 bytes data]
< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Pragma: no-cache
< Content-Length: 779
< Content-Type: application/jwt; charset=us-ascii
< Expires: -1
< X-AspNet-Version: 4.0.30319
< X-Powered-By: ASP.NET
< apim-request-id: 7ba2edb3-ea29-48ff-94f8-a7117bededee
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
< x-content-type-options: nosniff
< Date: Fri, 08 Mar 2019 09:35:30 GMT
< 
{ [779 bytes data]
100   779  100   779    0     0   3286      0 --:--:-- --:--:-- --:--:--  3300
* Connection #0 to host westeurope.api.cognitive.microsoft.com left intact
<MY-TOKEN>

Is there a way to see what happens on start_continuous_recognition()? Something like Debug or logging-mode?

mahilleb-msft commented 5 years ago

Unfortunately we don't have client-side logging in the released Speech SDK yet.
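
Later SDK releases did add file logging. As far as I know it is enabled through the `PropertyId.Speech_LogFilename` property on the speech config, but that should be checked against the documentation of the installed version. The sketch below wraps this in a helper; the name `enable_sdk_file_log` and its `property_id` parameter are illustrative assumptions, the latter existing only so the helper can be tried without the SDK installed.

```python
def enable_sdk_file_log(speech_config, log_path: str, property_id=None) -> None:
    """Ask the Speech SDK to write its internal log to log_path.

    Should be called before a recognizer is created from speech_config.
    """
    if property_id is None:
        # Imported lazily so the helper can be exercised without the SDK installed.
        import azure.cognitiveservices.speech as speechsdk
        property_id = speechsdk.PropertyId.Speech_LogFilename
    speech_config.set_property(property_id, log_path)
```

Inside a container, the log path should point to a mounted volume or another location that survives the container, otherwise the file is lost when the container exits.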

Would it be possible to include additional details for the canceled event? For result reason Canceled, result.cancellation_details has additional details, cf. https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.cancellationdetails?view=azure-python.

Can you clarify whether recognize_once() works both on host and docker? And start_continuous_recognition() works only in the host case, right?

stoney95 commented 5 years ago

To clarify: Yes, recognize_once() works on host and docker. start_continuous_recognition() only works on host. Not inside docker :)

The details from the cancellation-event (when running inside docker):

DEBUG - Error-Code: 5
DEBUG - Error-Details: Connection failed (no connection to the remote host). Internal error: 1. Error details: -2. Please check network connection, firewall setting, and the region name used to create speech factory.
DEBUG - Reason: CancellationReason.Error

stoney95 commented 5 years ago

@mahilleb-msft any ideas?

I tested something else: I turned off WIFI and ran the service on the host (MacOS), which has been working so far, expecting to see the same behaviour as inside docker. But the program just hung:

INFO - SESSION STARTED: SessionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549)
INFO - Read 626624 bytes
INFO - Write to azure
INFO - Read 0 bytes
INFO - Closed streams

(Nothing happened for some time, so I canceled the service)

For comparison: I've provided the log of normal and docker behaviour in a previous comment

BrianMouncer commented 5 years ago

I'm cleaning up some of our GitHub issues that have been open for a long time, and I have not seen any traffic on this issue in some time. If this is still an issue, please re-open this thread.

Thanks,

Brian.

chrisbasoglu commented 5 years ago

We have a Docker container with python-sdk inside. Get it here: docker pull antsu/on-prem-client:latest

githubloader commented 1 year ago

I am surprised Azure have this limitation of not enabling their users to run the speech sdk / rest api in a container.

Why?

The only solution available is to run the actual azure speech language model in a container that requires a ridiculous amount of minimum hardware requirement to run.

Matias222 commented 1 year ago

I am having the same issue: when using recognize_once locally it works properly, but in docker it stops.

Local -> [attachment not preserved]

Docker -> [attachment not preserved]

My Dockerfile looks like this:

FROM python:3.10

WORKDIR /code

COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y ffmpeg

EXPOSE 8000

COPY ./app /code/app

CMD ["gunicorn", "app.main:servicio" ,"--workers", "4", "--worker-class" ,"uvicorn.workers.UvicornWorker" , "--bind", "0.0.0.0:8000"]

Did you manage to solve it @stoney95?

rhurey commented 1 year ago

@Matias222 can you get the cancellation details for why the recognition canceled?

Something like:

import azure.cognitiveservices.speech as speechsdk

# `result` is what recognize_once() returned
if result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print(f"Speech Recognition was canceled: {cancellation_details.reason}")

    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {cancellation_details.error_details}")

Matias222 commented 1 year ago

@rhurey Sure, here are my logs:

INFO CANCELED CancellationDetails(reason=CancellationReason.Error, error_details="Runtime error: Failed to initialize platform (azure-c-shared). Error: 2153 SessionId: 85886b76fe1e4cd1ba1c36cc819e75f9")

rhurey commented 1 year ago

Are you running on Ubuntu 22.04 by chance?

Matias222 commented 1 year ago

My local machine is running Windows 11 and the docker container is running "Debian GNU/Linux 12 (bookworm)".

Matias222 commented 1 year ago

Well I have modified the Dockerfile to follow this guide https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi and now it looks like this

FROM ubuntu:22.04

RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get update && apt-get install -y python3.10 python3.10-dev python3-pip

WORKDIR /code

COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
RUN apt-get install -y ffmpeg
RUN apt-get install -y build-essential libssl-dev ca-certificates libasound2 wget

EXPOSE 8000
EXPOSE 443

COPY ./app /code/app

CMD ["gunicorn", "app.main:syntax" ,"--workers", "4", "--worker-class" ,"uvicorn.workers.UvicornWorker" , "--bind", "0.0.0.0:8000"]

and I am still getting the same error: "Failed to initialize platform (azure-c-shared). Error: 2153 SessionId: f2e5e721c1fc444a9aac34dd38878afb"

rhurey commented 1 year ago

In Ubuntu 22.04 you'd need to get OpenSSL 1.1 installed to use the SDK. We're working on OpenSSL 3.0 support, but it hasn't been released yet.
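
Because the SDK runs inside the container, that is where the OpenSSL 1.1 runtime libraries would have to be present. A rough way to check is to ask the dynamic loader for them from Python. The sketch below assumes the usual Linux sonames `libssl.so.1.1` and `libcrypto.so.1.1`; the helper names are made up for illustration. Python's own `ssl.OPENSSL_VERSION` is not a substitute, since it reports the OpenSSL that Python was built against, not what the SDK's native library loads.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


def has_openssl_1_1() -> bool:
    """Check for the OpenSSL 1.1 runtime libraries by their assumed Linux sonames."""
    return all(
        can_load_shared_library(name)
        for name in ("libssl.so.1.1", "libcrypto.so.1.1")
    )
```

Running `has_openssl_1_1()` in the container before starting the service gives an early hint whether a "Failed to initialize platform" error could be related to missing libraries.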

githubloader commented 1 year ago

> In Ubuntu 22.04 you'd need to get OpenSSL 1.1 installed to use the SDK. We're working on OpenSSL 3.0 support, but it hasn't been released yet.

Sorry, noob question...OpenSSL 1.1 in the docker container or host? or both? And can it be OpenSSL 1.1.1d or 1.1.1u or specifically 1.1?

shipengtaov commented 1 year ago

> > In Ubuntu 22.04 you'd need to get OpenSSL 1.1 installed to use the SDK. We're working on OpenSSL 3.0 support, but it hasn't been released yet.
>
> Sorry, noob question...OpenSSL 1.1 in the docker container or host? or both? And can it be OpenSSL 1.1.1d or 1.1.1u or specifically 1.1?

I'm having the same issue. Running this command in my Dockerfile fixed it: https://gist.github.com/joulgs/c8a85bb462f48ffc2044dd878ecaa786

2010b9 commented 11 months ago

Thanks guys! By reading this thread I was able to update my Dockerfile and run the azure speech SDK in my streamlit app. Here's the Dockerfile if anyone is interested in taking a look 🙂 (It might have some unnecessary things, though)

FROM ubuntu:22.04

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y \
    python3.10 \
    python3.10-dev \
    python3-pip \
    curl

WORKDIR /code

RUN pip3 install poetry==1.6.1
RUN apt-get install -y ffmpeg
RUN apt-get install -y build-essential libssl-dev ca-certificates libasound2 wget
RUN wget http://ports.ubuntu.com/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_arm64.deb
RUN dpkg -i libssl1.1_1.1.1f-1ubuntu2_arm64.deb
# RUN wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.0g-2ubuntu4_amd64.deb
# RUN sudo dpkg -i libssl1.1_1.1.0g-2ubuntu4_amd64.deb

COPY . /code/.

RUN poetry config virtualenvs.create false
RUN poetry install

EXPOSE 8501
EXPOSE 443

HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health

ENTRYPOINT ["streamlit", "run", "frontend/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
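
The Dockerfile above downloads the arm64 build of libssl1.1 and keeps the amd64 build commented out, so it only works on one CPU architecture. A small sketch of picking the package by architecture follows. The two URLs are copied from that Dockerfile and may no longer be available; the names `LIBSSL_DEB_URLS`, `MACHINE_TO_DEB_ARCH`, and `libssl_deb_url` are illustrative assumptions.

```python
import platform

# URLs copied from the Dockerfile in this thread; they may have moved since.
LIBSSL_DEB_URLS = {
    "arm64": "http://ports.ubuntu.com/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_arm64.deb",
    "amd64": "http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.0g-2ubuntu4_amd64.deb",
}

# Map platform.machine() values to Debian architecture names.
MACHINE_TO_DEB_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def libssl_deb_url(machine: str = "") -> str:
    """Return the libssl1.1 package URL for the given (or current) machine type."""
    machine = (machine or platform.machine()).lower()
    arch = MACHINE_TO_DEB_ARCH.get(machine)
    if arch is None:
        raise ValueError(f"no libssl1.1 package listed for machine type {machine!r}")
    return LIBSSL_DEB_URLS[arch]
```

In a Dockerfile the same decision could be made in shell with `dpkg --print-architecture`; the Python version is shown only to keep the examples in one language.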