Closed stoney95 closed 5 years ago
We use WebSockets and currently don't make any REST calls. The only port we need to access is the standard HTTPS port, 443.
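A quick way to check that claim from inside the container is to open a plain TCP connection to port 443. The following is only a minimal sketch: `can_connect` is a hypothetical helper name, and a successful TCP connect does not prove that the WebSocket upgrade or the TLS handshake used by the SDK will succeed.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False
```

Calling `can_connect("westeurope.stt.speech.microsoft.com")` on the host and inside the container should give the same answer if the network paths are equivalent.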
I'm already forwarding that port, but it's still not working. Is there a possibility to run the SpeechRecognizer or start_continuous_recognition() in verbose or debug mode? Or is there any logging I could access? I get no results and no error message.
I also tried running the service outside a docker-container without any internet-connection and then it behaves differently from running inside docker, so I think that there must be some kind of connection when running inside docker.
EDIT:
As port-forwarding is not the problem, I've been digging a little deeper.
The Docker image I'm using is ubuntu:18.04. My Dockerfile looks like the following:
FROM ubuntu:18.04
ADD . /code
WORKDIR /code
RUN apt-get update
RUN apt-get install -y build-essential libasound2 wget libssl1.0.0
RUN apt-get install -y python3.6 python3-pip
RUN pip3 install -r requirements.txt
CMD ["python3", "service.py"]
So setup is done like it's described here: https://docs.microsoft.com/de-de/azure/cognitive-services/speech-service/quickstart-python
Via tcpdump I found that the sdk is sending its requests to: westeurope.stt.speech.microsoft.com
When I do nslookup westeurope.stt.speech.microsoft.com on my system I get the following response:
Server: ...
Address: ...
Non-authoritative answer:
westeurope.stt.speech.microsoft.com canonical name = crisfrontendweu.trafficmanager.net.
crisfrontendweu.trafficmanager.net canonical name = fe-prod4-weu.cris.ai.
Name: fe-prod4-weu.cris.ai
Address: 40.119.156.135
Doing this inside the docker container leads to:
Server: ...
Address: ...
Non-authoritative answer:
westeurope.stt.speech.microsoft.com canonical name = crisfrontendweu.trafficmanager.net.
crisfrontendweu.trafficmanager.net canonical name = fe-prod4-weu.cris.ai.
Name: fe-prod4-weu.cris.ai
Address: 40.119.156.135
** server can't find fe-prod4-weu.cris.ai: NXDOMAIN
wget https://westeurope.stt.speech.microsoft.com (inside docker and outside):
--2019-03-07 12:26:12-- https://westeurope.stt.speech.microsoft.com/
Resolving westeurope.stt.speech.microsoft.com (westeurope.stt.speech.microsoft.com)... 40.119.156.135
Connecting to westeurope.stt.speech.microsoft.com (westeurope.stt.speech.microsoft.com)|40.119.156.135|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-03-07 12:26:12 ERROR 404: Not Found.
ping -c 2 westeurope.stt.speech.microsoft.com (inside docker):
PING fe-prod4-weu.cris.ai (40.119.156.135) 56(84) bytes of data.
--- fe-prod4-weu.cris.ai ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1052ms
ping -c 2 westeurope.stt.speech.microsoft.com (outside docker):
PING fe-prod4-weu.cris.ai (40.119.156.135): 56 data bytes
Request timeout for icmp_seq 0
--- fe-prod4-weu.cris.ai ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
Actually I'm not sure if this problem is related to the sdk or my docker-setup.
Is it a DNS thing in the container? For example, Docker may add 8.8.8.8 as DNS, but you could be operating in a networking environment where this is blocked. https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/
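To compare name resolution on the host and inside the container from Python itself, rather than through nslookup, something like the sketch below could be used. `resolve_ipv4` is a hypothetical helper; it goes through the system resolver, so it reflects whatever DNS configuration the process actually sees.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(
            hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # Name could not be resolved with the resolver visible to this process.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If `resolve_ipv4("westeurope.stt.speech.microsoft.com")` returns an empty list only inside the container, DNS is the likely culprit; if both environments return addresses, the problem is probably elsewhere.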
I'm not very familiar with networks but I don't think so.
When you try to resolve e.g. google.com inside the container it's working:
root@linuxkit-025000000001:/# nslookup google.com
Server: 192.168.65.1
Address: 192.168.65.1#53
Non-authoritative answer:
Name: google.com
Address: 172.217.16.142
Name: google.com
Address: 2a00:1450:4001:808::200e
Also, resolving westeurope.stt.speech.microsoft.com does return an IP address, but then you run into some kind of error (see my other comment).
EDIT:
Another sign that it might not be a DNS thing is that I tried using recognize_once() instead of start_continuous_recognition(), and this works fine inside docker.
What's the difference between these two methods? Maybe there is a clue to what's not working.
There's no difference between the two in terms of connection. recognize_once() is, however, meant only for single-shot recognition of up to 15 s. So potentially you have more than 15 seconds of silence at the beginning.
Could you clarify whether recognize_once() works as expected on the host? I.e., is there any difference between host and Docker setup?
For cases that don't work as expected, can you provide more details, e.g., cancellation event details or return values, as well as the speech region and session IDs? You can pick up session IDs from the session_started event (hopefully), similar to here:
(If you still suspect networking difference within/outside the Docker container, maybe you could also go through these steps to validate your subscription in both environments: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/troubleshooting#validate-your-subscription-key)
recognize_once() runs identically on the host and inside docker.
My host system is macOS and docker is ubuntu:18.04.
The region I'm working on is westeurope. I will provide logs from the service using start_continuous_recognition().
Outside docker:
INFO - Start recognition
INFO - SESSION STARTED: SessionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549)
INFO - Read 626624 bytes
INFO - Write to azure
INFO - Read 0 bytes
INFO - Closed streams
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=B06FE79DF53140F0A31FF580BAB83773, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=F2A7414BFBC245A0884D9568447FF755, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=36690B7E7F224125B3B980AC61D5AA10, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - RECOGNIZED: SpeechRecognitionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=47BD7E6A71ED4948937AB76B4AA42611, text="SOME TEXT", reason=ResultReason.RecognizedSpeech))
INFO - CLOSING on SpeechRecognitionCanceledEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549, result=SpeechRecognitionResult(result_id=4B3B2F5BFCB3441E8A48AFF7F85761FA, text="", reason=ResultReason.Canceled))
INFO - STOPPED: SessionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549)
There are also RECOGNIZING events, but I removed them to make the logs clearer.
Inside docker:
INFO - Start recognition
INFO - SESSION STARTED: SessionEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e)
INFO - CLOSING on SpeechRecognitionCanceledEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e, result=SpeechRecognitionResult(result_id=d82b81a920ce467a86dce11dd292a12e, text="", reason=ResultReason.Canceled))
INFO - Read 628672 bytes
INFO - Write to azure
INFO - Read 0 bytes
INFO - Closed streams
So the SpeechRecognizer fires a CANCELED event even before I start writing anything to the input stream.
So I started debugging inside docker. There, speech_recognizer.start_continuous_recognition() results in:
INFO - SESSION STARTED: SessionEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e)
INFO - CLOSING on SpeechRecognitionCanceledEventArgs(session_id=0f6acfcb80e54ff2ab4704ef71f3be8e, result=SpeechRecognitionResult(result_id=d82b81a920ce467a86dce11dd292a12e, text="", reason=ResultReason.Canceled))
Running the curl command from your second comment works inside docker:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 13.93.122.1...
* TCP_NODELAY set
* Connected to westeurope.api.cognitive.microsoft.com (13.93.122.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [236 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [89 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3275 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [365 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [102 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=*.cognitive.microsoft.com
* start date: Nov 29 11:22:55 2017 GMT
* expire date: Nov 29 11:22:55 2019 GMT
* subjectAltName: host "westeurope.api.cognitive.microsoft.com" matched cert's "*.api.cognitive.microsoft.com"
* issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation; OU=Microsoft IT; CN=Microsoft IT TLS CA 2
* SSL certificate verify ok.
} [5 bytes data]
> POST /sts/v1.0/issueToken HTTP/1.1
> Host: westeurope.api.cognitive.microsoft.com
> User-Agent: curl/7.58.0
> Accept: */*
> Ocp-Apim-Subscription-Key: <MY-SUBSCRIPTION-KEY>
> Content-type: application/x-www-form-urlencoded
> Content-Length: 0
>
{ [5 bytes data]
< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Pragma: no-cache
< Content-Length: 779
< Content-Type: application/jwt; charset=us-ascii
< Expires: -1
< X-AspNet-Version: 4.0.30319
< X-Powered-By: ASP.NET
< apim-request-id: 7ba2edb3-ea29-48ff-94f8-a7117bededee
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
< x-content-type-options: nosniff
< Date: Fri, 08 Mar 2019 09:35:30 GMT
<
{ [779 bytes data]
100 779 100 779 0 0 3286 0 --:--:-- --:--:-- --:--:-- 3300
* Connection #0 to host westeurope.api.cognitive.microsoft.com left intact
<MY-TOKEN>
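The same token check can be done from Python with only the standard library, which is handy when curl is not installed in the image. This is a sketch under assumptions: the URL pattern and the Ocp-Apim-Subscription-Key header are copied from the curl trace above, while `build_token_request` and `fetch_token` are hypothetical helper names.

```python
import urllib.request


def build_token_request(region: str, subscription_key: str) -> urllib.request.Request:
    """Build the POST request for the issueToken endpoint seen in the curl trace."""
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # An empty body makes urllib send Content-Length: 0, as curl did.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def fetch_token(region: str, subscription_key: str, timeout: float = 10.0) -> str:
    """Send the request and return the token text; raises on HTTP or network errors."""
    request = build_token_request(region, subscription_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("ascii")
```

A call such as `fetch_token("westeurope", "<MY-SUBSCRIPTION-KEY>")` raises `urllib.error.HTTPError` for a rejected key and `urllib.error.URLError` when the host cannot be reached, which helps separate key problems from network problems.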
Is there a way to see what happens on start_continuous_recognition()? Something like a debug or logging mode?
Unfortunately we don't have client-side logging in the released Speech SDK yet.
Would it be possible to include additional details for the canceled event?
For result reason Canceled, result.cancellation_details has additional details, cf. https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.cancellationdetails?view=azure-python.
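For continuous recognition, those details arrive on the canceled event rather than on a return value. A minimal sketch of wiring that up follows; `describe_cancellation` and `attach_diagnostics` are hypothetical helpers, and the attribute names (`session_id`, `cancellation_details`, `reason`, `code`, `error_details`) are assumed to match the Python SDK's event and details objects, so they are read defensively with `getattr`.

```python
import logging

logger = logging.getLogger("speech")


def describe_cancellation(details) -> str:
    """Format the interesting fields of a cancellation-details object."""
    reason = getattr(details, "reason", None)
    code = getattr(details, "code", None)  # assumed name of the error-code field
    error_details = getattr(details, "error_details", "")
    return f"reason={reason} code={code} error_details={error_details!r}"


def attach_diagnostics(recognizer, log=logger.info) -> None:
    """Connect logging callbacks to the recognizer's session and canceled events."""
    recognizer.session_started.connect(
        lambda evt: log(f"SESSION STARTED: {evt.session_id}")
    )
    recognizer.session_stopped.connect(
        lambda evt: log(f"SESSION STOPPED: {evt.session_id}")
    )
    recognizer.canceled.connect(
        lambda evt: log(f"CANCELED: {describe_cancellation(evt.cancellation_details)}")
    )
```

Calling `attach_diagnostics(speech_recognizer)` before `start_continuous_recognition()` should then log the session ID and the cancellation reason in one place.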
Can you clarify whether recognize_once() works both on host and docker? And start_continuous_recognition() works only in the host case, right?
To clarify: Yes, recognize_once() works on host and docker. start_continuous_recognition() only works on host, not inside docker :)
The details from the cancellation-event (when running inside docker):
DEBUG - Error-Code: 5
DEBUG - Error-Details: Connection failed (no connection to the remote host). Internal error: 1. Error details: -2. Please check network connection, firewall setting, and the region name used to create speech factory.
DEBUG - Reason: CancellationReason.Error
@mahilleb-msft any ideas?
I tested something else: I turned off Wi-Fi and ran the service on the host (macOS), where it has been working so far, expecting to see the same behaviour as inside docker. But the program just hung:
INFO - SESSION STARTED: SessionEventArgs(session_id=BF5C52D0B9154CB993E3024432A9E549)
INFO - Read 626624 bytes
INFO - Write to azure
INFO - Read 0 bytes
INFO - Closed streams
(Nothing happened for some time, so I canceled the service.)
For comparison: I've provided the logs of the normal and docker behaviour in a previous comment.
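To keep a run like this from waiting forever, the wait for the end of the session can be bounded. The sketch below assumes the recognizer exposes `session_stopped` and `canceled` events plus `start_continuous_recognition()` and `stop_continuous_recognition()`; `run_until_done` is a hypothetical helper, and whether the stop call itself returns promptly without a network connection is not something this sketch can guarantee.

```python
import threading


def run_until_done(recognizer, timeout_s: float = 60.0) -> bool:
    """Run continuous recognition until it stops or is canceled.

    Returns True if a stop/cancel event arrived, False if the timeout expired.
    """
    done = threading.Event()
    # Either event means no further results will arrive for this session.
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    try:
        return done.wait(timeout=timeout_s)
    finally:
        recognizer.stop_continuous_recognition()
```

A `False` return then marks the "nothing happened" case explicitly instead of leaving the process hanging.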
I'm cleaning up some of our GitHub issues that have been open for a long time, and I have not seen any traffic on this issue in some time. If this is still an issue, please re-open this thread.
Thanks,
Brian.
We have a Docker container with python-sdk inside. Get it here: docker pull antsu/on-prem-client:latest
I am surprised Azure has this limitation of not enabling its users to run the Speech SDK / REST API in a container.
Why?
The only solution available is to run the actual Azure speech language model in a container, which has ridiculously high minimum hardware requirements.
I am having the same issue: when using recognize_once locally it works properly, but in docker it stops.
Local ->
Docker ->
My DockerFile goes like this
FROM python:3.10
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y ffmpeg
EXPOSE 8000
COPY ./app /code/app
CMD ["gunicorn", "app.main:servicio" ,"--workers", "4", "--worker-class" ,"uvicorn.workers.UvicornWorker" , "--bind", "0.0.0.0:8000"]
Did you manage to solve it, @stoney95?
@Matias222 can you get the cancellation details for why the recognition canceled?
Something like:
cancellation_details = result.cancellation_details
print(f"Speech Recognition was canceled: {cancellation_details.reason}")
if cancellation_details.reason == speechsdk.CancellationReason.Error:
    print(f"Error details: {cancellation_details.error_details}")
@rhurey Sure here are my logs
INFO CANCELED CancellationDetails(reason=CancellationReason.Error, error_details="Runtime error: Failed to initialize platform (azure-c-shared). Error: 2153 SessionId: 85886b76fe1e4cd1ba1c36cc819e75f9")
Are you running on Ubuntu 22.04 by chance?
My local machine is running Windows 11 and the Docker container is running "Debian GNU/Linux 12 (bookworm)".
Well I have modified the Dockerfile to follow this guide https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi and now it looks like this
FROM ubuntu:22.04
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get update && apt-get install -y python3.10 python3.10-dev python3-pip
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
RUN apt-get install -y ffmpeg
RUN apt-get install -y build-essential libssl-dev ca-certificates libasound2 wget
EXPOSE 8000
EXPOSE 443
COPY ./app /code/app
CMD ["gunicorn", "app.main:syntax" ,"--workers", "4", "--worker-class" ,"uvicorn.workers.UvicornWorker" , "--bind", "0.0.0.0:8000"]
and I am still getting the same error: "Failed to initialize platform (azure-c-shared). Error: 2153 SessionId: f2e5e721c1fc444a9aac34dd38878afb"
On Ubuntu 22.04 you'd need to get OpenSSL 1.1 installed to use the SDK. We're working on OpenSSL 3.0 support, but it hasn't been released yet.
On Ubuntu 22.04 you'd need to get OpenSSL 1.1 installed to use the SDK. We're working on OpenSSL 3.0 support, but it hasn't been released yet.
Sorry, noob question...OpenSSL 1.1 in the docker container or host? or both? And can it be OpenSSL 1.1.1d or 1.1.1u or specifically 1.1?
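One way to see whether an OpenSSL 1.1 library is visible to the process that will load the SDK is to try loading it by its soname. This is a rough sketch: `can_load_library` is a hypothetical helper, and the assumption is that the SDK looks for the library under the usual Linux soname `libssl.so.1.1`, which OpenSSL 1.1.0 and 1.1.1 releases share. It should be run wherever the SDK process itself runs.

```python
import ctypes


def can_load_library(name: str) -> bool:
    """Return True if the dynamic loader can find and load the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
```

Running `can_load_library("libssl.so.1.1")` in the container before and after installing the package shows whether the install actually made the library loadable.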
I had the same issue. Running this command in my Dockerfile fixed it: https://gist.github.com/joulgs/c8a85bb462f48ffc2044dd878ecaa786
Thanks guys! By reading this thread I was able to update my Dockerfile and run the azure speech SDK in my streamlit app. Here's the Dockerfile if anyone is interested in taking a look 🙂 (It might have some unnecessary things, though)
FROM ubuntu:22.04
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y \
python3.10 \
python3.10-dev \
python3-pip \
curl
WORKDIR /code
RUN pip3 install poetry==1.6.1
RUN apt-get install -y ffmpeg
RUN apt-get install -y build-essential libssl-dev ca-certificates libasound2 wget
RUN wget http://ports.ubuntu.com/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_arm64.deb
RUN dpkg -i libssl1.1_1.1.1f-1ubuntu2_arm64.deb
# RUN wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.0g-2ubuntu4_amd64.deb
# RUN sudo dpkg -i libssl1.1_1.1.0g-2ubuntu4_amd64.deb
COPY . /code/.
RUN poetry config virtualenvs.create false
RUN poetry install
EXPOSE 8501
EXPOSE 443
HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
ENTRYPOINT ["streamlit", "run", "frontend/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
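The Dockerfile above hard-codes the arm64 package and leaves the amd64 one commented out, so the right file depends on the build machine. A small sketch of choosing between the two follows; the URLs are the ones from the Dockerfile (whether they still resolve is not checked here), and `libssl11_deb_url` and both mapping tables are hypothetical names introduced for illustration.

```python
import platform

# Package URLs copied from the Dockerfile above, keyed by Debian architecture name.
LIBSSL11_DEBS = {
    "arm64": "http://ports.ubuntu.com/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_arm64.deb",
    "amd64": "http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.0g-2ubuntu4_amd64.deb",
}

# Common platform.machine() values mapped to Debian architecture names.
MACHINE_TO_DEB_ARCH = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}


def libssl11_deb_url(machine=None) -> str:
    """Return the libssl1.1 package URL matching the given (or current) machine type."""
    machine = (machine or platform.machine()).lower()
    arch = MACHINE_TO_DEB_ARCH.get(machine)
    if arch is None:
        raise ValueError(f"no libssl1.1 package listed for machine type {machine!r}")
    return LIBSSL11_DEBS[arch]
```

On an Apple Silicon host building a native image, `libssl11_deb_url()` would pick the arm64 entry, while a typical x86_64 CI runner would get the amd64 one.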
Hi,
I'm working with the Python azure-cognitiveservices-speech package. I'm building a service that uses the speech_recognizer to transcribe speech; it's basically the same as in the examples. Running the service works fine, but I want to use it inside a docker container. Usually I should get some RECOGNIZED events, and after the whole text is processed I receive a CLOSING event. When I'm running inside docker I don't receive any RECOGNIZED events.
This looks like the SDK can't reach the Azure backend; maybe there are some ports that need to be forwarded. But actually I have no clue which ports should be forwarded. So, does the SDK use REST calls or is it establishing some kind of socket connection? And if so, which ports is it using? Or is there some kind of best practice for using the SDK inside docker?