deepgram / deepgram-js-sdk

Official JavaScript SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Not able to hit the WebSocket with a newly created AWS instance (on-prem feature) #221

Closed yashjais closed 5 months ago

yashjais commented 6 months ago

What is the current behavior?

I am not able to get the transcription when I use the WebSocket with the new AWS instance that I created.

I'm getting the following error.

{
    _events: [Object: null prototype] {
      open: [Function],
      close: [Function],
      error: [Function],
      message: [Function]
    },
    _eventsCount: 4,
    _maxListeners: undefined,
    _binaryType: 'nodebuffer',
    _closeCode: 1006,
    _closeFrameReceived: false,
    _closeFrameSent: false,
    _closeMessage: '',
    _closeTimer: null,
    _extensions: {},
    _protocol: '',
    _readyState: 2,
    _receiver: null,
    _sender: null,
    _socket: null,
    _bufferedAmount: 0,
    _isServer: false,
    _redirects: 0,
    _url: 'ws://instance_url.com/v1/listen?model=nova-2',
    _req: ClientRequest {
      _events: [Object: null prototype],
      _eventsCount: 5,
      _maxListeners: undefined,
      outputData: [],
      outputSize: 0,
      writable: true,
      destroyed: true,
      _last: true,
      chunkedEncoding: false,
      shouldKeepAlive: true,
      _defaultKeepAlive: true,
      useChunkedEncodingByDefault: false,
      sendDate: false,
      _removedConnection: false,
      _removedContLen: false,
      _removedTE: false,
      _contentLength: 0,
      _hasBody: true,
      _trailer: '',
      finished: true,
      _headerSent: true,
      socket: [Socket],
      _header: 'GET /v1/listen?model=nova-2 HTTP/1.1\r\n' +
        'Sec-WebSocket-Version: 13\r\n' +
        'Sec-WebSocket-Key: dtrfrb1XD/4W1GHVjvjxQA==\r\n' +
        'Connection: Upgrade\r\n' +
        'Upgrade: websocket\r\n' +
        'Authorization: token token_number\r\n' +
        'User-Agent: @deepgram/sdk/2.4.0 node/14.18.1\r\n' +
        'Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits\r\n' +
        'Host: ec2-3-91-57-125.compute-1.amazonaws.com\r\n' +
        '\r\n',
      _keepAliveTimeout: 0,
      _onPendingData: [Function: noopPendingOutput],
      agent: undefined,
      socketPath: undefined,
      method: 'GET',
      maxHeaderSize: undefined,
      insecureHTTPParser: undefined,
      path: '/v1/listen?model=nova-2',
      _ended: false,
      res: [IncomingMessage],
      aborted: true,
      timeoutCb: null,
      upgradeOrConnect: false,
      parser: [HTTPParser],
      maxHeadersCount: null,
      reusedSocket: false,
      host: 'ec2-3-91-57-125.compute-1.amazonaws.com',
      protocol: 'http:',
      [Symbol(kCapture)]: false,
      [Symbol(kNeedDrain)]: false,
      [Symbol(corked)]: 0,
      [Symbol(kOutHeaders)]: [Object: null prototype]
    },
    [Symbol(kCapture)]: false
  },
  type: 'error',
  message: 'Unexpected server response: 400',
  error: Error: Unexpected server response: 400
      at ClientRequest.<anonymous> (/Users/yash/Projects/Bright-Github/backend/node_modules/@deepgram/sdk/dist/index.js:1:84270)
      at ClientRequest.emit (events.js:400:28)
      at ClientRequest.emit (domain.js:475:12)
      at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:647:27)
      at HTTPParser.parserOnHeadersComplete (_http_common.js:127:17)
      at Socket.socketOnData (_http_client.js:515:22)
      at Socket.emit (events.js:400:28)
      at Socket.emit (domain.js:475:12)
      at addChunk (internal/streams/readable.js:293:12)
      at readableAddChunk (internal/streams/readable.js:267:9)
      at Socket.Readable.push (internal/streams/readable.js:206:10)
      at TCP.onStreamRead [as _originalOnread] (internal/stream_base_commons.js:188:23)
      at TCP.callbackTrampoline (internal/async_hooks.js:130:17)
}
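
For triage, only a handful of fields in a dump like this matter: the URL that was dialed, the HTTP status in the message, the close code, and the ready state. Below is a minimal sketch of a helper that pulls them out. The function name and the assumption that the socket sits under a `target` property of the error event are hypothetical; the underscore-prefixed field names are copied from the dump above. Close code 1006 is the WebSocket "abnormal closure" code, meaning no close frame was received.

```javascript
// Hypothetical helper: extract the triage-relevant fields from an error event.
// Assumes the event looks like the dump above, with the socket under `target`.
const CLOSE_CODES = {
  1000: 'normal closure',
  1006: 'abnormal closure (no close frame received)',
};
const READY_STATES = ['CONNECTING', 'OPEN', 'CLOSING', 'CLOSED'];

function summarizeSocketError(event) {
  const socket = event?.target ?? {};
  // The message has the form "Unexpected server response: 400".
  const match = /response: (\d{3})/.exec(event?.message ?? '');
  return {
    url: socket._url ?? null,
    httpStatus: match ? Number(match[1]) : null,
    closeCode: socket._closeCode ?? null,
    closeReason: CLOSE_CODES[socket._closeCode] ?? 'unknown',
    readyState: READY_STATES[socket._readyState] ?? 'unknown',
  };
}
```

Logging `summarizeSocketError(error)` inside the `error` listener instead of the whole object would make it easier to see that the handshake itself was rejected with an HTTP 400 before any audio was sent.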

Steps to reproduce

Whenever I open the socket from the frontend, the error shows up. But when I send the request to Deepgram's hosted server instead, the transcription comes back fine (line number 25 in the handleDeepGramWebSocketConnection function).

Here's the backend code I'm using.

wss.on('connection', handleDeepGramWebSocketConnection);

// file that contains handleDeepGramWebSocketConnection function.
import WebSocket from 'ws';
import { Deepgram } from '@deepgram/sdk';

import logger from './util/logger';

const { DEEPGRAM_API_KEY } = process.env;

const handleDeepGramWebSocketConnection = (ws) => {
  try {
    if (!DEEPGRAM_API_KEY) {
      logger.info('Error: DEEPGRAM_API_KEY is missing in the environment variables.');

      const errorMessage = {
        type: 'error',
        message: 'DEEPGRAM_API_KEY is missing in the environment variables.',
      };

      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(errorMessage));
      }

      ws.close();

      return;
    }

    const apiUrl = 'instance.com'; // **WILL NOT GET THE TRANSCRIPTION WHEN USING THIS**
    // const apiUrl = 'wss://api.deepgram.com/v1/listen';
    // const requireSSL = false;
    // const deepgram = new Deepgram(DEEPGRAM_API_KEY, apiUrl, requireSSL);
    const deepgram = new Deepgram(DEEPGRAM_API_KEY); // **getting the transcription when using this.**

    const deepgramLive = deepgram.transcription.live({
      model: 'nova-2',
    });

    deepgramLive.addListener('open', () => console.log('dg onopen'));

    deepgramLive.addListener('error', (error) => {
      console.log('error in here', error);

      const errorMessage = {
        type: 'error',
        message: error?.message,
        error: error?.error,
      };

      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(errorMessage));
      }
    });

    // eslint-disable-next-line no-param-reassign
    ws.onmessage = (event) => deepgramLive.send(event.data);

    // eslint-disable-next-line no-param-reassign
    ws.onclose = () => deepgramLive.finish();

    deepgramLive.addListener('transcriptReceived', (data) => ws.send(data));
  } catch (err) {
    console.log('err', err);
  }
};

export default handleDeepGramWebSocketConnection;

Here's the relevant frontend code

const transcription = () => {
    try {
      setIsRecording(true);

      navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
        if (!MediaRecorder.isTypeSupported('audio/webm')) {
          return alert('Browser not supported');
        }

        const mediaRecorder = new MediaRecorder(stream, {
          mimeType: 'audio/webm',
        });

        // create a websocket connection
        const url = websocketUrl();
        console.log('url', url);
        const socket = new WebSocket(url);
        // TODO: Remove this
        // const socket = new WebSocket('ws://localhost:5200');
        socket.onopen = () => {
          mediaRecorder.addEventListener('dataavailable', async (event) => {
            if (event.data.size > 0 && socket.readyState === 1) {
              socket.send(event.data);
            }
          });
          mediaRecorder.start(1000);
        };

        socket.onmessage = (msg) => {
          const received = JSON.parse(msg.data) || {};
          console.log('on message', msg, received);
          if (received?.type === 'error' || (received?.type === 'Metadata' && received?.transaction_key === 'deprecated')) {
            console.log('error', received);
            createNotification(
              received?.message || 'Something went wrong',
            );
            setIsRecording(false);
            console.log('socket', socket);
            // if (socket.readyState === 1) socket.close();
            // socket.close();
            setTimeout(() => {
              if (socketRef.current.readyState === 1) {
                socketRef.current.close();
              }
            }, 500);
          } else if (received?.channel?.alternatives?.[0]) {
            // Guard the index access too, so a message without alternatives does not throw.
            const { transcript } = received.channel.alternatives[0];
            if (transcript) {
              setMessage((prevState) => ({
                ...prevState,
                ...(received.is_final && { text: `${prevState.text} ${transcript}` }),
                interimMessage: received.is_final ? '' : `${prevState.text} ${transcript}`,
                isFinal: !!received.is_final,
              }));
            }
          }
        };

        socket.onclose = () => {
          console.log({ event: 'onclose' });
        };

        socket.onerror = (error) => {
          console.log({ event: 'onerror', error });
        };

        socketRef.current = socket;
      });
    } catch (err) {
      console.log('in the err of transcription');
    }
  };

Expected behavior

I should get transcription results when I point the SDK at the newly created AWS instance.

Also, the issue seems to be specific to the WebSocket code, since I am able to get a transcription of pre-recorded audio from the same instance.

const getTranscription = async ({
  // recording,
  // url,
}) => {
  try {
    // TODO: handle this logic as well.
    if (!DEEPGRAM_API_KEY) {
      logger.info('Error: DEEPGRAM_API_KEY is missing in the environment variables.');

      const errorMessage = {
        type: 'error',
        message: 'DEEPGRAM_API_KEY is missing in the environment variables.',
      };

      return errorMessage;
    }

    const url = 'instance.com';
    const requireSSL = false;
    const deepgram = new Deepgram(DEEPGRAM_API_KEY, url, requireSSL);

    const filePath = path.join(__dirname, '../../../../bueller.wav');

    const transcription = await deepgram.transcription.preRecorded({
      // url: 'https://dpgr.am/spacewalk.wav', // working
      stream: fs.createReadStream(filePath), // working
      mimetype: 'audio/wav', // the file is a .wav, so declare it as WAV rather than 'audio/ogg'
    }, {
      model: 'nova-2',
    });

    // version 3.1 code
    // const deepgram = createClient(DEEPGRAM_API_KEY, { global: { url } });

    // const transcription = await deepgram.listen.prerecorded.transcribeFile(
    //   fs.readFileSync(filePath),
    //   {
    //     model: 'nova-2',
    //   },
    // );

    console.log('transcription', JSON.stringify(transcription));

    if (transcription?.results) return transcription?.results;

    return transcription;
  } catch (err) {
    console.log('err', err);
    return null;
  }
};

Please tell us about your environment

Other information

Here are the Docker logs from the instance when I send the request. The status in these logs is 400, so I must be missing something in the request.

2023-12-21T09:17:26.640219485Z  INFO request{id=1d26f939}: stem::middleware::tracing: new
2023-12-21T09:17:26.64027988Z  INFO request{id=1d26f939 path=listen}: stem::middleware::tracing: New request. request_uuid=1d26f939-fca9-4f3d-86c4-ea375088eb47 query="model=nova-2"
2023-12-21T09:17:26.640337988Z  INFO request{id=1d26f939 path=listen}: tower_http::trace::on_response: finished processing request latency=0 ms status=400
2023-12-21T09:17:26.640359665Z  INFO request{id=1d26f939 path=listen}: stem::middleware::tracing: close time.busy=110µs time.idle=31.4µs
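
When there are many such lines, it can help to pull out the request id and status programmatically and look only at the failures. The following is a rough sketch of a parser for the line format shown above; the function name and the returned field names are hypothetical, and it only understands the `key=value` pairs that actually appear in these four lines.

```javascript
// Hypothetical parser for a single log line in the format shown above.
function parseLogLine(line) {
  const [timestamp] = line.split(/\s+/, 1);
  // Find `name=value`, where value is either quoted or runs to whitespace or "}".
  const field = (name) => {
    const m = new RegExp(`\\b${name}=("[^"]*"|[^\\s}]+)`).exec(line);
    return m ? m[1].replace(/^"|"$/g, '') : null;
  };
  const status = field('status');
  return {
    timestamp,
    requestId: field('id'),
    path: field('path'),
    query: field('query'),
    status: status === null ? null : Number(status),
  };
}

// Keep only the lines that report an HTTP error status.
function failedRequests(lines) {
  return lines.map(parseLogLine).filter((entry) => entry.status !== null && entry.status >= 400);
}
```

Applied to the four lines above, `failedRequests` would keep only the third one, which ties request `1d26f939` on path `listen` to the 400.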

lukeocodes commented 6 months ago

If you want to make client-side requests, you need to use v3 of the SDK.

yashjais commented 6 months ago

Could you point me to a working example of the code of v3?

Also, I am able to get the transcription when I hit the Deepgram API. It's only when I point the URL at my own instance that it starts returning an error.

yashjais commented 6 months ago

Never mind, I found it.

But with the new version I can't get a transcription at all. When I use the code above, I get this error: TypeError: _sdk.Deepgram is not a constructor
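
That TypeError is what you would expect if v3 no longer exports a `Deepgram` class, which matches the commented-out "version 3.1 code" earlier in this issue, where the client is built with `createClient`. Here is a rough sketch of what a v3-style live connection might look like. It assumes v3 exposes `createClient`, `listen.live` and `LiveTranscriptionEvents`, and that the `{ global: { url } }` option from that commented-out code also applies to streaming; none of this is verified against a self-hosted instance. The function name `openLiveConnection` is hypothetical, and the SDK is passed in as an argument so the sketch can be exercised without a network.

```javascript
// Sketch only: `sdk` is whatever require('@deepgram/sdk') returns under v3.
// Assumption: v3 exports createClient and LiveTranscriptionEvents.
function openLiveConnection(sdk, apiKey, baseUrl, onTranscript) {
  const { createClient, LiveTranscriptionEvents } = sdk;
  // Assumption: a custom instance is configured via { global: { url } }.
  const options = baseUrl ? { global: { url: baseUrl } } : {};
  const deepgram = createClient(apiKey, options);

  const connection = deepgram.listen.live({ model: 'nova-2' });
  connection.on(LiveTranscriptionEvents.Transcript, onTranscript);
  connection.on(LiveTranscriptionEvents.Error, (err) => {
    console.error('deepgram error', err);
  });
  return connection;
}
```

In the backend handler this would be called as `openLiveConnection(require('@deepgram/sdk'), DEEPGRAM_API_KEY, apiUrl, (data) => ws.send(JSON.stringify(data)))`, with `apiUrl` left undefined to use the hosted API.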

lukeocodes commented 6 months ago

@yashjais can you tell me briefly what exactly you're trying to achieve? I might be able to whip up a small example that can help. I think there is a mix-up in your code between the old version of the SDK and the new one.

(I'm speaking to Dustin and Jason internally, too)

yashjais commented 6 months ago

@lukeocodes

I am trying to get a streaming transcription from my own AWS instance.

The architecture is: the frontend calls my backend over a WebSocket and sends binary audio data, and the backend uses the Deepgram JS SDK to get the transcription.

When I use Deepgram's hosted server, the transcription works fine:

const deepgram = new Deepgram(DEEPGRAM_API_KEY);

But when I point the SDK at my instance, it returns a 400.

const apiUrl = 'instance.com'; // **WILL NOT GET THE TRANSCRIPTION WHEN USING THIS**
const requireSSL = false;
const deepgram = new Deepgram(DEEPGRAM_API_KEY, apiUrl, requireSSL);

Now I can think of a couple of issues that can cause this.
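
One way to narrow this down is to reproduce the URL the SDK ends up dialing and hit it with a plain WebSocket client. The sketch below shows how a bare host plus a `requireSSL` flag could map onto the `ws://.../v1/listen?model=nova-2` URL seen in the error dump. It is a hypothetical helper for illustration, not the SDK's own code, and `buildListenUrl` is an invented name.

```javascript
// Hypothetical helper, not the SDK's implementation: derive the streaming URL
// from a host and a requireSSL flag, mirroring the URL in the error dump.
function buildListenUrl(host, requireSSL, query = {}) {
  const scheme = requireSSL ? 'wss' : 'ws';
  // Drop any scheme the caller included, plus trailing slashes.
  const bare = host.replace(/^\w+:\/\//, '').replace(/\/+$/, '');
  const url = new URL(`${scheme}://${bare}/v1/listen`);
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

Calling `buildListenUrl('instance.com', false, { model: 'nova-2' })` gives a URL that can be opened directly with the `ws` package, using the same `Authorization` header as in the dump, to check whether the 400 comes from the instance itself or from something the SDK adds to the request.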

lukeocodes commented 5 months ago

I believe this has now been resolved!