GoogleCloudPlatform / php-docs-samples

A collection of samples that demonstrate how to call Google Cloud services from PHP.
http://cloud.google.com/php
Apache License 2.0

Could not find example for Speaker diarization? #1145

Open muneeb-ahmed-mcs opened 4 years ago

muneeb-ahmed-mcs commented 4 years ago

Hi folks, I am having a hard time getting data for multiple speakers, and there is no example for it. There is no example in the official Google docs either, as you can see here: https://cloud.google.com/speech-to-text/docs/multiple-voices.

    use Google\Cloud\Speech\V1\SpeechClient;
    use Google\Cloud\Speech\V1\RecognitionAudio;
    use Google\Cloud\Speech\V1\RecognitionConfig;
    use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;
    use Google\Cloud\Speech\V1\SpeakerDiarizationConfig;

    /** Uncomment and populate these variables in your code */
    // $audioFile = 'path to an audio file';

    // change these variables if necessary
    $encoding = AudioEncoding::LINEAR16;
    $sampleRateHertz = 32000;
    $languageCode = 'en-US';

    if (!extension_loaded('grpc')) {
        throw new \Exception('Install the grpc extension (pecl install grpc)');
    }

    // When true, time offsets for every word will be included in the response.
    $enableWordTimeOffsets = true;

    // get contents of a file into a string
    $content = file_get_contents($audioFile);

    // set string as audio content
    $audio = (new RecognitionAudio())
        ->setContent($content);

    // speaker diarization settings (changes that I made for different speakers)
    $speakerDiarizationConfig = (new SpeakerDiarizationConfig())
        ->setEnableSpeakerDiarization(true)
        ->setMinSpeakerCount(2)
        ->setMaxSpeakerCount(6);

    // set config
    $config = (new RecognitionConfig())
        ->setEncoding($encoding)
        ->setSampleRateHertz($sampleRateHertz)
        ->setLanguageCode($languageCode)
        ->setEnableWordTimeOffsets($enableWordTimeOffsets)
        ->setDiarizationConfig($speakerDiarizationConfig); // changes that I made for different speakers

    // create the speech client
    $client = new SpeechClient();

    // create the asynchronous recognize operation
    $operation = $client->longRunningRecognize($config, $audio);
    $operation->pollUntilComplete();

    if ($operation->operationSucceeded()) {
        $response = $operation->getResult();

        // each result is for a consecutive portion of the audio. iterate
        // through them to get the transcripts for the entire audio file.
        foreach ($response->getResults() as $result) {
            $alternatives = $result->getAlternatives();
            $mostLikely = $alternatives[0];

            foreach ($mostLikely->getWords() as $wordInfo) {
                $startTime = $wordInfo->getStartTime();
                $endTime = $wordInfo->getEndTime();
                printf('  Speaker %u Word: %s (start: %s, end: %s)' . PHP_EOL,
                    $wordInfo->getSpeakerTag(), // changes that I made for different speakers
                    $wordInfo->getWord(),
                    $startTime->serializeToJsonString(),
                    $endTime->serializeToJsonString());
            }
        }
    } else {
        print_r($operation->getError());
    }

    $client->close();

Output, printed with the format Speaker %u Word: %s (start: %s, end: %s):

    Speaker 0 this (start: "0s", end: "0.5s")
    Speaker 0 is (start: "0.5s", end: "1.5s")
    Speaker 0 an (start: "1.5s", end: "2.5s")
    Speaker 0 entire (start: "2s", end: "3.5s")
    Speaker 0 audio (start: "3.5s", end: "4.5s")
    Speaker 0 sentence (start: "4.5s", end: "5.5s")
    Speaker 0 that (start: "5.5s", end: "6.5s")
    Speaker 0 google (start: "6.5s", end: "7.5s")
    Speaker 0 give (start: "7.5s", end: "8.5s")
    Speaker 0 me (start: "8.5s", end: "9.5s")
    Speaker 0 in (start: "9.5s", end: "10.5s")
    Speaker 0 its (start: "10.5s", end: "11.5s")
    Speaker 0 response (start: "11.5s", end: "12.5s")

    Speaker 1 this (start: "0s", end: "0.5s")
    Speaker 1 is (start: "0.5s", end: "1.5s")
    Speaker 1 an (start: "1.5s", end: "2.5s")
    Speaker 1 entire (start: "2s", end: "3.5s")
    Speaker 1 audio (start: "3.5s", end: "4.5s")
    Speaker 1 sentence (start: "4.5s", end: "5.5s")
    Speaker 1 that (start: "5.5s", end: "6.5s")
    Speaker 1 google (start: "6.5s", end: "7.5s")
    Speaker 1 give (start: "7.5s", end: "8.5s")
    Speaker 1 me (start: "8.5s", end: "9.5s")
    Speaker 1 in (start: "9.5s", end: "10.5s")
    Speaker 1 its (start: "10.5s", end: "11.5s")
    Speaker 1 response (start: "11.5s", end: "12.5s")

    Speaker 3 this (start: "0s", end: "0.5s")
    Speaker 3 is (start: "0.5s", end: "1.5s")
    Speaker 3 an (start: "1.5s", end: "2.5s")
    Speaker 3 entire (start: "2s", end: "3.5s")
    Speaker 3 audio (start: "3.5s", end: "4.5s")
    Speaker 3 sentence (start: "4.5s", end: "5.5s")
    Speaker 3 that (start: "5.5s", end: "6.5s")
    Speaker 3 google (start: "6.5s", end: "7.5s")
    Speaker 3 give (start: "7.5s", end: "8.5s")
    Speaker 3 me (start: "8.5s", end: "9.5s")
    Speaker 3 in (start: "9.5s", end: "10.5s")
    Speaker 3 its (start: "10.5s", end: "11.5s")
    Speaker 3 response (start: "11.5s", end: "12.5s")

For the sake of simplicity, I cut off part of the response. First problem: as you can see, the speakerTag values are wrong. The audio I am sending in the request has 5 speakers, but the response gives me tags 0 and 1 and then jumps to 3. I don't know why Google is not responding with speaker tags 0, 1, 2, 3, and 4. Second problem: Google returns the entire audio text attributed to a single speaker, and then the same text again for the next speaker, as you can see in my output. I can't figure out whether this is a problem with my code or something else. I hope the problem is clear.
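To make the speaker tags easier to inspect, the per-word output could be collapsed into speaker turns. The following is only a minimal sketch, not an official sample: groupIntoTurns and the ['speakerTag' => ..., 'word' => ...] array shape are hypothetical names introduced here, not part of the Speech client library, and the sample words are made up for illustration.

```php
<?php
/**
 * Collapse a flat word list into consecutive speaker turns.
 *
 * Each input item uses a hypothetical shape (not a library type):
 *   ['speakerTag' => int, 'word' => string]
 * Returns a list of ['speakerTag' => int, 'text' => string].
 */
function groupIntoTurns(array $words): array
{
    $turns = [];
    foreach ($words as $w) {
        $last = count($turns) - 1;
        if ($last >= 0 && $turns[$last]['speakerTag'] === $w['speakerTag']) {
            // same speaker as the previous word: extend the current turn
            $turns[$last]['text'] .= ' ' . $w['word'];
        } else {
            // speaker changed (or first word): start a new turn
            $turns[] = ['speakerTag' => $w['speakerTag'], 'text' => $w['word']];
        }
    }
    return $turns;
}

// made-up sample data standing in for the words of a recognition result
$words = [
    ['speakerTag' => 1, 'word' => 'hello'],
    ['speakerTag' => 1, 'word' => 'there'],
    ['speakerTag' => 2, 'word' => 'hi'],
    ['speakerTag' => 1, 'word' => 'bye'],
];

foreach (groupIntoTurns($words) as $turn) {
    printf('Speaker %d: %s' . PHP_EOL, $turn['speakerTag'], $turn['text']);
}
```

With a real response, the input list could be built from $wordInfo->getSpeakerTag() and $wordInfo->getWord() inside the word loop. As far as I understand the Speech-to-Text documentation, the word list of the last result carries the speaker-tagged words for the whole recording, so it may be enough to pass only that last list in; this is an assumption that should be verified against the docs.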

bshaffer commented 4 years ago

Hi there! Yes, we'd love to see your code in PHP for separating different voices! Feel free to post your code snippets here, or to submit a pull request!