Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License

Azure Document Intelligence big files issue #31025

Open PiratesKing13 opened 2 months ago

PiratesKing13 commented 2 months ago

I am using Azure Document Intelligence to run OCR on my PDF files. My code works fine with files under 100 pages and 200 MB in size, but when I go beyond that, I get this error:

Unexpected error: RestError: Error reading response as text: aborted { "name": "RestError", "code": "PARSE_ERROR", "message": "Error reading response as text: aborted" }

I have also checked the Document Intelligence limits for my subscription tier: it supports files up to 500 MB and 2000 pages, and my files are within those limits.
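A minimal sketch of that limit check, useful for ruling out an oversized input before calling the service. The 500 MB and 2000 page figures are the ones quoted above. `checkAgainstLimits` is a hypothetical helper, not part of any Azure SDK; treating 1 MB as 1024 × 1024 bytes is an assumption; and the page count has to be supplied by the caller, since this sketch does not parse the PDF.

```typescript
// Hypothetical helper, not part of any Azure SDK.
// Limits as quoted in this issue; 1 MB = 1024 * 1024 bytes is an assumption.
const MAX_BYTES = 500 * 1024 * 1024;
const MAX_PAGES = 2000;

/** Returns a list of limit violations; an empty list means the file is within limits. */
function checkAgainstLimits(sizeBytes: number, pageCount: number): string[] {
  const problems: string[] = [];
  if (sizeBytes > MAX_BYTES) {
    problems.push("file size " + sizeBytes + " bytes exceeds " + MAX_BYTES + " bytes");
  }
  if (pageCount > MAX_PAGES) {
    problems.push("page count " + pageCount + " exceeds " + MAX_PAGES);
  }
  return problems;
}
```

An empty result for a failing file suggests that the documented tier limits are not what triggers the error.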

I am using Node.js 20.16.0 on Windows 10, with @azure/ai-form-recognizer 5.0.0 and @azure/storage-blob 12.17.0.

Here is my code:

import {
  DocumentAnalysisClient,
  AzureKeyCredential,
} from '@azure/ai-form-recognizer';
import {
  BlobSASPermissions,
  BlobServiceClient,
  ContainerClient,
  RestError,
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
} from '@azure/storage-blob';
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import * as fs from 'fs/promises';

@Injectable()
export class DocumentInteligenceService {
  private documentAnalysisClient: DocumentAnalysisClient;
  private endpoint;
  private apiKey;
  private readonly connectionString: string;
  private readonly containerName: string;
  private readonly blobServiceClient: BlobServiceClient;
  private readonly storageAccountName: string;
  private readonly storageAccountKey: string;
  private readonly containerClient: ContainerClient;

  constructor(private configService: ConfigService) {
    this.endpoint = this.configService.get(
      'AzureFormRecognizer.Endpoint',
    );
    this.apiKey = this.configService.get('AzureFormRecognizer.ApiKey');

    this.documentAnalysisClient = new DocumentAnalysisClient(
      this.endpoint,
      new AzureKeyCredential(this.apiKey),
    );

    this.connectionString = this.configService.get<string>(
      'AzureStorageAccount.ConnectionString',
    );

    this.containerName = this.configService.get<string>(
      'AzureStorageAccount.ContainerName',
    );

    this.storageAccountName = this.configService.get<string>(
      'AzureStorageAccount.StorageAccountName',
    );
    this.storageAccountKey = this.configService.get<string>(
      'AzureStorageAccount.StorageAccountKey',
    );

    this.blobServiceClient = BlobServiceClient.fromConnectionString(
      this.connectionString,
    );

    this.containerClient = this.blobServiceClient.getContainerClient(
      this.containerName,
    );
  }

  async analyzeDocumentLayout(blobUrl: string): Promise<void> {
    try {
      const poller =
        await this.documentAnalysisClient.beginAnalyzeDocumentFromUrl(
          'prebuilt-read',
          blobUrl,
          {
            onProgress: (state) => console.log(`Status: ${state.status}`),
          },
        );
      const result = await poller.pollUntilDone();

      const resultText = JSON.stringify(result, null, 2);

      await fs.writeFile(`test.json`, resultText, 'utf-8');
      console.log('The results have been saved to a text file.');
    } catch (error) {
      if (error instanceof RestError) {
        console.error('Error:', error.message);
        // Add logic to retry or handle specific RestError scenarios.
      } else {
        console.error('Unexpected error:', error);
      }
    }
  }
}
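The retry placeholder in the catch block above could be filled in along the following lines. This is a minimal sketch under stated assumptions, not the SDK's own retry mechanism: `isAbortedParseError`, `backoffDelayMs`, and `withRetry` are hypothetical helper names, the error is matched by the `name` and `code` fields shown in the error output above rather than by class, and whether a retry actually helps with this particular failure is untested.

```typescript
// Hypothetical helpers; not part of @azure/ai-form-recognizer or @azure/storage-blob.

/** Shape of the error printed in this issue (name, code, message). */
interface ErrorLike {
  name?: string;
  code?: string;
  message?: string;
}

/** Duck-typed check on name/code, independent of which RestError class was thrown. */
function isAbortedParseError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const e = error as ErrorLike;
  return e.name === "RestError" && e.code === "PARSE_ERROR";
}

/** Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

/** Runs `operation`, retrying only on the PARSE_ERROR seen above, up to maxAttempts tries. */
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 1000,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const isLastAttempt = attempt === maxAttempts - 1;
      // Rethrow anything that is not the aborted-read error, or when out of attempts.
      if (!isAbortedParseError(error) || isLastAttempt) throw error;
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error("maxAttempts must be at least 1");
}
```

To use it, the begin-and-poll sequence would be wrapped in a function passed to `withRetry`, so each attempt starts a fresh analysis. The wrapped function must let the error propagate rather than catching it internally.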

github-actions[bot] commented 2 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.