Azure-Samples / azure-ai-vision-sdk

SDK for Microsoft's Azure AI Vision
MIT License

Python SDK: OCR request for in memory image #52

Closed aronnoordhoek closed 9 months ago

aronnoordhoek commented 9 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [X] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Using the Python SDK, I am wondering how I would make an OCR request for an image that's already in memory as, let's say, a numpy ndarray.

Desired behavior

So instead of specifying a file_name at ???, I want to pass an image that I already have in memory, rather than writing it to the file system and referencing that path. Is this possible? I could not find clear documentation on what the image_source_buffer or frame_source parameters of VisionSource expect.

  service_options = sdk.VisionServiceOptions(AZURE_OCR_ENDPOINT, AZURE_OCR_KEY_1)
  vision_source = sdk.VisionSource(???)
  analysis_options = sdk.ImageAnalysisOptions()
  analysis_options.features = sdk.ImageAnalysisFeature.TEXT
  analysis_options.language = "en"

  image_analyzer = sdk.ImageAnalyzer(service_options, vision_source, analysis_options)

  return image_analyzer.analyze()

OS and Version?

Windows 11

Versions

4.0 Preview

dargilco commented 9 months ago

@aronnoordhoek please see the Python sample named image_analysis_sample_analyze_buffer

It looks like there is an issue in our public documentation on analyzing input from a memory buffer. This is the direct link to the relevant short section discussing image buffer input; however, the title is wrong. It says "Image file" when it should say "Image buffer". I will fix this.

Thank you for opening this issue. Let me know if you need further assistance.

Darren

dargilco commented 9 months ago

The public document has been fixed. I'll close this issue. Please reactivate if image analysis from buffer does not work for you and you need further assistance.

aronnoordhoek commented 9 months ago

Thank you for the quick response! With an np.ndarray as input, the image needs to be encoded first. I'll just paste my solution here for anyone coming across the same problem.

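import cv2
import numpy as np
import azure.ai.vision as sdk

# AZURE_OCR_ENDPOINT and AZURE_OCR_KEY_1 are assumed to be defined elsewhere.
def azure_api_request(image: np.ndarray) -> sdk.ImageAnalysisResult:
    service_options = sdk.VisionServiceOptions(AZURE_OCR_ENDPOINT, AZURE_OCR_KEY_1)

    # Encode the raw pixel array as JPEG before handing it to the SDK.
    encoding_success, encoded_image = cv2.imencode('.jpg', image)
    if not encoding_success:
        # Fail loudly instead of silently returning None.
        raise ValueError("Could not encode the image as JPEG")

    # Copy the encoded bytes into the SDK's in-memory image buffer.
    image_source_buffer = sdk.ImageSourceBuffer()
    image_source_buffer.image_writer.write(encoded_image.tobytes())
    vision_source = sdk.VisionSource(image_source_buffer=image_source_buffer)

    analysis_options = sdk.ImageAnalysisOptions()
    analysis_options.features = sdk.ImageAnalysisFeature.TEXT
    analysis_options.language = "en"

    image_analyzer = sdk.ImageAnalyzer(service_options, vision_source, analysis_options)

    return image_analyzer.analyze()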

dargilco commented 9 months ago

Thank you @aronnoordhoek for your additional comment. Correct, the service only accepts images in one of the common containers/file formats (JPG, BMP, ...). See Image Requirements. If you have a "raw" (headerless) image in RGB format, for example, you need to first convert it to one of the common formats before copying it into the SDK's ImageSourceBuffer.
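
For readers without OpenCV at hand, here is a minimal sketch of what "converting a raw image to a common format" can look like using only the Python standard library. It wraps headerless 8-bit RGB pixels in an uncompressed 24-bit BMP container. The function name raw_rgb_to_bmp and the assumed pixel layout (row-major, top row first, three bytes per pixel) are illustrative assumptions, not part of the SDK.

```python
import struct


def raw_rgb_to_bmp(pixels: bytes, width: int, height: int) -> bytes:
    """Wrap headerless 8-bit RGB pixels in a 24-bit BMP container.

    Hypothetical helper: assumes row-major data with the top row first
    and exactly 3 bytes (R, G, B) per pixel.
    """
    if len(pixels) != width * height * 3:
        raise ValueError("pixel buffer size does not match width * height * 3")

    row_size = width * 3
    padding = (-row_size) % 4  # each BMP row is padded to a multiple of 4 bytes

    body = bytearray()
    # BMP stores rows bottom-up and channels in BGR order.
    for y in range(height - 1, -1, -1):
        row = pixels[y * row_size:(y + 1) * row_size]
        for x in range(0, row_size, 3):
            r, g, b = row[x], row[x + 1], row[x + 2]
            body += bytes((b, g, r))
        body += b"\x00" * padding

    header_size = 14 + 40  # file header + BITMAPINFOHEADER
    file_header = struct.pack(
        "<2sIHHI", b"BM", header_size + len(body), 0, 0, header_size
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,         # size of this header
        width,
        height,     # positive height means bottom-up row order
        1,          # color planes
        24,         # bits per pixel
        0,          # no compression
        len(body),  # size of the pixel data
        2835,       # horizontal resolution (about 72 DPI)
        2835,       # vertical resolution
        0,
        0,
    )
    return file_header + info_header + bytes(body)
```

The returned bytes could then be passed to image_source_buffer.image_writer.write(...) in the same way as the JPEG bytes in the solution above. BMP is uncompressed, so for large images an encoder such as cv2.imencode is likely the more practical choice.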