aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0

Serialization of software.amazon.awssdk.services.textract.model.Block fails due to DefaultSdkAutoConstructList #2782

Open ejschoen opened 2 years ago

ejschoen commented 2 years ago

Describe the bug

Block is documented as implementing Serializable, but serialization fails due to DefaultSdkAutoConstructList not being serializable. This issue was also raised in the closed PR 976.
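For readers unfamiliar with the failure mode, here is a minimal sketch of how it arises, assuming nothing about the SDK internals beyond what the stack trace shows. `NonSerializableEmptyList` and `ModelStandIn` are hypothetical stand-ins (not SDK classes) for DefaultSdkAutoConstructList and Block. A class can declare Serializable and still fail at write time if one of its fields holds an object whose class does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a list type that does not implement Serializable.
// AbstractList itself is not Serializable, so neither is this subclass.
class NonSerializableEmptyList<T> extends AbstractList<T> {
    @Override public T get(int index) { throw new IndexOutOfBoundsException(); }
    @Override public int size() { return 0; }
}

// Hypothetical stand-in for a model class that declares Serializable.
class ModelStandIn implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> relationships;

    ModelStandIn(List<String> relationships) { this.relationships = relationships; }
}

class SerializationDemo {
    // Returns null if the value serializes, otherwise the name of the
    // offending class as reported by NotSerializableException.
    static String trySerialize(Object value) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(value);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<ModelStandIn> good = new ArrayList<>();
        good.add(new ModelStandIn(new ArrayList<>()));
        System.out.println("ArrayList field: " + trySerialize(good));

        List<ModelStandIn> bad = new ArrayList<>();
        bad.add(new ModelStandIn(new NonSerializableEmptyList<>()));
        System.out.println("Non-serializable field: " + trySerialize(bad));
    }
}
```

The second case fails along the same path as the trace in this report: ArrayList.writeObject writes each element, the default field writer reaches the list field, and the write aborts on the non-serializable list class.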

Expected behavior

This bit of Clojure should work:

```Clojure
(defn save-blocks
  [^Path cache-path blocks]
  (Files/createDirectories (Paths/get @textract-cache (make-array String 0)) (make-array FileAttribute 0))
  (with-open [os (Files/newOutputStream cache-path (make-array OpenOption 0))
              oos (ObjectOutputStream. os)]
    (.writeObject oos blocks)))
```

Current behavior

```
                                                        save-blocks             textract.clj:   43
                                                   java.io.ObjectOutputStream.writeObject  ObjectOutputStream.java:  348
                                                  java.io.ObjectOutputStream.writeObject0  ObjectOutputStream.java: 1178
                                           java.io.ObjectOutputStream.writeOrdinaryObject  ObjectOutputStream.java: 1432
                                               java.io.ObjectOutputStream.writeSerialData  ObjectOutputStream.java: 1509
                                            java.io.ObjectOutputStream.defaultWriteFields  ObjectOutputStream.java: 1548
                                                  java.io.ObjectOutputStream.writeObject0  ObjectOutputStream.java: 1178
                                           java.io.ObjectOutputStream.writeOrdinaryObject  ObjectOutputStream.java: 1432
                                               java.io.ObjectOutputStream.writeSerialData  ObjectOutputStream.java: 1496
                                              java.io.ObjectStreamClass.invokeWriteObject   ObjectStreamClass.java: 1154
                                                                                      ...
                                                          java.util.ArrayList.writeObject           ArrayList.java:  768
                                                   java.io.ObjectOutputStream.writeObject  ObjectOutputStream.java:  348
                                                  java.io.ObjectOutputStream.writeObject0  ObjectOutputStream.java: 1178
                                           java.io.ObjectOutputStream.writeOrdinaryObject  ObjectOutputStream.java: 1432
                                               java.io.ObjectOutputStream.writeSerialData  ObjectOutputStream.java: 1509
                                            java.io.ObjectOutputStream.defaultWriteFields  ObjectOutputStream.java: 1548
                                                  java.io.ObjectOutputStream.writeObject0  ObjectOutputStream.java: 1184
java.io.NotSerializableException: software.amazon.awssdk.core.util.DefaultSdkAutoConstructList
   java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: software.amazon.awssdk.core.util.DefaultSdkAutoConstructList
```

Steps to Reproduce

See above. A complete example also requires code to create and execute an AnalyzeDocumentRequest, which in Clojure looks like:

```Clojure
(ns textract
  (:require [clojure.java.io :as io]
            [clojure.string :as str])
  (:use [taoensso.timbre])
  (:import [java.util ArrayList]
           [java.io ByteArrayInputStream ByteArrayOutputStream ObjectInputStream ObjectOutputStream]
           [java.nio.file Path Paths Files LinkOption OpenOption]
           [java.nio.file.attribute FileAttribute]
           [java.awt.image BufferedImage]
           [javax.imageio ImageIO ImageWriter]
           [org.apache.commons.codec.digest DigestUtils]
           [software.amazon.awssdk.services.textract
            TextractClient TextractClientBuilder]
           [software.amazon.awssdk.services.textract.model
            AnalyzeDocumentRequest AnalyzeDocumentRequest$Builder
            AnalyzeDocumentResponse
            Document Document$Builder
            FeatureType
            Block BlockType]
           [software.amazon.awssdk.regions Region]
           [software.amazon.awssdk.core SdkBytes]))

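;; Note: textract-cache (dereferenced below) and restore-blocks (called in
;; extract-image) are not defined in this excerpt.
(defn save-blocks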
  [^Path cache-path blocks]
  (Files/createDirectories (Paths/get @textract-cache (make-array String 0)) (make-array FileAttribute 0))
  (with-open [os (Files/newOutputStream cache-path (make-array OpenOption 0))
              oos (ObjectOutputStream. os)]
    (.writeObject oos blocks)))

(def known-feature-types
  {:forms FeatureType/FORMS
   :tables FeatureType/TABLES})

(defn extract-image
  "I don't do a whole lot."
  [^BufferedImage image feature-types]
  (info "Generating png format image")
  (let [os (java.io.ByteArrayOutputStream.)]
    (ImageIO/write image "png" os)
    (let [image-bytes (.toByteArray os)
          md5 (DigestUtils/md5Hex image-bytes)
          cache-path (Paths/get @textract-cache (into-array String [(str md5 ".obj")]))]
      (if (Files/exists cache-path (make-array LinkOption 0))
        (restore-blocks cache-path)
        (try
          (with-open [input-stream (ByteArrayInputStream. image-bytes)]
            (let [^Document$Builder document-builder (doto (Document/builder)
                                                       (.bytes (SdkBytes/fromInputStream input-stream)))
                  ^Document document (.build document-builder)
                  ^TextractClientBuilder client-builder (doto (TextractClient/builder)
                                                          (.region Region/US_EAST_1))
                  ^TextractClient client (.build client-builder)
                  feature-types (for [ft feature-types
                                      :let [known-type (get known-feature-types ft)]
                                      :when known-type]
                                  known-type)
                  ^AnalyzeDocumentRequest$Builder request-builder (doto (AnalyzeDocumentRequest/builder)
                                                                    (.featureTypes (into-array FeatureType feature-types))
                                                                    (.document document))
                  ^AnalyzeDocumentRequest request (.build request-builder)
                  _ (info "Sending AnalyzeDocument request to AWS")
                  ^AnalyzeDocumentResponse response (.analyzeDocument client request)
                  blocks (.blocks response)]
              (save-blocks cache-path blocks)
              blocks))
          (catch Exception e
            (errorf "Error in textract: %s" (.getMessage e)))
          (finally  (.close os)))))))
```

Possible Solution

No response

Context

It would be nice to be able to serialize results from AnalyzeDocument in a TTL cache to avoid rerunning recent prior analyses. But if it's not feasible to implement serialization for DefaultSdkAutoConstructList, then the objects that depend upon that class should not be documented as serializable.
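As a rough illustration of the caching use case described above, here is a minimal in-memory TTL cache sketch. It is not a fix for the serialization bug and not part of the SDK: `TtlCache`, its constructor arguments, and the loader callback are all hypothetical names chosen for this example. Because entries stay on the heap, this variant never invokes Java serialization, but it also does not survive a process restart the way the on-disk cache in the report would.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical in-memory cache whose entries expire after a fixed time-to-live.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value if it has not expired; otherwise calls the
    // loader (for example, a function that sends the analysis request),
    // stores the result, and returns it. Concurrent callers may both invoke
    // the loader for the same key; this sketch does not deduplicate loads.
    V get(K key, Supplier<V> loader) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value;
        }
        V fresh = loader.get();
        entries.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }
}

class TtlCacheDemo {
    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(60_000L, System::currentTimeMillis);
        // The key stands in for the image digest used in the report.
        String first = cache.get("md5-of-image", () -> "analysis result");
        String second = cache.get("md5-of-image", () -> "should not be called");
        System.out.println(first.equals(second));
    }
}
```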

AWS Java SDK version used

2.17.63

JDK version used

1.8

Operating System and version

Debian 9

dagnir commented 2 years ago

@debora-ito Removed needs-review. I agree that this is a bug.