DoubangoTelecom / ultimateMRZ-SDK

Machine-readable zone/travel document (MRZ / MRTD) detector and recognizer using deep learning
https://www.doubango.org/webapps/mrz/

Reading MRZ not working #32

Closed imranbaloch closed 3 years ago

imranbaloch commented 3 years ago

I have uploaded an image at https://ibb.co/SxVNXhc. I got it from your website and tried it in my application. On your website it shows the result below:

[image: screenshot]

What can be the reason?

DoubangoTelecom commented 3 years ago

If I change the image to JPG or make the width a multiple of 16, it works. The server uses libpng, and it's probably a bug as the width is odd (501). Most importantly, it works with the SDK:

recognizer.exe ^
    --image C:/Users/dmi/Desktop/safe.png ^
    --assets ../../../assets

returns

*[ULTMRZ_SDK INFO]: result: {"duration":226,"frame_id":0,"zones":[{"lines":[{"confidence":92.0,"text":"P<*USCITIZ*N<<J*N*<<<<<<<<<<<<<<<<<<<<<<<<<<","warpedBox":[-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0]},{"confidence":84.0,"text":"P*09404433*US*40*077F1903212<17332717P<<<<**","warpedBox":[-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0]}],"warpedBox":[25,237,475,237,475,273,25,273]}]}
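
The JSON above can be read with any JSON library. Below is a minimal sketch that relies only on the field names visible in that output (`zones`, `lines`, `text`, `confidence`) and uses `System.Text.Json` rather than any SDK helper. `MrzResultParser` and `ExtractLines` are hypothetical names, and treating a missing or null `zones` as "nothing detected" is an assumption, not documented SDK behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the SDK.
public static class MrzResultParser
{
    // Returns (text, confidence) for every line of every zone in the result JSON.
    public static List<(string Text, double Confidence)> ExtractLines(string json)
    {
        var lines = new List<(string Text, double Confidence)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // Assumption: "zones" can be missing or null when nothing is detected.
            if (!doc.RootElement.TryGetProperty("zones", out JsonElement zones)
                || zones.ValueKind != JsonValueKind.Array)
            {
                return lines;
            }
            foreach (JsonElement zone in zones.EnumerateArray())
            {
                foreach (JsonElement line in zone.GetProperty("lines").EnumerateArray())
                {
                    lines.Add((line.GetProperty("text").GetString(),
                               line.GetProperty("confidence").GetDouble()));
                }
            }
        }
        return lines;
    }
}
```

An empty list from `ExtractLines` then signals "no MRZ found" without a null check at every call site.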

DoubangoTelecom commented 3 years ago

Your image is interlaced (PNG_INTERLACE_ADAM7). Changed the code at https://www.doubango.org/webapps/mrz/ to support this type. I confirm this is not a bug in the SDK.
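
For readers who decode PNG files themselves and want to know whether an upload is interlaced, the flag can be read straight from the file header. This is a minimal sketch based on the PNG file layout (8-byte signature, then the IHDR chunk whose last data byte is the interlace method, 1 meaning Adam7). `PngInspector` is a hypothetical helper, not part of the SDK, and nothing in this thread suggests the SDK needs such a check.

```csharp
using System;

// Hypothetical helper, not part of the SDK.
public static class PngInspector
{
    private static readonly byte[] Signature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // True when the buffer starts with a PNG signature followed by an IHDR
    // chunk whose interlace method byte is 1 (Adam7).
    public static bool IsAdam7Interlaced(byte[] data)
    {
        // 8 (signature) + 4 (chunk length) + 4 ("IHDR") + 13 (IHDR data) = 29 bytes.
        if (data == null || data.Length < 29) return false;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i]) return false;
        }

        // The chunk type sits at offsets 12..15 and must be "IHDR".
        if (data[12] != (byte)'I' || data[13] != (byte)'H' ||
            data[14] != (byte)'D' || data[15] != (byte)'R')
        {
            return false;
        }

        // IHDR data starts at offset 16: width(4), height(4), bit depth(1),
        // color type(1), compression(1), filter(1), interlace(1) at offset 28.
        return data[28] == 1;
    }
}
```

A caller could log the result next to failed recognitions to see whether interlacing correlates with the failures.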

imranbaloch commented 3 years ago

Sorry, I don't get it. Users can upload any type of image. What can I do as a developer to fix this issue?

imranbaloch commented 3 years ago

> Your image is interlaced (PNG_INTERLACE_ADAM7). Changed the code at https://www.doubango.org/webapps/mrz/ to support this type. I confirm this is not a bug in the SDK.

What did you change, so that I can make the same change?

DoubangoTelecom commented 3 years ago

You don't need to change anything; the issue is on the server, not the SDK. As already explained, if you try with the SDK you'll see it works.

imranbaloch commented 3 years ago

Server? Isn't our solution on-prem?

DoubangoTelecom commented 3 years ago

What solution? What we provide here on GitHub is an SDK and, as already said, there is no bug in it. You sent an image to our cloud, it did not work, and I fixed it. The issue was on our cloud; forget all the explanations I provided. As a developer, you only have access to the SDK, not the cloud. The SDK works on-prem, which is another reason not to worry about an image not working with our cloud.

If you check the recognizer sample, you will see that image decoding is done using stbi; on the cloud we use libpng. The issue is in libpng, as already explained.

Again, you need to worry about the SDK, not our cloud. Try the recognizer sample application and you'll see it works.

imranbaloch commented 3 years ago

I mean that we purchased the on-prem license, which means we are running the licensed SDK on our own servers. I am facing the same issue that your cloud application faced, so I need to apply the same fix. How can I do that?

imranbaloch commented 3 years ago

For more information, we are using the code below, which returns the zone as null:

  public static UltMrzSdkResult process(ULTMRZ_SDK_IMAGE_TYPE imageType, IntPtr imageData, uint imageWidthInSamples, uint imageHeightInSamples) {
    UltMrzSdkResult ret = new UltMrzSdkResult(ultimateMrzSdkPINVOKE.UltMrzSdkEngine_process__SWIG_2((int)imageType, imageData, imageWidthInSamples, imageHeightInSamples), true);
    return ret;
  }

var result = CheckResult("Process", UltMrzSdkEngine.process(
                            ULTMRZ_SDK_IMAGE_TYPE.ULTMRZ_SDK_IMAGE_TYPE_RGB24, // TODO(dmi): not correct. C# image decoder outputs BGR24 instead of RGB24
                            imageData.Scan0,
                            (uint)image.Width,
                            (uint)image.Height
                        ));

DoubangoTelecom commented 3 years ago

I have tried with the C# sample and it works fine. Your process function has a "TODO(dmi): " comment, and the sample no longer looks like this. The new version is here: https://github.com/DoubangoTelecom/ultimateMRZ-SDK/blob/2b4ff4f8d058ed317964df5fa4469d7003901acc/samples/csharp/recognizer/Program.cs#L228.

You'll need version 2.3.4 from June 17: https://github.com/DoubangoTelecom/ultimateMRZ-SDK/commit/36844461441ea93d0ebd3b9eb954b2c75df89b08#diff-ce8cd4254ce5be144e16ac427b120058

As already explained, the issue is in the image decoding. Our cloud uses libpng, which has the issue. The recognizer sample I tried yesterday uses stbi, which is why I saw no issue. In C#, neither libpng nor stbi is used to decode the image; C# functions are used. If you don't want to move to 2.3.4, at least check the C# sample to see how image decoding is now handled. The new code aligns the width on a DWORD; your image has an odd width, and this may be the problem in your code. In version 2.3.4, alignment is handled at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/blob/2b4ff4f8d058ed317964df5fa4469d7003901acc/samples/csharp/recognizer/Program.cs#L196
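
The DWORD alignment mentioned above comes down to simple arithmetic. `Bitmap.LockBits` pads every row to a multiple of 4 bytes, so a packed 24-bit row whose byte length is not a multiple of 4 gets trailing padding. The sketch below only illustrates that arithmetic; `StrideMath` and its members are hypothetical names and are not taken from the SDK sample.

```csharp
// Hypothetical helper, not part of the SDK.
public static class StrideMath
{
    // Bytes per row after padding the packed row up to the next multiple of 4.
    public static int AlignedStrideInBytes(int widthInPixels, int bytesPerPixel)
    {
        return (widthInPixels * bytesPerPixel + 3) & ~3;
    }

    // True when the packed row already ends on a 4-byte boundary (no padding).
    public static bool IsRowDwordAligned(int widthInPixels, int bytesPerPixel)
    {
        return ((widthInPixels * bytesPerPixel) & 3) == 0;
    }

    // Width rounded up to a multiple of 4 pixels, the same expression
    // ((width + 3) & -4) that appears in the code quoted in this thread.
    public static int WidthRoundedUpTo4(int widthInPixels)
    {
        return (widthInPixels + 3) & -4;
    }
}
```

For the 501-pixel-wide image in this thread, a 24-bit row is 1503 bytes and its padded stride is 1504 bytes. Dividing 1504 by 3 truncates to 501, so a stride expressed in samples cannot describe that one padding byte, which may be why the sample widens the bitmap first. At 504 pixels a row is 1512 bytes and needs no padding.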

imranbaloch commented 3 years ago

I have changed my code as per your C# diff, like this:

            if (Image.GetPixelFormatSize(image.PixelFormat) == 24 && ((image.Width * 3) & 3) != 0)
            {
                image = new Bitmap(image, new Size((image.Width + 3) & -4, image.Height));
            }
            var result = CheckResult("Process", UltMrzSdkEngine.process(
                            ULTMRZ_SDK_IMAGE_TYPE.ULTMRZ_SDK_IMAGE_TYPE_BGR24,
                            imageData.Scan0,
                            (uint)image.Width,
                            (uint)image.Height
                        ));
    public enum ULTMRZ_SDK_IMAGE_TYPE
    {
        ULTMRZ_SDK_IMAGE_TYPE_RGB24,
        ULTMRZ_SDK_IMAGE_TYPE_RGBA32,
        ULTMRZ_SDK_IMAGE_TYPE_BGRA32,
        ULTMRZ_SDK_IMAGE_TYPE_NV12,
        ULTMRZ_SDK_IMAGE_TYPE_NV21,
        ULTMRZ_SDK_IMAGE_TYPE_YUV420P,
        ULTMRZ_SDK_IMAGE_TYPE_YVU420P,
        ULTMRZ_SDK_IMAGE_TYPE_YUV422P,
        ULTMRZ_SDK_IMAGE_TYPE_YUV444P,
        ULTMRZ_SDK_IMAGE_TYPE_Y,
        ULTMRZ_SDK_IMAGE_TYPE_BGR24
    }

Is this enough, or do I need to update ultimateMRZ-SDK.dll and the other DLLs?

DoubangoTelecom commented 3 years ago

Your code is not correct at all:

DoubangoTelecom commented 3 years ago

1. If you look at the enum, you'll see that bgr24 is declared last and not grouped with similar formats like rgb24. This is done on purpose: new formats are added at the end of the enum to avoid reordering the old ones, which would change their values. This reduces the risk when developers try to use new wrappers without the matching binaries.

2. Also, the API functions have never changed and will never change. You always have the same signatures; only the returned JSON changes. Same as point 1: this reduces ABI incompatibilities.

Not all programmers take these precautions.
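
The effect of appending versus regrouping enum members can be seen with a toy example. The three enums below are hypothetical and deliberately short; they are not the SDK's `ULTMRZ_SDK_IMAGE_TYPE`. They only illustrate why a wrapper and a binary built from different versions still agree on the old values when new members go last.

```csharp
using System;

// Hypothetical "version 1" of a wrapper enum: values are 0, 1, 2.
public enum PixelFormatV1 { Rgb24, Rgba32, Nv12 }

// "Version 2" appends the new member at the end: old values are unchanged.
public enum PixelFormatV2 { Rgb24, Rgba32, Nv12, Bgr24 }

// Regrouped variant: Bgr24 is placed next to Rgb24, shifting Rgba32 and Nv12.
public enum PixelFormatRegrouped { Rgb24, Bgr24, Rgba32, Nv12 }

public static class EnumStabilityDemo
{
    // True when a member with the same name has the same numeric value in
    // both enum types, i.e. old binaries and new wrappers agree on it.
    public static bool SameValue<TOld, TNew>(string name)
        where TOld : struct, Enum
        where TNew : struct, Enum
    {
        int oldValue = Convert.ToInt32(Enum.Parse<TOld>(name));
        int newValue = Convert.ToInt32(Enum.Parse<TNew>(name));
        return oldValue == newValue;
    }
}
```

With these definitions, `SameValue<PixelFormatV1, PixelFormatV2>("Nv12")` holds, while `SameValue<PixelFormatV1, PixelFormatRegrouped>("Nv12")` does not, because regrouping moved `Nv12` from 2 to 3.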

imranbaloch commented 3 years ago

Thanks for your reply. So, in short, I need to update the DLLs and the code together. Do I need to update the training data files like mrz.classifier.params.json.doubango, mrz.classifier.strong.model.flat, mrz.classifier.strong.pca.json, and mrz.traineddata?

Also, it would be great if you could give suggestions on the code below:

public static class MrzReader
    {
        #region Configuration
        /**
         * Defines the debug level to output on the console. You should use "verbose" for diagnostic, "info" in development stage and "warn" on production.
         * JSON name: "debug_level"
         * Default: "info"
         * type: string
         * pattern: "verbose" | "info" | "warn" | "error" | "fatal"
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#debug-level
         */
        private const string CONFIG_DEBUG_LEVEL = "info";

        /**
         * Whether to write the transformed input image to the disk. This could be useful for debugging.
         * JSON name: "debug_write_input_image_enabled"
         * Default: false
         * type: bool
         * pattern: true | false
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#debug-write-input-image-enabled
         */
        private const bool CONFIG_DEBUG_WRITE_INPUT_IMAGE = false; // must be false unless you're debugging the code

        /**
        * Path to the folder where to write the transformed input image. Used only if "debug_write_input_image_enabled" is true.
        * JSON name: "debug_internal_data_path"
        * Default: ""
        * type: string
        * pattern: folder path
        * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#debug-internal-data-path
        */
        private const string CONFIG_DEBUG_DEBUG_INTERNAL_DATA_PATH = ".";

        /**
         * Defines the maximum number of threads to use.
         * You should not change this value unless you know what you’re doing. Set to -1 to let the SDK choose the right value.
         * The right value the SDK will choose will likely be equal to the number of virtual cores.
         * For example, on an octa-core device the maximum number of threads will be 8.
         * JSON name: "num_threads"
         * Default: -1
         * type: int
         * pattern: [-inf, +inf]
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#num-threads
         */
        private const int CONFIG_NUM_THREADS = -1;

        /**
         * Whether to enable GPGPU computing. This will enable or disable GPGPU computing on the computer vision and deep learning libraries.
         * On ARM devices this flag will be ignored when fixed-point (integer) math implementation exist for a well-defined function.
         * For example, this function will be disabled for the bilinear scaling as we have a fixed-point SIMD accelerated implementation.
         * Same for many deep learning parts as we’re using QINT8 quantized inference.
         * JSON name: "gpgpu_enabled"
         * Default: true
         * type: bool
         * pattern: true | false
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#gpgpu-enabled
         */
        private const bool CONFIG_GPGPU_ENABLED = true;

        /**
         * A device contains a CPU and a GPU. Both can be used for math operations.
         * This option allows using both units. On some devices the CPU is faster and on others it's slower.
         * When the application starts, the work (math operations to perform) is equally divided: 50% for the CPU and 50% for the GPU.
         * Our code contains a profiler to determine which unit is faster and how fast (percentage) it is. The profiler will change how
         * the work is divided based on the time each unit takes to complete. This is why this configuration entry is named "workload balancing".
         * JSON name: "gpgpu_workload_balancing_enabled"
         * Default: false for x86 and true for ARM
         * type: bool
         * pattern: true | false
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#gpgpu-workload-balancing-enabled
         */
        private const bool CONFIG_GPGPU_WORKLOAD_BALANCING_ENABLED = false;

        /**
         * Before calling the classifier to determine whether a zone contains a MRZ line we need to segment the text using multi-layer segmenter followed by clustering.
         * The multi-layer segmenter uses hysteresis for the voting process using a [min, max] double thresholding values. This configuration entry defines how low the
         * thresholding values should be. The lower the values, the higher the number of fragments and the higher the recall. A high number of fragments means more
         * data to process, which means more CPU usage and higher processing time.
         * JSON name: "segmenter_accuracy"
         * Default: high
         * type: string
         * pattern: "veryhigh" | "high" | "medium" | "low" | "verylow"
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#segmenter-accuracy
         */
        private const string CONFIG_SEGMENTER_ACCURACY = "high";

        /**
         * Defines the interpolation method to use when pixels are scaled, deskewed or deslanted. bicubic offers the best quality but is slow as there
         * is no SIMD or GPU acceleration yet. bilinear and nearest interpolations are multithreaded and SIMD accelerated. For most scenarios bilinear
         * interpolation is good enough to provide high accuracy/precision results while the code still runs very fast.
         * JSON name: "interpolation"
         * Default: bilinear
         * type: string
         * pattern: "nearest" | "bilinear" | "bicubic"
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#interpolation
         */
        private const string CONFIG_INTERPOLATION = "bilinear";

        /**
         * Defines the minimum number of MRZ lines needed to form a valid zone. For example, this value must be 2 for passports (TD3 format) and visas (MRVA and MRVB formats).
         * JSON name: "min_num_lines"
         * Default: 2
         * type: int
         * pattern: [1, inf]
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#min-num-lines
         */
        private const int CONFIG_MIN_NUM_LINES = 2;

        /**
         * Defines the Region Of Interest (ROI) for the detector. Any pixels outside region of interest will be ignored by the detector.
         * Defining a WxH region of interest instead of resizing the image to WxH is very important: you keep the same quality when you define a ROI, while you lose quality with the latter.
         * JSON name: "roi"
         * Default: [0.f, 0.f, 0.f, 0.f]
         * type: float[4]
         * pattern: [left, right, top, bottom]
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#roi
         */
        static readonly IList<float> _cONFIG_ROI = new[] { 0f, 0f, 0f, 0f };

        /**
         * Defines a threshold for the recognition score/confidence. Any recognition with a score below that threshold will be ignored/removed.
         * This value could be used to filter the false-positives and improve the precision. Low value will lead to high recall and low precision
         * while a high value means the opposite.
         * Default: 0
         * type: float
         * pattern: [0.f, 1.f]
         * More info: https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#min-score
         */
        private const double CONFIG_MIN_SCORE = 0.0; // 0%
        #endregion

        static Semaphore _semaphore = new Semaphore(4, 4);

        public static bool IsInitialized { get; private set; }

        static MrzReader()
        {
            try
            {
                var tokenDataBase64 = AppSettings.MrzReaderLicence;
                var assetsPath = Helper.ApplicationRoot + "MrzAssetsPath";
                var configJson = BuildJSON(assetsPath, tokenDataBase64);
                CheckResult("Init", UltMrzSdkEngine.init(configJson));
                AppDomain.CurrentDomain.ProcessExit += (object s, EventArgs e) =>
                {
                    CheckResult("DeInit", UltMrzSdkEngine.deInit());
                };
                IsInitialized = true;
            }
            catch (Exception ex)
            {
                Logger.Log(ex);
            }
        }

        public static DocumentData ReadMrz(string base64Image)
        {
            if (!IsInitialized)
            {
                return null;
            }
            var imgBytes = Convert.FromBase64String(base64Image);
            var image = new Bitmap(new MemoryStream(imgBytes));
            if (Image.GetPixelFormatSize(image.PixelFormat) == 24 && ((image.Width * 3) & 3) != 0)
            {
                image = new Bitmap(image, new Size((image.Width + 3) & -4, image.Height));
            }
            int bytesPerPixel = Image.GetPixelFormatSize(image.PixelFormat) >> 3;
            if (bytesPerPixel != 1 && bytesPerPixel != 3 && bytesPerPixel != 4)
            {
                throw new Exception("Invalid BPP:" + bytesPerPixel);
            }
            const int ExifOrientationTagId = 0x112;
            int orientation = 1;
            if (Array.IndexOf(image.PropertyIdList, ExifOrientationTagId) > -1)
            {
                int orientation_ = image.GetPropertyItem(ExifOrientationTagId).Value[0];
                if (orientation_ >= 1 && orientation_ <= 8)
                {
                    orientation = orientation_;
                }
            }
            var imageData = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadOnly, image.PixelFormat);
            try
            {
                _semaphore.WaitOne();
                var result = CheckResult("Process", UltMrzSdkEngine.process(
                         (bytesPerPixel == 1) ? ULTMRZ_SDK_IMAGE_TYPE.ULTMRZ_SDK_IMAGE_TYPE_Y : (bytesPerPixel == 4 ? ULTMRZ_SDK_IMAGE_TYPE.ULTMRZ_SDK_IMAGE_TYPE_BGRA32 : ULTMRZ_SDK_IMAGE_TYPE.ULTMRZ_SDK_IMAGE_TYPE_BGR24),
                        imageData.Scan0,
                        (uint)imageData.Width,
                        (uint)imageData.Height,
                        (uint)(imageData.Stride / bytesPerPixel),
                        orientation
                    ));
                var json = result.json();
                var resultJson = JsonConvert.DeserializeObject<DocumentData>(json);
                _semaphore.Release();
                return resultJson;
            }
            finally
            {
                image.UnlockBits(imageData);
            }
        }

        private static UltMrzSdkResult CheckResult(string functionName, UltMrzSdkResult result)
        {
            if (!result.isOK())
            {
                var errMessage = string.Format("{0}: Execution failed: {1}", new string[] { functionName, result.json() });
                Logger.Log(errMessage);
                throw new Exception(errMessage);
            }
            return result;
        }

        private static string BuildJSON(string assetsFolder = "", string tokenDataBase64 = "")
        {
            // https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html
            return new JavaScriptSerializer().Serialize(new
            {
                debug_level = CONFIG_DEBUG_LEVEL,
                debug_write_input_image_enabled = CONFIG_DEBUG_WRITE_INPUT_IMAGE,
                debug_internal_data_path = CONFIG_DEBUG_DEBUG_INTERNAL_DATA_PATH,

                num_threads = CONFIG_NUM_THREADS,
                gpgpu_enabled = CONFIG_GPGPU_ENABLED,
                gpgpu_workload_balancing_enabled = CONFIG_GPGPU_WORKLOAD_BALANCING_ENABLED,

                segmenter_accuracy = CONFIG_SEGMENTER_ACCURACY,
                interpolation = CONFIG_INTERPOLATION,
                min_num_lines = CONFIG_MIN_NUM_LINES,
                roi = _cONFIG_ROI,
                min_score = CONFIG_MIN_SCORE,

                // Value added using command line args
                assets_folder = assetsFolder,
                license_token_data = tokenDataBase64,
            });
        }
    }

DoubangoTelecom commented 3 years ago

If you don't want to update the binaries, then keep this code and change bgr24 and bgra32 to rgb24 and rgba32 if bgrxx doesn't exist in your local version.

That said, use a mutex instead of a semaphore. It will be faster, and a semaphore doesn't make sense in this case. Please note that some semaphores use mutexes under the hood to protect concurrent access.

Another remark: the goal is to lock/unlock the shortest possible portion of code. In your code you're using a semaphore (it should be a mutex) to protect both the process function and the deserialization of the result. There is no reason to include "JsonConvert.DeserializeObject(json)" in the protected block. This will make your code slower. There is no reason for the DeserializeObject function not to be thread-safe.

DoubangoTelecom commented 3 years ago

Another remark: if the process function raises an exception, you'll move into the finally block without releasing your semaphore. You'll be in big trouble, as the code execution will end up in a deadlock. Use AutoLocks to ease your life.
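
The two remarks above (keep the protected section short, and release the lock even when an exception is thrown) can be sketched as follows. This reads "AutoLocks" as C#'s `lock` statement, which is an interpretation; `GuardedCall` is a hypothetical helper, and the `process` delegate merely stands in for the native call plus `result.json()`.

```csharp
using System;

// Hypothetical helper, not part of the SDK.
public static class GuardedCall
{
    private static readonly object Gate = new object();

    // Runs `process` with at most one thread inside at a time, then runs
    // `parse` on the returned JSON outside the lock.
    public static T Run<T>(Func<string> process, Func<string, T> parse)
    {
        string json;
        lock (Gate)
        {
            // The lock statement releases Gate when this block exits,
            // including when process() throws.
            json = process();
        }
        // Deserialization happens outside the critical section, so other
        // threads can start processing while this one parses.
        return parse(json);
    }
}
```

In the `ReadMrz` method quoted earlier, the delegate passed as `process` would wrap the `UltMrzSdkEngine.process(...)` call and return `result.json()`, and `parse` would wrap `JsonConvert.DeserializeObject<DocumentData>`.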

imranbaloch commented 3 years ago

> If you don't want to update the binaries, then keep this code and change bgr24 and bgra32 to rgb24 and rgba32 if bgrxx doesn't exist in your local version.

I can update the DLL, no issue. The question is: do I need to update the training data files like mrz.classifier.params.json.doubango, mrz.classifier.strong.model.flat, mrz.classifier.strong.pca.json, and mrz.traineddata? Also, what do you mean by bgrxx on my local machine?

> That said, use a mutex instead of a semaphore. It will be faster, and a semaphore doesn't make sense in this case. Please note that some semaphores use mutexes under the hood to protect concurrent access.

As it is a web application, I want to allow 4-5 users to run the protected section at the same time, while a mutex allows only one thread to enter the critical section.

> Another remark: the goal is to lock/unlock the shortest possible portion of code. In your code you're using a semaphore (it should be a mutex) to protect both the process function and the deserialization of the result. There is no reason to include "JsonConvert.DeserializeObject(json)" in the protected block. This will make your code slower. There is no reason for the DeserializeObject function not to be thread-safe.

> Another remark: if the process function raises an exception, you'll move into the finally block without releasing your semaphore. You'll be in big trouble, as the code execution will end up in a deadlock. Use AutoLocks to ease your life.

Thanks for the very good suggestions.

DoubangoTelecom commented 3 years ago

Running the process function in parallel isn't recommended: https://www.doubango.org/SDKs/mrz/docs/Architecture_overview.html#thread-safety

imranbaloch commented 3 years ago

> Running the process function in parallel isn't recommended: https://www.doubango.org/SDKs/mrz/docs/Architecture_overview.html#thread-safety

OK, you mean there should never be more than one thread inside the critical section.

DoubangoTelecom commented 3 years ago

Yes. Unlike a GPU, a CPU will run slower when you overload it. It's just like having a CPU-intensive app running in the background: every other program becomes slow while the background app doesn't gain much. Same principle. You can check the CompV project for more info. It's up to 50 times faster than OpenCV: https://github.com/DoubangoTelecom/compv/tree/master/base/parallel

The files in the models folder are not tied to the binaries. It's up to you whether to update them. I recommend checking the commit logs: if the files have changed, it means the new ones are more accurate than what you have.