OpenPecha / Toolkit

🛠 Tools to create, edit and export texts and annotations
https://toolkit.openpecha.org
Apache License 2.0

GoogleVisionFormatter returns OpenPechaFS's pecha object with incorrect is_private value #239

Open ta4tsering opened 1 year ago

ta4tsering commented 1 year ago

Describe the bug When I use the recently updated GoogleVisionFormatter class to create an OPF from OCR output, the is_private value in the pecha object returned by OpenPechaFS is True even when the work_id's copyright status is Public domain. As a result, the published OPF is private when it should be public.

To Reproduce Steps to reproduce the behavior:

  1. Run the script below:

    
     from pathlib import Path

     from openpecha.core.ids import get_initial_pecha_id
     from openpecha.core.pecha import OpenPechaGitRepo
     from openpecha.formatters.ocr.google_vision import (
         GoogleVisionBDRCFileProvider,
         GoogleVisionFormatter,
     )

     def make_opf(ocr_import_info, ocr_path):
         work_id = "W3CN18530"
         data_provider = GoogleVisionBDRCFileProvider(
             bdrc_scan_id=work_id,
             ocr_import_info=ocr_import_info,
             ocr_disk_path=ocr_path,
         )
         pecha_id = get_initial_pecha_id()
         formatter = GoogleVisionFormatter(f"./pechas/{pecha_id}/{pecha_id}.opf")
         # create_opf returns the pecha object whose is_private value is wrong
         pecha = formatter.create_opf(data_provider, pecha_id, {}, ocr_import_info)
         # Re-type the object as a git-backed pecha so it can be published
         pecha.__class__ = OpenPechaGitRepo
         pecha.storage = None
         pecha.meta.id = pecha.pecha_id
         pecha.save_meta()
         pecha.publish(asset_path=ocr_path, asset_name="ocr_output")
         # Return the pecha so the caller below can inspect it
         return pecha

     if __name__ == "__main__":
         ocr_import_info = {
             "source": "bdrc",
             "software": "vision",
             "batch": "batch-G8E3G",
             "expected_default_language": "bo",
             "bdrc_scan_id": "W3CN18530",
             "ocr_info": {
                 "timestamp": "2023-01-20T17:42:00",
                 "imagesfolder": "images",
             },
         }
         ocr_path = Path("./ocrs/W3CN18530")
         pecha = make_opf(ocr_import_info, ocr_path)
  2. Use the OCR output linked here as the input: OCR output of W3CN18530

Expected behavior The is_private value of the pecha object returned by OpenPechaFS should be False.
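The expectation above can be written as a small check. This is only a sketch for illustration, not toolkit code: `expected_is_private` and `check_privacy` are hypothetical helpers, and the rule "public domain means public, anything else means private" is an assumption based on this report. The report does not show how the formatter actually derives is_private.

```python
def expected_is_private(copyright_status: str) -> bool:
    """Hypothetical rule: anything not in the public domain is treated as private."""
    # Assumption: the status string is compared case-insensitively.
    return copyright_status.strip().lower() != "public domain"


def check_privacy(copyright_status: str, actual_is_private: bool) -> None:
    """Raise AssertionError when the reported flag contradicts the expected one."""
    expected = expected_is_private(copyright_status)
    assert actual_is_private == expected, (
        f"is_private is {actual_is_private}, expected {expected} "
        f"for copyright status {copyright_status!r}"
    )
```

With the values from this report, `check_privacy("Public domain", True)` would raise, which is the mismatch described here.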

Screenshots The screenshot below shows what GoogleVisionFormatter returns: Screenshot 2023-01-24 at 9 49 43 AM

Desktop (please complete the following information): OpenPecha toolkit

Additional context None