atbasu / document-content-extractor

Python program that uses OpenAI APIs to parse user-specified content from text files

some fields are being excluded during extraction #14

Closed atbasu closed 1 year ago

atbasu commented 1 year ago

For example, the fields "pocPhone" and "pocEmail" aren't being extracted even when their 'required' setting is True.

atbasu commented 1 year ago

2023-07-03 13:53:23,489 - DEBUG - Parser config as read from file: {'borrowerName_1': {'type': 'string', 'required': True, 'description': '1st borrower/signer name, not vendor/agent.'}, 'borrowerEmail_1': {'type': 'string', 'required': True, 'description': 'email of 1st borrower'}, 'borrowerCellPhone_1': {'type': 'string', 'required': True, 'description': 'cell number of 1st borrower'}, 'borrowerName_2': {'type': 'string', 'required': True, 'description': '2nd borrower/signer name, not vendor/agent.'}, 'borrowerEmail_2': {'type': 'string', 'required': True, 'description': 'email of 2nd borrower'}, 'borrowerCellPhone_2': {'type': 'string', 'required': True, 'description': 'cell number of 2nd borrower'}, 'IsTheSignerAForeignNational?': {'type': 'boolean', 'required': True, 'description': 'are borrowers foreign nationals'}, 'Language': {'type': 'string', 'required': True, 'description': 'Language chosen by the signer'}, 'Timezone': {'type': 'string', 'required': True, 'description': 'The timezone of the signer'}, 'propertyAddress_Line': {'type': 'string', 'required': True, 'description': 'street address of loan property'}, 'propertyAddress_City': {'type': 'string', 'required': True, 'description': 'property address city'}, 'propertyAddress_State': {'type': 'string', 'required': True, 'description': 'property address state'}, 'propertyAddress_Zip': {'type': 'string', 'required': True, 'description': 'property address zip code'}, 'closingAddress_Line': {'type': 'string', 'required': True, 'description': 'street address for closing confirmation signature'}, 'closingAddress_City': {'type': 'string', 'required': True, 'description': 'closingAddress city'}, 'closingAddress_State': {'type': 'string', 'required': True, 'description': 'closingAddress state'}, 'closingAddress_Zip': {'type': 'string', 'required': True, 'description': 'closingAddress zip code'}, 'appointmentDateTime': {'type': 'string', 'required': True, 'description': 'date and time of appointment for closing/signing'}, 'FileNumber': {'type': 'string', 'required': True, 'description': 'associated file numbers'}, 'OrderOnBehalfOf': {'type': 'string', 'required': True, 'description': 'name of person on whose behalf this is signed'}, 'SigningType': {'type': 'string', 'required': True, 'description': 'signing method'}, 'closingType': {'type': 'string', 'required': True, 'description': 'loan product type'}, 'lender': {'type': 'string', 'required': True, 'description': 'name of lender/loan company'}, 'companyFee': {'type': 'string', 'required': True, 'description': 'dollar fee charged by the lender'}, 'AgentName': {'type': 'string', 'required': True, 'description': 'name of responsible agent/vendor'}, 'AgentFee': {'type': 'string', 'required': True, 'description': 'dollar fee charged by the agent, if applicable.'}, 'WitnessNumber': {'type': 'string', 'required': True, 'description': 'number of witnesses for signing'}, 'UploadFiles': {'type': 'boolean', 'required': True, 'description': 'any files uploaded'}, 'InternalNotes': {'type': 'string', 'required': False, 'description': 'internal notes or instructions directed at the vendor/agent and not intended for the signer/borrower, maybe present as comments'}, 'ExternalNotes': {'type': 'string', 'required': False, 'description': 'external notes or instructions directed at the borrower or signer, maybe present as comments.'}, 'InstructionType': {'type': 'string', 'required': False, 'description': 'the type of instructions added'}, 'isScanBackNeeded': {'type': 'string', 'required': True, 'description': 'whether any scan backs are needed'}, 'pocName': {'type': 'string', 'required': True, 'description': 'name of point of contact(poc)'}, 'pocPhone': {'type': 'string', 'required': True, 'description': 'phone number of poc'}, 'pocEmail': {'type': 'string', 'required': True, 'description': 'email of poc'}}
2023-07-03 13:53:23,491 - DEBUG - Chunk size based on 32 fields and 3 splits = 10

The problem is that the code calculates the chunk size incorrectly:

    num_fields = len(fields)
    splits = min(splits, num_fields)
    chunk_size = num_fields // splits

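It uses integer (floor) division (//) to calculate the chunk size, which discards the remainder instead of rounding up. With 32 fields and 3 splits, 32 // 3 = 10, so three chunks of 10 cover only 30 fields. The remaining 2, presumably "pocPhone" and "pocEmail" since they come last in the config, never land in a chunk.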
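
The effect of the integer division can be reproduced in a few lines. The sketch below is not the repository's code: `chunk_fields`, the placeholder field names, and the assumption that the fields are sliced into exactly `splits` consecutive chunks are all hypothetical. It compares floor division with ceiling division, which is one possible way to make sure the remainder is covered.

```python
def chunk_fields(fields, splits, chunk_size):
    """Hypothetical helper: slice `fields` into `splits` consecutive chunks."""
    return [fields[i * chunk_size:(i + 1) * chunk_size] for i in range(splits)]


# 32 placeholder field names; the last two mirror the fields named in the issue.
fields = [f"field_{n}" for n in range(30)] + ["pocPhone", "pocEmail"]
splits = min(3, len(fields))

floor_size = len(fields) // splits     # 32 // 3 == 10, remainder 2 is discarded
ceil_size = -(-len(fields) // splits)  # ceiling division: 11

floor_chunks = chunk_fields(fields, splits, floor_size)
ceil_chunks = chunk_fields(fields, splits, ceil_size)

# Flatten each set of chunks to see which fields were actually covered.
covered_floor = [name for chunk in floor_chunks for name in chunk]
covered_ceil = [name for chunk in ceil_chunks for name in chunk]

missing = [name for name in fields if name not in covered_floor]
print(missing)  # ['pocPhone', 'pocEmail']
```

With ceiling division the three chunks hold 11, 11, and 10 fields, so every field is covered. For other field counts the last chunk may come out empty, so a real implementation would likely want to skip empty chunks.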

atbasu commented 1 year ago

Fixed it by updating the code that calculates the chunk size.