emdeh / pdf-document-processor

Post-processing task - Append date range to filename #55

Open emdeh opened 4 days ago

emdeh commented 4 days ago

Description

Create a post-processing task that adds the statement start date to the filenames of the split PDF files. The task should prefix each filename in the account-specific subfolders with the date in YYYYMMDD format, so that the files sort chronologically by name.
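A minimal sketch of the rename step, assuming the start date has already been extracted. The function name `add_date_prefix` and the underscore separator are illustrative choices, not something this issue specifies.

```python
from datetime import date
from pathlib import Path


def add_date_prefix(pdf_path: Path, start_date: date) -> Path:
    """Rename pdf_path so its name starts with the date as YYYYMMDD."""
    prefix = start_date.strftime("%Y%m%d")
    # Skip files that already carry this prefix so reruns are harmless.
    if pdf_path.name.startswith(prefix + "_"):
        return pdf_path
    # Assumption: an underscore separates the date from the original name.
    target = pdf_path.with_name(f"{prefix}_{pdf_path.name}")
    pdf_path.rename(target)
    return target
```

For example, a file named `statement_1.pdf` with a start date of 1 March 2024 would become `20240301_statement_1.pdf` in the same subfolder. Because the prefix is zero-padded and year-first, a plain alphabetical sort matches chronological order.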

To-Do List Overview:

Implement Statement Start Date Extraction:

Add Task to Task Registry:

Format Dates Consistently:

Rename Files to Include Date Prefix:
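The extraction and formatting items above could be sketched roughly as follows. The date patterns are assumptions (day-first numeric dates, long month names, and ISO dates); the formats that actually appear on the statements are not described in this issue, so the list would need adjusting. `extract_start_date` and `format_date_prefix` are hypothetical names.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical formats; extend to match the statements actually processed.
# Assumption: numeric dates are day-first (dd/mm/yyyy).
_DATE_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%d/%m/%Y"),
    (re.compile(r"\b\d{1,2} [A-Za-z]+ \d{4}\b"), "%d %B %Y"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
]


def extract_start_date(text: str) -> Optional[date]:
    """Return the valid date that appears first in text, or None."""
    best = None  # (position in text, parsed date)
    for pattern, fmt in _DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(), fmt).date()
            except ValueError:
                continue  # looked like a date but was not a valid one
            if best is None or match.start() < best[0]:
                best = (match.start(), parsed)
            break  # first valid match for this pattern is its earliest
    return best[1] if best else None


def format_date_prefix(value: date) -> str:
    """Format every date the same way: zero-padded YYYYMMDD."""
    return value.strftime("%Y%m%d")
```

This treats the first date in the text as the statement start date, which is a simplification; a statement that prints an issue date before the period would need a more targeted pattern.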

Outline of New Code and Placement:

Add Date Extraction Method to PDFPostprocessor:

Implement the Task Function:

Update Task Registry:

Add add_date_prefix_to_filenames to the task_registry.
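As a rough illustration of the registry step, assuming `task_registry` is a plain dictionary mapping task names to callables (its real shape is not shown in this issue). The `register_task` decorator, the `run_tasks` helper, and the task signature are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumed shape: task name -> callable that takes the output folder.
task_registry: Dict[str, Callable[[Path], None]] = {}


def register_task(name: str):
    """Hypothetical helper that adds a function to task_registry."""
    def decorator(func: Callable[[Path], None]) -> Callable[[Path], None]:
        task_registry[name] = func
        return func
    return decorator


@register_task("add_date_prefix_to_filenames")
def add_date_prefix_to_filenames(output_dir: Path) -> None:
    """Placeholder body: the real task would extract dates and rename."""
    # Assumption: split files live one level down, in account subfolders.
    for pdf in sorted(output_dir.glob("*/*.pdf")):
        print(f"would add a date prefix to {pdf.name}")


def run_tasks(names: List[str], output_dir: Path) -> None:
    """Run the named tasks in order, failing loudly on unknown names."""
    for name in names:
        if name not in task_registry:
            raise KeyError(f"Unknown post-processing task: {name}")
        task_registry[name](output_dir)
```

Looking tasks up by name keeps the choice of post-processing steps in configuration rather than in code, so this task can be enabled alongside the account-grouping task without changing the caller.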


Other considerations

emdeh commented 4 days ago

I have outlined the methods for appending the date range to the filenames of split files (all in the PDFPostProcessor class in postprocess_utils.py).

This was done under the issue "Post-processing task - Group split files by account number" as a general outline, since some of the methods will be used across tasks.

Other notes: We may need to call existing methods in pdf_processor.py to read the PDF, depending on whether it is machine-readable or needs OCR.
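One way the read step might look, sketched with the two readers passed in as callables because the actual method names in pdf_processor.py are not given here. `read_pdf_text`, `extract_text`, `run_ocr`, and the `min_chars` threshold are all assumptions for illustration.

```python
from pathlib import Path
from typing import Callable


def read_pdf_text(
    pdf_path: Path,
    extract_text: Callable[[Path], str],
    run_ocr: Callable[[Path], str],
    min_chars: int = 20,
) -> str:
    """Return embedded text when there is enough of it, else OCR output."""
    text = extract_text(pdf_path)
    # Scanned PDFs usually yield little or no embedded text, so a short
    # result is treated as "not machine-readable" (threshold is a guess).
    if len(text.strip()) >= min_chars:
        return text
    return run_ocr(pdf_path)
```

Passing the readers in keeps the post-processing code independent of how pdf_processor.py is organised, and makes the fallback easy to test with stub functions.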

emdeh commented 4 days ago

Once development on "Post-processing task - Group split files by account number" has created the shared methods, create a new development branch for this issue, making sure to create it from that issue's branch (not main).

emdeh commented 3 days ago

Edit: I have created a base-feature-iteration2 branch and merged changes from this feature branch into it. This means the "feature" branches for this task and #56 can be created from the base-feature-iteration2 branch.

Then we PR the features into that base branch, and then PR the base branch into main.