SYSTEMS-OPERATOR / T.T.M.A.T.G.R.A.L.R.W.R.P

that time mind and tonic got reincarnated as low res waifu research program
The Unlicense

Data Collection Pipeline (cbr-webui.py) #4

Open: P2PayPeer opened this issue 8 months ago

P2PayPeer commented 8 months ago

cbr-webui.py

flowchart TD
    A[Start] --> B{CBR File Uploaded?}
    B -->|Yes| C[Extract Pages from CBR]
    B -->|No| Z[End Process]
    C --> D[Display First Page in Gradio UI]
    D --> E[User Views Page]
    E -->|OCR Option| F[Apply OCR to Page]
    F --> G[OCR Text Displayed]
    E -->|Manual Entry| H[User Enters Structured Phrases]
    G --> I[User Edits/Confirms OCR Text]
    H --> J[User Enters Character Names and IDs]
    I --> J
    J --> K[User Submits Processed Page]
    K --> L{More Pages to Process?}
    L -->|Yes| D
    L -->|No| M[Compile All Processed Data]
    M --> N[Save Compiled Data]
    N --> O[Provide Option to Review/Edit Saved Data]
    O --> P{Review/Edit?}
    P -->|Yes| Q[User Reviews/Edits Data]
    Q --> N
    P -->|No| X[Finalize and Export Data]
    X --> Z

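The page loop above (extract pages, annotate each one, compile and save) can be sketched in Python, the language of cbr-webui.py. This is a minimal sketch under stated assumptions, not the script's actual code: PAGE_EXTENSIONS, natural_key, order_pages, PageRecord, compile_records and the JSON layout are all hypothetical. It assumes the archive's entry names are already available, for example from zipfile.ZipFile.namelist() for a CBZ, or from a RAR reader such as the third-party rarfile package for a CBR (a CBR is normally a RAR archive, which the standard library cannot open).

```python
import json
import re
from dataclasses import asdict, dataclass, field
from pathlib import PurePosixPath

# Hypothetical choice: archive entries with these extensions are treated as pages.
PAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def natural_key(name: str) -> list:
    """Split a name into text and number chunks so 'page10' sorts after 'page2'."""
    # re.split with one capture group alternates text, digits, text, ...
    # so odd indices are always digit runs and the key types line up.
    chunks = re.split(r"(\d+)", name)
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


def order_pages(archive_names: list[str]) -> list[str]:
    """Keep image entries from an archive listing and return them in reading order."""
    pages = [n for n in archive_names
             if PurePosixPath(n).suffix.lower() in PAGE_EXTENSIONS]
    return sorted(pages, key=natural_key)


@dataclass
class PageRecord:
    """Hypothetical record for one submitted page (phrases plus name-to-ID map)."""
    page: str
    phrases: list[str] = field(default_factory=list)
    characters: dict[str, str] = field(default_factory=dict)  # name -> ID


def compile_records(records: list[PageRecord]) -> str:
    """Compile all submitted pages into one JSON document, ready to save."""
    return json.dumps({"pages": [asdict(r) for r in records]}, indent=2)


if __name__ == "__main__":
    names = ["page10.jpg", "page2.jpg", "ComicInfo.xml", "page1.png"]
    print(order_pages(names))
    record = PageRecord("page1.png", ["Where am I?"], {"Mind": "C001"})
    print(compile_records([record]))
```

For example, order_pages(["page10.jpg", "page2.jpg", "ComicInfo.xml", "page1.png"]) returns ["page1.png", "page2.jpg", "page10.jpg"]. Natural sorting matters here because a plain string sort would place page10 before page2, and the UI would then show pages out of reading order.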

Mind-Interfaces commented 8 months ago

NER (Named Entity Recognition)

flowchart TD
    A[Start OCR Process] --> B[Receive Image Page]
    B --> C[Preprocess Image]
    C --> D[Apply OCR to Image]
    D --> E[Extracted Text]
    E --> F[Split Text into Sentences/Phrases]
    F --> G[Present Text to User for Initial Review]
    G --> H{User Edits?}
    H -->|Yes| I[User Edits Text]
    I --> J[Finalized Text from User]
    H -->|No| J
    J --> K[Apply NER Model]
    K --> L[Identify Entities]
    L --> M[Tag Entities with IDs]
    M --> N[Present Tagged Text to User]
    N --> O{User Edits Tags?}
    O -->|Yes| P[User Edits Entity Tags]
    P --> Q[Updated Tagged Text]
    O -->|No| Q
    Q --> R[Store Tagged Text in Data Structure]
    R --> S{More Pages/Text?}
    S -->|Yes| B
    S -->|No| T[End OCR & NER Process]
    T --> U[Proceed to Next Step in Data Pipeline]

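The phrase-splitting and tagging steps (F through M) can be illustrated with a minimal Python sketch. The comment does not say which NER model is used, so this swaps in a plain name-list (gazetteer) lookup: it only finds names the annotator has already registered, such as the character names and IDs entered in the first flowchart. Entity, split_phrases, tag_entities and the C001-style IDs are hypothetical, and the regex sentence split is deliberately naive.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str       # surface form as it appears in the phrase
    entity_id: str  # character ID supplied by the annotator (hypothetical format)
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive


def split_phrases(text: str) -> list[str]:
    """Naively split OCR text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tag_entities(phrase: str, gazetteer: dict[str, str]) -> list[Entity]:
    """Tag known character names with their IDs (stand-in for an NER model).

    Matching is whole-word and case-insensitive; the word boundaries assume
    names begin and end with letters or digits. Longer names are tried first
    so a multi-word name is not claimed by a shorter overlapping one.
    """
    entities: list[Entity] = []
    taken = [False] * len(phrase)  # marks characters already inside an entity
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        for m in pattern.finditer(phrase):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer name that was already tagged
            for i in range(m.start(), m.end()):
                taken[i] = True
            entities.append(Entity(m.group(), gazetteer[name], m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


if __name__ == "__main__":
    gazetteer = {"Mind": "C001", "Mind Jr": "C003"}
    for phrase in split_phrases("Mind Jr met mind. Nobody else came!"):
        print(phrase, tag_entities(phrase, gazetteer))
```

For instance, with the gazetteer {"Mind": "C001", "Mind Jr": "C003"}, tagging "Mind Jr met mind." yields "Mind Jr" (C003) at offsets 0 to 7 and "mind" (C001) at 12 to 16. Keeping character offsets lets the review step (N to P) display and edit tags in place, and a trained NER model could later replace tag_entities while returning the same Entity records.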