FullFact / health-misinfo-shared

Raphael health misinformation project, shared by Full Fact and Google
MIT License

Database schema revisions #71

Closed by andylolz 2 months ago

andylolz commented 2 months ago

The current database schema (https://github.com/FullFact/health-misinfo-shared/issues/62#issuecomment-2118785064) has a few gaps. We’d like to store the following things:

Here’s a proposed revised schema. ~This is WIP, and it may also be more complicated than we really need. But hopefully it should capture the things mentioned above.~ UPDATE: @dcorney, @ff-dh, @JamesMcMinn and @andylolz discussed and agreed the following:

```mermaid
erDiagram
  youtube_videos ||--o{ claim_extraction_runs : runs
  youtube_videos {
    text id
    text metadata
    text transcript
  }

  claim_extraction_runs ||--o{ inferred_claims : claims
  claim_extraction_runs {
    integer id PK
    text youtube_id FK
    text model
    text status
    integer timestamp
  }

  inferred_claims {
    integer id PK
    integer run_id FK
    text claim
    text raw_sentence_text
    text labels
    real offset_start_s
    real offset_end_s
  }

  training_claims {
    integer id PK
    text youtube_id
    text claim
    text labels
  }
```

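
For anyone who wants to try the schema out, here is a rough sketch of how the diagram above might translate to SQLite via Python's built-in `sqlite3` module. Several things here are assumptions rather than anything agreed in this issue: that the backing store is SQLite, that `youtube_videos.id` is the primary key (the diagram doesn't mark it), that foreign keys should be enforced, and the helper names `create_schema` and `claims_for_video`, which are purely illustrative.

```python
import sqlite3

# Table and column names follow the diagram. Constraints beyond the PK/FK
# markers shown there (e.g. youtube_videos.id as primary key) are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS youtube_videos (
    id TEXT PRIMARY KEY,  -- assumed PK; not marked in the diagram
    metadata TEXT,
    transcript TEXT
);
CREATE TABLE IF NOT EXISTS claim_extraction_runs (
    id INTEGER PRIMARY KEY,
    youtube_id TEXT REFERENCES youtube_videos (id),
    model TEXT,
    status TEXT,
    timestamp INTEGER
);
CREATE TABLE IF NOT EXISTS inferred_claims (
    id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES claim_extraction_runs (id),
    claim TEXT,
    raw_sentence_text TEXT,
    labels TEXT,
    offset_start_s REAL,
    offset_end_s REAL
);
CREATE TABLE IF NOT EXISTS training_claims (
    id INTEGER PRIMARY KEY,
    youtube_id TEXT,  -- no FK marker in the diagram, so none is declared here
    claim TEXT,
    labels TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four tables (hypothetical helper, not from the repo)."""
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def claims_for_video(conn: sqlite3.Connection, youtube_id: str) -> list:
    """Return (claim, offset_start_s, offset_end_s) across all runs for a video."""
    rows = conn.execute(
        """
        SELECT c.claim, c.offset_start_s, c.offset_end_s
        FROM inferred_claims AS c
        JOIN claim_extraction_runs AS r ON r.id = c.run_id
        WHERE r.youtube_id = ?
        ORDER BY c.offset_start_s
        """,
        (youtube_id,),
    )
    return rows.fetchall()
```

Keeping `claim_extraction_runs` between videos and claims means the same video can be re-processed with a different `model` without overwriting earlier `inferred_claims`, which appears to be the point of the one-to-many links in the diagram.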
dcorney commented 2 months ago