Klimatbyran / beta

New repo for new layout
Consider using zod to parse API responses instead of type casting #3

Open Greenheart opened 3 weeks ago

Greenheart commented 3 weeks ago

This type guard could be replaced with a Zod schema, so that both the types and the runtime validation come from a single source of truth.

https://github.com/Klimatbyran/beta/blob/0faa3f73c8baffefea746342566408bfdb8e6b23/src/data/companyData.ts#L123-L144
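A minimal sketch of what that replacement could look like. The three fields below are a trimmed-down, hypothetical subset of the real company shape, and `parseCompany` is an illustrative name, not something from the repo:

```typescript
import { z } from 'zod'

// Hypothetical subset of the company data; the real object has many more fields.
const companyPreviewSchema = z.object({
  companyName: z.string(),
  url: z.string(),
  needsReview: z.boolean(),
})

// The static type is inferred from the schema, so types and validation cannot drift apart.
type CompanyPreview = z.infer<typeof companyPreviewSchema>

// Stands in for a hand-written `x is CompanyPreview` type guard:
// returns the typed value on success, or null if the input does not match.
function parseCompany(input: unknown): CompanyPreview | null {
  const result = companyPreviewSchema.safeParse(input)
  return result.success ? result.data : null
}
```

Compared to a cast like `data as CompanyPreview`, malformed API responses are caught at the boundary instead of surfacing later as `undefined` somewhere in the UI.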

irony commented 3 weeks ago

Good idea. Zod should probably also be used at the extractJson step in the data pipeline, so we can iterate if we get an error.
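Roughly, validation at that step could return readable error messages that the pipeline uses for the next iteration. This is only a sketch: `extractedSchema`, `validateExtracted`, and the two fields are assumptions for illustration, and how extractJson actually works is not shown in this thread:

```typescript
import { z } from 'zod'

// Hypothetical schema for one extracted record; not the pipeline's real shape.
const extractedSchema = z.object({
  companyName: z.string(),
  totalEmissions: z.number(),
})

type Extracted = z.infer<typeof extractedSchema>

type ValidationResult =
  | { ok: true; data: Extracted }
  | { ok: false; errors: string[] }

// Parses the raw JSON text and validates it against the schema.
// On failure, returns one message per problem so the caller can retry with feedback.
function validateExtracted(rawJson: string): ValidationResult {
  let parsed: unknown
  try {
    parsed = JSON.parse(rawJson)
  } catch {
    return { ok: false, errors: ['Response is not valid JSON'] }
  }

  const result = extractedSchema.safeParse(parsed)
  if (result.success) {
    return { ok: true, data: result.data }
  }
  return {
    ok: false,
    errors: result.error.issues.map(
      (issue) => `${issue.path.join('.')}: ${issue.message}`,
    ),
  }
}
```

The `errors` array could then be appended to the next prompt or logged for review, depending on how the pipeline iterates.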

Greenheart commented 3 weeks ago

We could use https://transform.tools/json-to-zod to get a starting point for Zod schemas based on JSON data. The generated schema will need to be modified to handle all cases, but it's a start:

import { z } from 'zod'

const companySchema = z.object({
  companyName: z.string(),
  description: z.string(),
  industryGics: z.object({
    name: z.string(),
    sector: z.object({ code: z.string(), name: z.string() }),
    group: z.object({ code: z.string(), name: z.string() }),
    industry: z.object({ code: z.string(), name: z.string() }),
    subIndustry: z.object({ code: z.string(), name: z.string() }),
  }),
  industryNace: z.object({
    section: z.object({ code: z.string(), name: z.string() }),
    division: z.object({ code: z.string(), name: z.string() }),
  }),
  baseYear: z.string(),
  url: z.string(),
  emissions: z.object({
    2023: z.object({
      year: z.string(),
      scope1: z.object({
        emissions: z.number(),
        biogenic: z.null(),
        unit: z.string(),
      }),
      scope2: z.object({
        emissions: z.number(),
        unit: z.string(),
        mb: z.number(),
        lb: z.number(),
      }),
      scope3: z.object({
        emissions: z.number(),
        unit: z.string(),
        categories: z.object({
          '1_purchasedGoods': z.null(),
          '2_capitalGoods': z.null(),
          '3_fuelAndEnergyRelatedActivities': z.number(),
          '4_upstreamTransportationAndDistribution': z.null(),
          '5_wasteGeneratedInOperations': z.null(),
          '6_businessTravel': z.null(),
          '7_employeeCommuting': z.null(),
          '8_upstreamLeasedAssets': z.null(),
          '9_downstreamTransportationAndDistribution': z.null(),
          '10_processingOfSoldProducts': z.null(),
          '11_useOfSoldProducts': z.null(),
          '12_endOfLifeTreatmentOfSoldProducts': z.null(),
          '13_downstreamLeasedAssets': z.null(),
          '14_franchises': z.null(),
          '15_investments': z.null(),
          '16_other': z.null(),
        }),
      }),
      totalEmissions: z.number(),
      totalUnit: z.string(),
    }),
  }),
  baseFacts: z.object({
    2023: z.object({
      turnover: z.number(),
      unit: z.string(),
      employees: z.number(),
    }),
  }),
  factors: z.array(
    z.object({
      product: z.string(),
      description: z.string(),
      value: z.number(),
      unit: z.string(),
    }),
  ),
  contacts: z.array(z.unknown()),
  goals: z.array(
    z.object({
      description: z.string(),
      year: z.string(),
      reductionPercent: z.number(),
      baseYear: z.string(),
    }),
  ),
  initiatives: z.array(
    z.object({
      title: z.string(),
      description: z.string(),
      year: z.string(),
      scope: z.string(),
    }),
  ),
  reliability: z.string(),
  needsReview: z.boolean(),
  reviewComment: z.string(),
  confidenceScore: z.number(),
  agentResponse: z.string(),
  id: z.string(),
})
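As an example of the kind of modification needed: the generated schema hard-codes the year `2023` and infers `z.null()` wherever the sample JSON happened to contain `null`. A possible adjustment, assuming other years can occur and that the null fields are really optional numbers (both are assumptions, not confirmed by the data shown here), might look like this for the emissions part:

```typescript
import { z } from 'zod'

// Assumption: each scope 3 category is a number when reported and null otherwise.
const scope3Schema = z.object({
  emissions: z.number().nullable(),
  unit: z.string(),
  categories: z.record(z.string(), z.number().nullable()),
})

// Trimmed to scope 3 and totals to keep the sketch short.
const yearlyEmissionsSchema = z.object({
  year: z.string(),
  scope3: scope3Schema,
  totalEmissions: z.number().nullable(),
  totalUnit: z.string(),
})

// Keyed by year string ("2022", "2023", ...) instead of a single hard-coded 2023 key.
const emissionsByYearSchema = z.record(z.string(), yearlyEmissionsSchema)

type EmissionsByYear = z.infer<typeof emissionsByYearSchema>
```

The same pattern (`.nullable()` for fields that may be missing, `z.record` for year-keyed objects) would likely apply to `baseFacts` and `scope1.biogenic` as well.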