Open Greenheart opened 3 weeks ago
Good idea. Zod should probably (also) be implemented at the extractJson step in the data-pipeline, so we can iterate if validation returns an error.
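To make "iterate if we get an error" concrete, here is a rough sketch of what validation at the extraction step might look like. The names `extractValidated` and `Extract` are hypothetical and not taken from the data-pipeline code, and the schema is cut down to two fields to keep the example short. The idea is only that `safeParse` gives us structured issues we can feed back into the next attempt:

```typescript
import { z } from 'zod'

// Deliberately tiny stand-in for the full company schema.
const companySchema = z.object({
  companyName: z.string(),
  baseYear: z.string(),
})

type Company = z.infer<typeof companySchema>

// Hypothetical signature: receives feedback from the previous attempt (if any)
// and returns the raw JSON text produced by the extraction step.
type Extract = (feedback?: string) => Promise<string>

async function extractValidated(
  extract: Extract,
  maxAttempts = 3,
): Promise<Company> {
  let feedback: string | undefined
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await extract(feedback)

    let parsed: unknown
    try {
      parsed = JSON.parse(raw)
    } catch {
      feedback = 'The response was not valid JSON.'
      continue
    }

    const result = companySchema.safeParse(parsed)
    if (result.success) return result.data

    // Turn the Zod issues into readable feedback for the next attempt.
    feedback = result.error.issues
      .map((issue) => `${issue.path.join('.')}: ${issue.message}`)
      .join('\n')
  }
  throw new Error(
    `Extraction failed validation after ${maxAttempts} attempts:\n${feedback}`,
  )
}
```

Whether the feedback goes back into a prompt or just into a log is an open design choice; the sketch only shows where the validation errors become available.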
We could use https://transform.tools/json-to-zod to generate a starting point for the Zod schemas from existing JSON data. The output will need to be modified to handle all cases, but it's a start:
import { z } from 'zod'

// Generated from one sample company, so year keys and z.null() fields
// reflect that sample only and still need to be generalized.
const companySchema = z.object({
  companyName: z.string(),
  description: z.string(),
  industryGics: z.object({
    name: z.string(),
    sector: z.object({ code: z.string(), name: z.string() }),
    group: z.object({ code: z.string(), name: z.string() }),
    industry: z.object({ code: z.string(), name: z.string() }),
    subIndustry: z.object({ code: z.string(), name: z.string() }),
  }),
  industryNace: z.object({
    section: z.object({ code: z.string(), name: z.string() }),
    division: z.object({ code: z.string(), name: z.string() }),
  }),
  baseYear: z.string(),
  url: z.string(),
  emissions: z.object({
    2023: z.object({
      year: z.string(),
      scope1: z.object({
        emissions: z.number(),
        biogenic: z.null(),
        unit: z.string(),
      }),
      scope2: z.object({
        emissions: z.number(),
        unit: z.string(),
        mb: z.number(),
        lb: z.number(),
      }),
      scope3: z.object({
        emissions: z.number(),
        unit: z.string(),
        categories: z.object({
          '1_purchasedGoods': z.null(),
          '2_capitalGoods': z.null(),
          '3_fuelAndEnergyRelatedActivities': z.number(),
          '4_upstreamTransportationAndDistribution': z.null(),
          '5_wasteGeneratedInOperations': z.null(),
          '6_businessTravel': z.null(),
          '7_employeeCommuting': z.null(),
          '8_upstreamLeasedAssets': z.null(),
          '9_downstreamTransportationAndDistribution': z.null(),
          '10_processingOfSoldProducts': z.null(),
          '11_useOfSoldProducts': z.null(),
          '12_endOfLifeTreatmentOfSoldProducts': z.null(),
          '13_downstreamLeasedAssets': z.null(),
          '14_franchises': z.null(),
          '15_investments': z.null(),
          '16_other': z.null(),
        }),
      }),
      totalEmissions: z.number(),
      totalUnit: z.string(),
    }),
  }),
  baseFacts: z.object({
    2023: z.object({
      turnover: z.number(),
      unit: z.string(),
      employees: z.number(),
    }),
  }),
  factors: z.array(
    z.object({
      product: z.string(),
      description: z.string(),
      value: z.number(),
      unit: z.string(),
    }),
  ),
  contacts: z.array(z.unknown()),
  goals: z.array(
    z.object({
      description: z.string(),
      year: z.string(),
      reductionPercent: z.number(),
      baseYear: z.string(),
    }),
  ),
  initiatives: z.array(
    z.object({
      title: z.string(),
      description: z.string(),
      year: z.string(),
      scope: z.string(),
    }),
  ),
  reliability: z.string(),
  needsReview: z.boolean(),
  reviewComment: z.string(),
  confidenceScore: z.number(),
  agentResponse: z.string(),
  id: z.string(),
})
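As an example of the kind of modification the generated schema needs: it hardcodes `2023` as the only year and uses `z.null()` wherever the sample happened to contain `null`. Below is a possible way to loosen the emissions part. It assumes that year keys are four-digit strings and that every field that was `null` in the sample holds a number when present; both assumptions would need checking against real data. The names `nullableNumber`, `yearlyEmissionsSchema` and `emissionsSchema` are made up for this sketch, and scope2 and the totals are left out for brevity:

```typescript
import { z } from 'zod'

// Assumption: a value that was null in the sample is a number when present.
const nullableNumber = z.number().nullable()

const yearlyEmissionsSchema = z.object({
  year: z.string(),
  scope1: z.object({
    emissions: nullableNumber,
    biogenic: nullableNumber,
    unit: z.string(),
  }),
  scope3: z.object({
    emissions: nullableNumber,
    unit: z.string(),
    // Accept any category key (e.g. '1_purchasedGoods') with a number or null.
    categories: z.record(z.string(), nullableNumber),
  }),
})

// Any four-digit year key is accepted instead of only 2023.
const emissionsSchema = z.record(
  z.string().regex(/^\d{4}$/),
  yearlyEmissionsSchema,
)

type Emissions = z.infer<typeof emissionsSchema>
```

Using `z.record` for the categories is the most permissive option. If the 16 category names are fixed, keeping `z.object` with `nullableNumber` values would catch misspelled keys as well.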
This type guard could be replaced with zod to get types and validation based on one single source of truth: a zod schema.
https://github.com/Klimatbyran/beta/blob/0faa3f73c8baffefea746342566408bfdb8e6b23/src/data/companyData.ts#L123-L144
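A minimal sketch of the idea, without assuming anything about what the linked guard currently checks. The schema here is a shortened, hypothetical stand-in, and `CompanyData` and `isCompanyData` are illustrative names. The type is derived from the schema with `z.infer`, and the guard simply delegates to `safeParse`:

```typescript
import { z } from 'zod'

// Hypothetical, shortened schema; the real one would cover every field.
const companySchema = z.object({
  companyName: z.string(),
  url: z.string(),
  needsReview: z.boolean(),
})

// The static type is derived from the schema, so the two cannot drift apart.
type CompanyData = z.infer<typeof companySchema>

// Type guard backed by the schema instead of hand-written property checks.
function isCompanyData(value: unknown): value is CompanyData {
  return companySchema.safeParse(value).success
}
```

If callers need to know why validation failed, they could call `companySchema.safeParse` directly and inspect `error.issues` instead of going through the boolean guard.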