Klimatbyran / garbo

Klimatkollen's data pipeline, processing company sustainability reports

Need to link multiple pdfs to the same company #110

Open Almendra-ab opened 4 months ago

Almendra-ab commented 4 months ago

We need to find a way to link multiple reports to the same company. There are two situations where this is needed. The first is when the data is not all presented in the same PDF (example: banks such as Handelsbanken and Arion have a separate report for their financed emissions). The second is when we have reports from different years, for example using the 2022 report to get more historical data, or adding the 2024 report in the future.

I can see how this could be complex. In the case of Arion, we have financial data from 2021 that is updated in the latest report, but the earlier report also has data for 2019, which is not included in the latest report. Basically, the prompt would be to take all the data available from the latest report and use older reports to fill in the gaps. A new report would override the old one wherever it has data, but leave the old data in place where no new data is found.
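
To make the rule concrete, here is a minimal sketch of the "latest report wins, older reports fill the gaps" merge. Everything in it is an assumption for illustration: the `Report` and `YearlyData` shapes, the `mergeReports` function, and the field names are hypothetical and not taken from the garbo codebase, and the numbers are placeholders, not real company data.

```typescript
// Hypothetical shapes for illustration only; not garbo's actual data model.
// Data year -> field name -> value (null means "not found in this report").
type YearlyData = Record<number, Record<string, number | null>>;

interface Report {
  reportYear: number; // the reporting year the PDF belongs to
  data: YearlyData;   // values extracted from that PDF, keyed by data year
}

// Merge all reports for one company: newer reports override older ones
// wherever they have a value; older values stay where newer ones have none.
function mergeReports(reports: Report[]): YearlyData {
  // Oldest first, so values from later reports are applied last.
  const ordered = [...reports].sort((a, b) => a.reportYear - b.reportYear);
  const merged: YearlyData = {};

  for (const report of ordered) {
    for (const [year, fields] of Object.entries(report.data)) {
      const y = Number(year);
      const target = merged[y] ?? (merged[y] = {});
      for (const [field, value] of Object.entries(fields)) {
        // Skip missing values so they never erase older data.
        if (value !== null && value !== undefined) {
          target[field] = value;
        }
      }
    }
  }
  return merged;
}

// Placeholder numbers: an older report with 2019 and 2021 data,
// and a newer report that restates 2021 and adds 2022.
const older: Report = {
  reportYear: 2022,
  data: { 2019: { scope1: 10 }, 2021: { scope1: 12, scope3: 300 } },
};
const newer: Report = {
  reportYear: 2023,
  data: { 2021: { scope1: 11, scope3: null }, 2022: { scope1: 9 } },
};

const merged = mergeReports([newer, older]);
console.log(JSON.stringify(merged));
```

In this sketch the 2019 value survives from the older report, the 2021 `scope1` value is replaced by the restated one, and the 2021 `scope3` value is kept because the newer report has nothing for it. The same function would also cover the "separate PDF for financed emissions" case, as long as both PDFs are linked to the same company and passed in together. Two reports with the same `reportYear` are applied in input order, so a real implementation would probably need an explicit rule for conflicts between them.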