Open AnneSchoenauer opened 10 months ago
Hey @SKruthoff, it would be great if you could give us an update here. Thanks! cc @AnneSchoenauer
Hi,
I have started on the checks. So far there seems to be a lot of overlap between the two. I will compile a more detailed overview over the next couple of days.
The bigger discrepancy is in the upstream profiles. Kalash is currently working on removing a column that seems to be causing the duplication, which should address this issue. Beyond that, I have found some differences in the matching company average between Mauro's results and mine.
Thanks for the update! Looking forward to the overview!
@Tilmon and I reviewed the output table from Sven, which can be found here. We discussed before Christmas that it would be great if you, @SKruthoff, could check whether your output tables are the same as Mauro's output tables.
I think one difference for sure will be row_id, which I have never seen in our output files. Most likely it is something that came out of running the process with Databricks? Anyway, curious to see what comes out of the analysis.
Best, Anne