best-practice-and-impact / ons-spark

MIT License

Optimising joins parquet #129

Closed NathanKelly-ONS closed 10 months ago

NathanKelly-ONS commented 11 months ago


Details of this request: Modified the optimising joins chapter to read in a Parquet file rather than a CSV file, and updated the Spark UI screenshots.


NathanKelly-ONS commented 11 months ago

@alexsnowdon I made your suggested changes. Do you mind giving them a review and merging them in when you get a chance? :)

alexsnowdon commented 11 months ago

@NathanKelly-ONS I can see you have made the suggested changes. Unfortunately, when I build the book, figure 18 and figure 21 are now broken. For figure 18, the figure seems to be referenced differently in the .md file and in the .ipynb: the .md references 'soft_merge_join_ui_new.png', whereas in the images folder and in the .ipynb this is just 'sort_merge_join_ui.png'! For figure 21, I cannot see what could be causing the problem, as it is referenced as when_ui.png in all files.
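A mismatch like the one described for figure 18 can be caught before building the book by comparing the image names each source file mentions against the contents of the images folder. The following is only a minimal sketch under stated assumptions, not something taken from this repository: the function names (`referenced_images`, `missing_images`, `check_sources`), the regular expression, and the idea that image references appear as plain file names in the raw text of the `.md` and `.ipynb` files are all hypothetical.

```python
import re
from pathlib import Path

# Hypothetical pattern: any path-like token ending in a common image extension.
IMAGE_REF = re.compile(r"[\w\-./]+\.(?:png|jpg|jpeg|svg|gif)\b", re.IGNORECASE)


def referenced_images(text):
    """Return the set of image file names (directories stripped) found in text."""
    return {Path(match).name for match in IMAGE_REF.findall(text)}


def missing_images(source_text, available):
    """Return, sorted, the referenced image names that are not in `available`."""
    return sorted(referenced_images(source_text) - set(available))


def check_sources(source_paths, images_dir):
    """Map each source file to the image names it references but images_dir lacks.

    Assumes every image lives directly inside images_dir (an assumption about
    the project layout, not a documented convention).
    """
    available = {p.name for p in Path(images_dir).iterdir() if p.is_file()}
    report = {}
    for path in source_paths:
        text = Path(path).read_text(encoding="utf-8")
        missing = missing_images(text, available)
        if missing:
            report[str(path)] = missing
    return report
```

Applied to the file names quoted above, `missing_images` would flag 'soft_merge_join_ui_new.png' when the images folder only contains 'sort_merge_join_ui.png' and 'when_ui.png'. It would not explain a figure such as figure 21, where the name already matches in every file.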

NathanKelly-ONS commented 11 months ago

@alexsnowdon finally managed to get it to work (I think!). It's working on my laptop: I've built the book and checked the images, and they all seem to be displaying. Hopefully it's all good on your end too :)