Closed danp11 closed 4 years ago
Hi again, I've been doing some weekend coding and have made quite a lot of changes. I'll close this PR and open a new one once I have the code more structured. /Dan
@danp11 - Sounds great, I'll be looking out for the new pull request. Looking forward to checking out the code!
@danp11 - just took a look at the code and looks like you're off to a great start!!
Hi again
I have done a first "best effort" :-) at a generic Delta Lake parser. Let me know if you think it could be of interest to others and I'll send a PR to your delta lake examples repo. Enjoy your weekend! /Dan
@danp11 - Yea, this would make a great pull request. Hopefully we can collaborate on a blog post after working out all the code details!
Probably better to use compactFiles instead of compressFiles. Compress typically refers to the file compression format (snappy). We'll be able to add some more tests once the code is in the delta-examples repo. We'll also be able to add some additional examples for some of the other advanced stuff Dominique covered in his talk (I still don't understand a lot of that yet).
Keep up the great work!
@MrPowers
Ok, nice. I'll be adding more tests and will go through the code in more detail to see if there is more that can be abstracted away. In a week or two I should have it ready for a PR.
Hi Matthew
I read some examples about Medellin in your book :-) I guess you have lived there? I did too, in 2003-2004. I rode through on my motorbike and loved the people, the city and the country so much that I stayed almost a year. When I get close to retirement in 25 years or so I'm moving back :-)
Being a newbie in Scala/Spark, I find it a bit hard to know how to organize the code. I've taken almost all the code from Dominique's parser and added some code that hopefully makes it even more generic.
In this first PR I just want to ask whether you have some spare time to take a quick look and see if there is something I can work on more to educate myself. It is no problem if you don't have the time or simply think it doesn't bring any value to the examples repo haha. My hope, however, is that we can get some code that is easy to follow and maintain, and above all it should be easy to plug in new "event types". In this example one can easily plug in, for instance, an "Order handler" whose data could come from a source other than a file.
There are a lot of tests missing etc., but I just want to get a first opinion from you.
With the current code there is no need for a "bronze table". I might be missing something, but it feels a little overkill if you have all the data close to you and in known locations?
Hopefully the code is easy to follow, and any input from you on what to change or how to better structure it would be very much appreciated. But no worries if you can't!
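To make the pluggable "event types" idea a bit more concrete, here is a minimal sketch in plain Scala (no Spark, so it runs on its own). Every name in it (Event, EventHandler, CustomerFileHandler, OrderHandler, Parser) is hypothetical and chosen only for illustration; it is not the code from the PR, and the "file" source is stubbed with in-memory lines.

```scala
// Hypothetical names throughout; an illustration, not the code from the PR.
final case class Event(eventType: String, payload: Map[String, String])

// One handler per event type. Where the data comes from (file, queue, API)
// is hidden behind read(), so the parser never needs to know.
trait EventHandler {
  def eventType: String
  def read(): Seq[Event]
}

// A file-like source, stubbed here with in-memory lines (assumption).
final class CustomerFileHandler(lines: Seq[String]) extends EventHandler {
  val eventType = "customer"
  def read(): Seq[Event] =
    lines.map(line => Event(eventType, Map("raw" -> line)))
}

// A non-file source, e.g. (id, amount) records already fetched elsewhere.
final class OrderHandler(orders: Seq[(String, String)]) extends EventHandler {
  val eventType = "order"
  def read(): Seq[Event] =
    orders.map { case (id, amount) =>
      Event(eventType, Map("id" -> id, "amount" -> amount))
    }
}

// The parser depends only on the trait, so plugging in a new handler
// requires no change to the parser itself.
final class Parser(handlers: Seq[EventHandler]) {
  def parseAll(): Map[String, Seq[Event]] =
    handlers.map(h => h.eventType -> h.read()).toMap
}

object Demo {
  def main(args: Array[String]): Unit = {
    val parser = new Parser(Seq(
      new CustomerFileHandler(Seq("alice", "bob")),
      new OrderHandler(Seq("1" -> "9.99"))
    ))
    val result = parser.parseAll()
    println(result("order").head.payload("amount"))
  }
}
```

In a real Spark version read() would presumably return a DataFrame or Dataset instead of a Seq, but the shape of the abstraction would stay the same.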
Take care,
/Dan