goldshtn / spark-workshop

Labs and data files for a full-day Spark workshop
MIT License

Scala Labs: 0, 1, 2, 4 #8

noikaslev closed 8 years ago

noikaslev commented 8 years ago

I modified the prop-prices.csv file: I added seconds to the date field.

goldshtn commented 8 years ago

Thanks! Not sure why you had to modify the CSV file. I would much rather leave the original dataset intact. If any preprocessing is necessary, it needs to be done on the Scala side. Can you make that work?

noikaslev commented 8 years ago

I refactored Lab 4. I changed the date field to a string type and now use the substring function to get the year and month.
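
For readers following along, here is a minimal sketch of the substring approach described above. It is not the code from this PR: the `DateParts` object, the `yearMonth` helper, and the sample date layout (`yyyy-MM-dd HH:mm`) are all assumptions made for illustration, since the thread does not show the actual column format.

```scala
import scala.util.Try

// Hypothetical helper; the actual Lab 4 code is not shown in this thread.
object DateParts {
  // Assumes the raw date string starts with "yyyy-MM", e.g. "2015-03-27 00:00".
  // Returns None when the field is too short, not numeric, or has an invalid month.
  def yearMonth(date: String): Option[(Int, Int)] =
    if (date.length < 7 || date.charAt(4) != '-') None
    else
      for {
        year  <- Try(date.substring(0, 4).toInt).toOption
        month <- Try(date.substring(5, 7).toInt).toOption
        if month >= 1 && month <= 12
      } yield (year, month)
}
```

Because only the leading characters are read, the helper behaves the same whether or not the time part includes seconds, so the CSV file can stay unmodified. In a DataFrame-based lab, the same idea could be expressed with Spark SQL's `substring` column function, which uses 1-based positions.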

goldshtn commented 8 years ago

Did you push your changes back to this branch? I still see the .csv file as being modified as part of this PR.

noikaslev commented 8 years ago

Not yet; I want to add Lab 5 (Twitter).

goldshtn commented 8 years ago

@noikaslev If you delete the .csv file from your local repo and push it to the fork, you'll be able to sync this PR and get the delete commit here. But even better would be to rebase on your side so that the .csv file doesn't even enter history. Let me (or @eyalbenivri) know over email if you need help with this.

goldshtn commented 8 years ago

Lev, I don't really want one huge PR that has all the labs in it. Please post an update on the status of the different labs and of the comments that were raised earlier on this PR. If all the labs are OK, I would really like to merge the ones you have ready and then do smaller PRs going forward. @noikaslev

noikaslev commented 8 years ago

I updated what we discussed on the first PR and added Lab 3.

goldshtn commented 8 years ago

@noikaslev: I am merging this, thanks. For the next labs, please open a separate PR for each lab. Also, when you're done, please review the labs you already ported for grammar and punctuation -- there are some typos and incomplete sentences. Thanks!