
dmwm / DDM

Dynamic Data Management - Cache release and auto-replication of hot data


Created DatasetPopularity.py #112

Closed: meniluca closed this 7 years ago

meniluca commented 7 years ago

Python (PySpark) version of DatasetPopularity.scala.

vkuznet commented 7 years ago

The PR looks fine to me. I'd suggest that we merge it.

cvuosalo commented 7 years ago

Hi Luca. Could you please add some information to the README.md file? You told me this PySpark script matches Marco's Scala script, which performs an HDFS aggregation similar to the Oracle aggregation, and that it is intended as an example of how to convert Scala to PySpark. Could you please add a statement to that effect to the README.md, along with a summary of what the script does and what its inputs and outputs are? Thanks.

meniluca commented 7 years ago

Done 👍