benchflow / data-transformers

Spark scripts utilised to transform data to the BenchFlow internal formats

Develop a library that abstracts common functionalities of data-transformers #5

Closed VincenzoFerme closed 8 years ago

VincenzoFerme commented 8 years ago

Develop a library that abstracts the common functionalities of the data-transformers, so that the code of each specific data-transformer contains only the logic needed to accomplish its specific task. The library should hide the common functionalities completely: custom data-transformers should get them without writing any additional code (e.g., by using classes).
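As a rough illustration of the "using classes" idea, here is a minimal sketch in which a base class owns the common steps and a custom data-transformer implements only its own logic. Every name in it (`BaseTransformer`, `UppercaseTransformer`, `run`, `transform`, the `fetch` and `store` callables, the example URL) is hypothetical and not part of any existing BenchFlow code; I/O is stubbed in memory so the sketch runs without Spark or Minio.

```python
from abc import ABC, abstractmethod


class BaseTransformer(ABC):
    """Hypothetical library base class that owns the common steps."""

    def __init__(self, fetch, store):
        # fetch(url) returns raw data; store(key, value) persists a result.
        # Both are injected, so no particular storage backend is assumed.
        self._fetch = fetch
        self._store = store

    def run(self, url, key):
        # Common pipeline, written once in the library:
        # load the input, apply the task-specific logic, save the output.
        raw = self._fetch(url)
        result = self.transform(raw)
        self._store(key, result)
        return result

    @abstractmethod
    def transform(self, raw):
        """Task-specific logic, the only part a custom transformer writes."""


class UppercaseTransformer(BaseTransformer):
    """Hypothetical custom data-transformer: only its own logic."""

    def transform(self, raw):
        return [line.upper() for line in raw]


# Usage with in-memory stand-ins for the real input and output.
sink = {}
transformer = UppercaseTransformer(
    fetch=lambda url: ["a", "b"],   # stand-in for reading the input data
    store=sink.__setitem__,         # stand-in for writing the output
)
output = transformer.run("minio://hypothetical/bucket/file", "out")
```

With this shape, the custom class contains no loading or saving code at all, which is the property the issue asks for.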

To be moved into the library:

Start from the following code written in Java: Cleaner.zip. It is described on the following document: Marco_Argenti_thesis.pdf

Take into account:

VincenzoFerme commented 8 years ago

@Cerfoglg first thing to do is completing the list of functionalities that can be moved to the library, and do a quick sketch of the library design.

VincenzoFerme commented 8 years ago

This can still be improved. For example, we can also move:

def getFromMinio(url):
    from commons import getFromUrl
    return getFromUrl(url)