TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

Need a get_content_list method for connections #1576

Closed ninjapapa closed 5 years ago

ninjapapa commented 5 years ago

Client code:

my_conn = smvApp.get_connection_by_name('my_hdfs')
csv_files = my_conn.get_content_list(pattern=r".*\.csv", ignore_case=True)

ninjapapa commented 5 years ago

Since get_content_list has to use the Spark context, it might be better to make it a SmvApp method.

AliTajeldin commented 5 years ago
ninjapapa commented 5 years ago
ninjapapa commented 5 years ago

Or maybe pattern should be an attribute of that connection....

AliTajeldin commented 5 years ago

I probably should have said "might not have" instead of "will not have". Hive will have a pattern, but other future connection types may not have that concept.

AliTajeldin commented 5 years ago

pattern should probably stay with get_content_list. That is cleaner than splitting the content filter criterion between two places (the constructor and the call that gets the content list). Also, we may want to use pattern to filter large connection lists as the user types (rather than our usual case of getting all items and filtering the in-memory list).

ninjapapa commented 5 years ago

Without the pattern, get_content_list will return everything.
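
A minimal sketch of the filtering semantics discussed above, not the actual SMV implementation. The helper name filter_content_list is hypothetical, and treating pattern as a full-match regex is an assumption; only the pattern and ignore_case parameters come from the client code in this issue.

```python
import re

def filter_content_list(names, pattern=None, ignore_case=False):
    """Hypothetical helper: keep names that fully match `pattern`.

    With no pattern, every name is returned unchanged.
    """
    if pattern is None:
        return list(names)
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    # Assumption: the whole name must match, so "a.csv.bak" is rejected.
    return [name for name in names if regex.fullmatch(name)]

# Stand-in for whatever a connection would list (file names, table names, ...).
names = ["a.csv", "B.CSV", "notes.txt"]

all_names = filter_content_list(names)
csv_names = filter_content_list(names, pattern=r".*\.csv", ignore_case=True)
```

Keeping the filter in one function like this matches the suggestion to leave pattern on get_content_list rather than on the connection constructor.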

ninjapapa commented 5 years ago

For JDBC, the ways to get a list of tables are different for different DBs.
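
One way to handle the per-database difference is a lookup from dialect to listing query. The sketch below is illustrative only: LIST_TABLES_SQL and list_tables are hypothetical names, the 'public' schema for PostgreSQL is an assumption, and Python's built-in sqlite3 module stands in for a real JDBC connection so the example can run on its own.

```python
import sqlite3

# Hypothetical mapping: each database exposes its table list differently.
LIST_TABLES_SQL = {
    "sqlite": "SELECT name FROM sqlite_master WHERE type = 'table'",
    "mysql": "SHOW TABLES",
    # Assumption: only tables in the default 'public' schema are wanted.
    "postgresql": "SELECT tablename FROM pg_catalog.pg_tables "
                  "WHERE schemaname = 'public'",
    "oracle": "SELECT table_name FROM user_tables",
}

def list_tables(conn, dialect):
    """Run the dialect-specific listing query and return sorted table names."""
    cur = conn.cursor()
    cur.execute(LIST_TABLES_SQL[dialect])
    return sorted(row[0] for row in cur.fetchall())

# In-memory SQLite database as a stand-in for a JDBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.execute("CREATE TABLE customers (id INTEGER)")

tables = list_tables(conn, "sqlite")
```

A pattern filter could then be applied to the returned names, or pushed into the query for databases that support it.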