feathr-ai / feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise
https://join.slack.com/t/feathrai/shared_invite/zt-1ffva5u6v-voq0Us7bbKAw873cEzHOSg
Apache License 2.0
1.97k stars 259 forks

[Hdfs Utils] add uri param to filesystem get api #1219

Open bheroder opened 1 year ago

bheroder commented 1 year ago

Description

When HDFS is configured with fs.defaultFS pointing to blob storage, a local path cannot be used as the source location in a feature definition config. This is because hdfsutils.scala uses the FileSystem.get(conf) API, which always returns the default file system. Using FileSystem.get(URI, conf) instead solves the problem: the source location can be given as file:///xyz even though fs.defaultFS points to blob storage.
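To illustrate the difference between the two lookups, here is a minimal sketch. It is not the actual change in hdfsutils.scala: the object name `FileSystemByUri` and the helpers `fileSystemFor` and `exists` are hypothetical, and `wasbs://xyz` is the placeholder value from this description. It assumes hadoop-common is on the classpath.

```scala
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FileSystemByUri {
  // Hypothetical helper (not the real HdfsUtils code): resolve the FileSystem
  // from the location's own URI, so an explicit scheme such as file:// takes
  // precedence over fs.defaultFS.
  def fileSystemFor(location: String, conf: Configuration): FileSystem =
    FileSystem.get(new Path(location).toUri, conf)

  // Check existence against the file system that matches the location.
  def exists(location: String, conf: Configuration): Boolean =
    fileSystemFor(location, conf).exists(new Path(location))

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Placeholder default file system, as in the description above.
    conf.set("fs.defaultFS", "wasbs://xyz")

    // Create a real local file so there is something to look up.
    val tmp = Files.createTempFile("feathr-source", ".csv")
    try {
      // tmp.toUri yields a file:// URI, so the local file system is used
      // even though fs.defaultFS points elsewhere.
      println(exists(tmp.toUri.toString, conf))
    } finally {
      Files.deleteIfExists(tmp)
    }
  }
}
```

A location without a scheme (for example /data/xyz) still resolves against fs.defaultFS under this approach, so existing configs that rely on the default file system should be unaffected.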

Testing

Manually tested in a notebook session configured with fs.defaultFS=wasbs://xyz:

FileSystem.get(conf).exists("file:///path") returns false
FileSystem.get(URI, conf).exists("file:///path") returns true

Does this PR introduce any user-facing changes?

bheroder commented 1 year ago

cc @xiaoyongzhu

bheroder commented 1 year ago

@windoze could you help with tagging the right reviewers?

bheroder commented 12 months ago

@jaymo001 @rakeshkashyap123 could you please review? cc @windoze