ExpediaGroup / waggle-dance

Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Apache License 2.0

Can we implement a MetaStoreFilterHook to modify the path and support HMS that uses the same ns #320

Open flaming-archer opened 1 week ago

flaming-archer commented 1 week ago

Is your feature request related to a problem? Please describe.

One HMS uses a Hadoop cluster whose path is, say, hdfs://ns/path/to/hms1. Another HMS uses a different Hadoop cluster whose path is hdfs://ns/path/to/hms2. Because both paths use the same nameservice ns, HS2 cannot tell which Hadoop cluster to access. Assuming RBF (HDFS Router-based Federation) connects the two clusters, we can change both paths to RBF paths. For example, hdfs://ns/path/to/hms1 becomes hdfs://rbf/ns1/path/to/hms1, and hdfs://ns/path/to/hms2 becomes hdfs://rbf/ns2/path/to/hms2. The paths can be rewritten according to a set of rules; the example above is a simple one. In this scenario, can we implement a MetaStoreFilterHook to achieve this?
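To make the rule concrete, here is a minimal sketch of the rewrite in plain Java. It is an illustration only: the class name `RbfPathRewriter` and its constructor parameters are hypothetical and do not come from WD, Hive, or apiary-extensions. In an actual hook this logic would presumably be applied to the location fields of the objects that a MetaStoreFilterHook implementation returns.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper (not part of WD or Hive): rewrites
 * hdfs://<nameservice>/<path> to hdfs://<rbfAuthority>/<mountPoint>/<path>.
 */
final class RbfPathRewriter {
    private final String sourceNameservice; // e.g. "ns"
    private final String rbfAuthority;      // e.g. "rbf"
    private final String mountPoint;        // e.g. "ns1"

    RbfPathRewriter(String sourceNameservice, String rbfAuthority, String mountPoint) {
        this.sourceNameservice = sourceNameservice;
        this.rbfAuthority = rbfAuthority;
        this.mountPoint = mountPoint;
    }

    /** Returns the rewritten location, or the input unchanged if the rule does not apply. */
    String rewrite(String location) {
        if (location == null) {
            return null;
        }
        URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            // Locations that are not valid URIs are left untouched.
            return location;
        }
        boolean matches = "hdfs".equals(uri.getScheme())
                && sourceNameservice.equals(uri.getAuthority());
        if (!matches) {
            return location;
        }
        // getRawPath keeps any percent-encoding exactly as it was in the input.
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        return "hdfs://" + rbfAuthority + "/" + mountPoint + path;
    }
}
```

With `new RbfPathRewriter("ns", "rbf", "ns1")`, the location hdfs://ns/path/to/hms1 is rewritten to hdfs://rbf/ns1/path/to/hms1, while locations on other schemes or nameservices pass through unchanged.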

Describe the solution you'd like

I see that PrefixingMetastoreFilter is an example of a filter that can modify a path. Perhaps we can use it as a reference and add a new class that provides this functionality.

Describe alternatives you've considered

There may be other ways, such as changing the paths in the existing HMS to RBF paths, but this may introduce operational risks to the existing environment.

Additional context

No, this is a new feature.

patduin commented 1 week ago

Please have a look at this; maybe it satisfies the use case: https://github.com/ExpediaGroup/apiary-extensions/tree/main/hive-hooks

Either way, WD should already support loading such hooks; it's just a matter of wiring it all up. The implementation of this hook can live in any project, and I prefer to leave those out of WD.

flaming-archer commented 1 week ago

Thank you very much, this is very important to me, haha.

patduin commented 1 week ago

Sure, have a look, and if it's OK we can close this issue.

flaming-archer commented 1 week ago

I took a look, and it doesn't seem to be a perfect match. ApiaryMetastoreFilter simply replaces paths according to rules. What if the paths of two HMS instances are exactly the same? In that case it would be better to scope the path replacement rules by database or by HMS, so that, for example, the paths of hms1 and hms2 are replaced with different paths. Is there a better way to deal with this situation? Please do not hesitate to teach me.
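The idea of scoping rules by HMS can be sketched as a lookup keyed by metastore name, so that an identical input path resolves to a different target depending on which HMS it came from. This is only an illustration of the request: `PerMetastorePathRules` and the metastore names are hypothetical, and nothing here reflects how ApiaryMetastoreFilter or WD is actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: one prefix replacement rule per metastore name. */
final class PerMetastorePathRules {
    // metastore name -> { prefix to match, prefix to substitute }
    private final Map<String, String[]> rules = new LinkedHashMap<>();

    PerMetastorePathRules addRule(String metastore, String fromPrefix, String toPrefix) {
        rules.put(metastore, new String[] {fromPrefix, toPrefix});
        return this;
    }

    /** Applies the rule registered for the given metastore, if any. */
    String rewrite(String metastore, String location) {
        String[] rule = rules.get(metastore);
        if (rule == null || location == null || !location.startsWith(rule[0])) {
            // No rule for this metastore, or the rule does not match: leave as-is.
            return location;
        }
        return rule[1] + location.substring(rule[0].length());
    }
}
```

Registering `addRule("hms1", "hdfs://ns/", "hdfs://rbf/ns1/")` and `addRule("hms2", "hdfs://ns/", "hdfs://rbf/ns2/")` sends the same location hdfs://ns/warehouse/t to hdfs://rbf/ns1/warehouse/t for hms1 and to hdfs://rbf/ns2/warehouse/t for hms2, which a single global rule set cannot do.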

patduin commented 1 week ago

Can't you add a hook in the metastores and return the path with namenode so they are unique again?

(Email reply to the comment tian bao wrote on Mon, 24 Jun 2024, 11:18.)

flaming-archer commented 1 week ago

Thank you, that is a very good idea. I'll give it a try.

flaming-archer commented 3 days ago

@patduin Hi, we have tested many times and found that the hook takes effect on the HMS client side; modifying it on the HMS server side has no effect. Two changes would meet our needs:

1. Each client of WD needs to be configured with different regular-expression replacement rules, but WD currently cannot load different configuration items when loading hooks.
2. The hook implementation in apiary is compiled against Hive 2, and we need to change it to Hive 3.

It also seems that ApiaryNullAuthorizationProvider doesn't need to be configured.

Perhaps we can submit these two changes as separate PRs, one to each project? Do you think our approach is correct, @patduin?

patduin commented 2 days ago

On 1): it's possible to load multiple regexes, but yes, I see they are not scoped per hook. The hook itself is set per metastore, though, so you could load a different implementation per metastore that does what you need. I'd probably write your own extensions, where you load a hook per metastore that does what you need. It's hard to make this very generic, and I'm not sure it's worth the effort. You'll have more control if you write your own hook and just use WD to hook it up, which it already supports. You won't have to depend on us for reviews, etc. You could open source your extensions, and we would be happy to add a link from the WD README as another example. Consider the apiary extensions just a potential example of how to do it; you don't necessarily have to use that project.
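A rough sketch of the "different implementation per metastore" approach follows. It assumes each metastore is pointed at its own class, so the rule is scoped by the class itself and no per-hook configuration is needed. All names here are hypothetical. `LocationFilter` is only a stand-in so the sketch compiles without Hive on the classpath; a real hook would implement Hive's MetaStoreFilterHook and apply the rewrite to the locations of the objects it returns.

```java
/** Stand-in for the hook contract; a real hook would implement Hive's MetaStoreFilterHook. */
interface LocationFilter {
    String filterLocation(String location);
}

/** Shared prefix-replacement logic; subclasses only supply their own prefixes. */
abstract class PrefixRewritingFilter implements LocationFilter {
    protected abstract String fromPrefix();

    protected abstract String toPrefix();

    @Override
    public String filterLocation(String location) {
        if (location == null || !location.startsWith(fromPrefix())) {
            return location; // rule does not apply
        }
        return toPrefix() + location.substring(fromPrefix().length());
    }
}

/** Hypothetical class to be configured for the first metastore only. */
final class Hms1LocationFilter extends PrefixRewritingFilter {
    @Override protected String fromPrefix() { return "hdfs://ns/"; }
    @Override protected String toPrefix() { return "hdfs://rbf/ns1/"; }
}

/** Hypothetical class to be configured for the second metastore only. */
final class Hms2LocationFilter extends PrefixRewritingFilter {
    @Override protected String fromPrefix() { return "hdfs://ns/"; }
    @Override protected String toPrefix() { return "hdfs://rbf/ns2/"; }
}
```

The trade-off is one small class per metastore instead of one generic, configurable hook: less flexible, but it needs no changes to how WD loads hook configuration.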

flaming-archer commented 1 day ago

On 1): we can make a change so that different configurations can be loaded for different HMS instances in the future. @yangyuxia, you can try sending a PR; it looks very similar to what you submitted last time. On 2): we can create our own hook and use it ourselves.