apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[Bug report] bug about hdfs root catalog #4586

Closed heziyi399 closed 3 months ago

heziyi399 commented 3 months ago

Version

main branch

Describe what's wrong

I want to use the Hadoop catalog. I have created a metalake, catalog, schema, and fileset; the location is:

[screenshots of the fileset location]

You can see that this location is the root directory. I want to access files through the Gravitino catalog, so I list them from the command line:

[screenshots of the command-line output]

You can see that the result comes with a prefix and the error message "does not exist.". But if the location is not the root directory, the result is normal:

[screenshots of the normal output]

Error message and/or stacktrace

[screenshot of the error message]

How to reproduce

0.5.1

Additional context

No response

jerryshao commented 3 months ago

@xloya would you please take a look at this issue?

xloya commented 3 months ago

Have reproduced the issue, will fix this tomorrow. @heziyi399 Thanks for reporting this.

heziyi399 commented 3 months ago

@xloya I placed the modified GravitinoVirtualFileSystem code on the server, ran `./gradlew :clients:filesystem-hadoop3-runtime:build -x test`, then put the gravitino-filesystem-hadoop3-runtime-0.5.1.jar package into the /share/hadoop/common/lib directory and re-ran the HDFS command. The bug still exists. Is there anything wrong with this method?

xloya commented 3 months ago

> @xloya I placed the modified GravitinoVirtualFileSystem code on the server, ran `./gradlew :clients:filesystem-hadoop3-runtime:build -x test`, then put the gravitino-filesystem-hadoop3-runtime-0.5.1.jar package into the /share/hadoop/common/lib directory and re-ran the HDFS command. The bug still exists. Is there anything wrong with this method?

Hi, you'd better confirm that the relevant logic is really included in the runtime jar you are using. I have tested Hadoop 2.7.3 and 3.1.0 against the example in this issue, and both return normal results.

Hadoop 3.1.0: [screenshots]

Hadoop 2.7.3: [screenshot]

jerryshao commented 3 months ago

@heziyi399 would you please check again to see if @xloya's PR really fixes your problem? Thanks.

heziyi399 commented 3 months ago

@jerryshao Yes, the problem has been resolved. May I ask another question? I want to know if the file system can be cast to Hadoop's DistributedFileSystem, because when I try the following code:

```java
conf.set("fs.AbstractFileSystem.gvfs.impl", "com.datastrato.gravitino.filesystem.hadoop.Gvfs");
conf.set("fs.gvfs.impl", "com.datastrato.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem");
conf.set("fs.gravitino.server.uri", "http://localhost:8090");
conf.set("fs.gravitino.client.metalake", "metalake_demo");
Path filesetPath = new Path("gvfs://fileset/test_catalog/hzySchema/example_fileset/");
FileSystem fs = filesetPath.getFileSystem(conf);
DistributedFileSystem dfs = (DistributedFileSystem) fs;
```

an error message appears: `com.datastrato.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem cannot be cast to org.apache.hadoop.hdfs.DistributedFileSystem`

xloya commented 3 months ago

> @jerryshao Yes, the problem has been resolved. May I ask another question? I want to know if the file system can be cast to Hadoop's DistributedFileSystem [...] an error message appears: `com.datastrato.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem cannot be cast to org.apache.hadoop.hdfs.DistributedFileSystem`

@heziyi399 GravitinoVirtualFileSystem extends the abstract superclass FileSystem, so you cannot cast it to DistributedFileSystem, which is also a subclass of FileSystem but a different one.
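The class relationship above can be illustrated with a minimal sketch. Note these are stand-in classes, not the real Hadoop/Gravitino types: the point is only that two sibling subclasses of a common parent cannot be cast to each other at runtime.

```java
// Stand-in hierarchy (hypothetical, mirrors the real one in shape only):
// GravitinoVirtualFileSystem and DistributedFileSystem are sibling
// subclasses of FileSystem, so a cross-cast between them fails at runtime.
abstract class FileSystem {}

class GravitinoVirtualFileSystem extends FileSystem {}

class DistributedFileSystem extends FileSystem {}

public class CastDemo {
    public static void main(String[] args) {
        FileSystem fs = new GravitinoVirtualFileSystem();

        // The static type FileSystem does not make the object a
        // DistributedFileSystem; the runtime class is what matters.
        System.out.println(fs instanceof DistributedFileSystem); // prints "false"

        try {
            // This compiles (a downcast within the hierarchy is legal to the
            // compiler) but throws at runtime, matching the reported error.
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the issue above");
        }
    }
}
```

In practice this means operations on a gvfs path should go through the shared FileSystem interface (methods like `listStatus` or `open`), which GravitinoVirtualFileSystem implements, rather than through a DistributedFileSystem reference.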

jerryshao commented 3 months ago

Thanks @xloya for your fix and @heziyi399 for your report, greatly appreciated. This PR will be merged into main and branch-0.6.