Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Writing parquet from Spark sometimes fails with a NOT_FOUND exception #17592

Open coldwind6 opened 1 year ago

coldwind6 commented 1 year ago

Alluxio Version: 2.9.3

Describe the bug

When using Spark to write parquet data to Alluxio, the write sometimes fails with the NOT_FOUND error shown in the attached screenshot.

To Reproduce

It happens intermittently; the write is:

```scala
val alluxioFilePath = "alluxio://zk@192.168.0.217:2181;192.168.0.218:2181;192.168.0.219:2181/sparktables/" +
  "4e6a8c43cb5f4e9f8521d2e25e122b1f_C03296"
dataframe.write.mode(SaveMode.Overwrite).parquet(alluxioFilePath)
```


jiacheliu3 commented 1 year ago

This just means the file cannot be seen in the Alluxio namespace. If you check the Alluxio namespace with `bin/alluxio fs ls <path>`, do you see this file? Does this file exist in the UFS? This error is from completeFile(), so was the file actually created in Alluxio?
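For illustration only (not part of the original comment), here is a minimal sketch of how that visibility check could be done from the Spark side, assuming a `SparkSession` named `spark` is in scope and reusing the path from the report above:

```scala
import org.apache.hadoop.fs.Path

// Ask the alluxio:// filesystem whether the output directory written by the
// Spark job is visible, and list what it reports underneath.
val outputPath = new Path(
  "alluxio://zk@192.168.0.217:2181;192.168.0.218:2181;192.168.0.219:2181" +
    "/sparktables/4e6a8c43cb5f4e9f8521d2e25e122b1f_C03296")

val fs = outputPath.getFileSystem(spark.sparkContext.hadoopConfiguration)

if (fs.exists(outputPath)) {
  // Compare this listing with the output of `bin/alluxio fs ls <path>` and with the UFS.
  fs.listStatus(outputPath).foreach(status => println(status.getPath))
} else {
  println(s"$outputPath is not visible through the Alluxio filesystem client")
}
```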

coldwind6 commented 1 year ago

> This just means the file cannot be seen in the Alluxio namespace. If you check the Alluxio namespace with `bin/alluxio fs ls <path>`, do you see this file? Does this file exist in the UFS? This error is from completeFile(), so was the file actually created in Alluxio?

The root directory was created through Alluxio, and nothing other than the Alluxio client operates on it. I only use paths of the form /sparktables/${uuid}/...; I don't touch anything below the uuid directory myself and don't manage the inner files directly.

jiacheliu3 commented 1 year ago

@coldwind6 All I can tell from the exception is that the file does not exist in the Alluxio namespace. Could you further check in the log if there's an exception when the client calls CreateFile to Alluxio? If the create failed, the complete will fail.
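As an illustration only (not from the thread), the client-side failure could also be surfaced at write time by logging the full cause chain, which should show whether an earlier create call failed before completeFile was reached:

```scala
import org.apache.spark.sql.SaveMode

// Hedged sketch: wrap the reporter's write call and print the full exception
// chain so a failed create is visible, not just the final NOT_FOUND.
try {
  dataframe.write.mode(SaveMode.Overwrite).parquet(alluxioFilePath)
} catch {
  case e: Throwable =>
    var cause: Throwable = e
    while (cause != null) {
      println(s"${cause.getClass.getName}: ${cause.getMessage}")
      cause = cause.getCause
    }
    throw e
}
```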

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.