MattBlissett opened 10 months ago
We rely on `INSERT OVERWRITE`, which takes an exclusive lock on the table (blocking reads) according to the documentation. I made a simple test, and everything appears to be correct and working as we expect.
The lock is present when data is being inserted:
```sql
SHOW LOCKS uat.test_lock;
```

| tab_name | mode |
| --- | --- |
| uat@test_lock | EXCLUSIVE |
SQL queries against that table wait for the lock to be released while the insertion runs.
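For reference, a minimal sketch of the kind of test described above, run from two sessions. The source table name is hypothetical, and it assumes the environment has Hive concurrency support (`hive.support.concurrency`) enabled:

```sql
-- Session 1: overwrite the table; Hive takes an EXCLUSIVE lock for the
-- duration of the job (source table uat.test_lock_source is hypothetical).
INSERT OVERWRITE TABLE uat.test_lock
SELECT * FROM uat.test_lock_source;

-- Session 2, while the insert runs: the exclusive lock is visible...
SHOW LOCKS uat.test_lock;

-- ...and a read on the same table blocks until the lock is released.
SELECT COUNT(*) FROM uat.test_lock;
```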
Thanks @muttcg
Do we know if the error is always related to the `occurrence_multimedia` table?

If so, perhaps there is something we're overlooking in the locking behavior when using `JOIN` queries while replacing both tables; that might be something to test too (see the sketch below).
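A minimal sketch of that scenario; the staging table names and join columns here are assumptions for illustration, not the production DDL:

```sql
-- Writer, step 1: replace the main table (staging source is hypothetical).
INSERT OVERWRITE TABLE occurrence
SELECT * FROM occurrence_staging;

-- Writer, step 2: replace the multimedia table shortly afterwards.
INSERT OVERWRITE TABLE occurrence_multimedia
SELECT * FROM occurrence_multimedia_staging;

-- Reader, running concurrently: a JOIN needs shared locks on BOTH tables.
-- The question is whether it can start between the two overwrites and end
-- up reading a half-replaced pair of tables, or losing files mid-scan.
SELECT o.gbifid, m.identifier
FROM occurrence o
JOIN occurrence_multimedia m ON o.gbifid = m.gbifid;
```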
@timrobertson100 Since YARN stores only the last 5 days of logs (if I'm not mistaken), I didn't find more similar cases. But I also tried to simulate the multimedia table insert from another table, and it worked correctly, so there is no clear answer as to why it failed.
Thanks. I found this in Slack from 2 months ago, which confirms the problem isn't limited to the multimedia table:

> The download that failed this morning at 8:06 has error

```
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/hive/warehouse/prod_h.db/occurrence/003993_0
```
I have just seen this when trying a clustering run. The clustering run is slightly different from a download in that it is a Spark SQL job, sourced from the Hive metastore. It could be that Spark SQL doesn't lock (or perhaps our environment is not configured to lock) the same way as the Oozie-launched MR jobs.
```
23/11/16 06:15:51 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a7a68671-faf6-422e-b105-b98435477dbe
...
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/hive/warehouse/prod_h.db/occurrence/000852_0
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
```
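One way to probe that hypothesis might be to run the same statements from both `beeline` and `spark-sql` and compare the lock-related settings each engine actually sees. These are standard Hive property names; whether Spark honors the lock manager at all is exactly the open question:

```sql
-- Locks are only taken when concurrency support is on.
SET hive.support.concurrency;

-- The lock manager in use, e.g. the ZooKeeper-based one.
SET hive.lock.manager;

-- With DbTxnManager, locking is handled by the transaction manager instead.
SET hive.txn.manager;
```

For what it's worth, my understanding is that Spark's built-in Hive support does not go through Hive's lock manager at all, which would be consistent with what we're seeing.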
The nightly table build job appears to still be in the `create-avro` stage.

Edited after the table build completed to add:

The `create-avro` step launches 2 child jobs, and it may be noteworthy that the first (`INSERT OVERWRITE TABLE occ...occurrence_avro (Stage-1)`) finished 21 seconds before the error above, and the second (`INSERT OVERWRITE TABLE occurrenc...mm_record (Stage-1)`) did not start until a minute later.
Also likely relevant: the file missing in my query was actually created nearly an hour before the error, and before I submitted my clustering job, but was presumably sitting in a job tmp directory and moved into place as the MR job completed (an HDFS `mv` should keep the file's original timestamp, not the time it was moved):

```
hdfs dfs -ls /user/hive/warehouse/prod_h.db/occurrence | grep "000852_0"
-rwxrwxrwt   3 hdfs hive   36353064 2023-11-16 05:23 /user/hive/warehouse/prod_h.db/occurrence/000852_0
```
Some downloads can fail around 06:00Z when the HDFS table build completes.
There are a few of these from recent weeks, but it's not necessarily a new problem.