TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

Use semaphore file in SmvParquetOnHdfsIoStrategy to prevent half-written parquet data #1479

Status: Closed (ninjapapa closed this 5 years ago)

ninjapapa commented 5 years ago

The current implementation of isPersisted in SmvParquetOnHdfsIoStrategy only checks whether the *.parquet file exists. This is risky: leftover, half-written parquet data files from an interrupted run would be treated as valid output and could ruin the whole result. We need to introduce a semaphore file that is created only after the parquet file has been completely written, and have isPersisted check for that semaphore instead.
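A minimal sketch of the semaphore idea, assuming a local filesystem in place of HDFS and plain files in place of parquet part files. This is not SMV's actual (Scala) implementation. The `.semaphore` suffix and the helper names `write_with_semaphore` and `is_persisted` are hypothetical, chosen only for illustration.

```python
import os
import shutil
import tempfile
from typing import List

# Hypothetical marker suffix; the real SMV naming may differ.
SEMAPHORE_SUFFIX = ".semaphore"


def semaphore_path(data_path: str) -> str:
    """Path of the marker file that vouches for data_path being complete."""
    return data_path + SEMAPHORE_SUFFIX


def write_with_semaphore(data_path: str, parts: List[bytes]) -> None:
    """Write all data parts first; create the semaphore only as the last step."""
    # Remove any stale semaphore so a rewrite in progress is never trusted.
    if os.path.exists(semaphore_path(data_path)):
        os.remove(semaphore_path(data_path))
    # Clear leftover (possibly half-written) data from an earlier failed run.
    if os.path.exists(data_path):
        shutil.rmtree(data_path)
    os.makedirs(data_path)
    for i, part in enumerate(parts):
        with open(os.path.join(data_path, f"part-{i:05d}"), "wb") as f:
            f.write(part)
    # Reached only if every part was written without raising.
    open(semaphore_path(data_path), "w").close()


def is_persisted(data_path: str) -> bool:
    """Trust the output only if its semaphore exists, not merely the data dir."""
    return os.path.exists(semaphore_path(data_path))


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    path = os.path.join(base, "MyModule.parquet")
    os.makedirs(path)  # simulate leftover output from an interrupted run
    print(is_persisted(path))  # False: data dir exists but no semaphore
    write_with_semaphore(path, [b"a", b"b"])
    print(is_persisted(path))  # True: semaphore written after all parts
```

The ordering is what matters here: because the semaphore is created only after the last part is written, a crash mid-write leaves data without a semaphore, so `is_persisted` reports it as not persisted and the output would be recomputed rather than reused.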