Investigate data retention for HDFS, since it serves only as intermediate data storage.
Data governance tools like Apache Falcon might be overkill for our use case.
Retention could also be implemented in the HDFS-Restructure app, since it is aware of the processed offsets.
Should the data retention be based on the Kafka log retention policy or on something else?
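
To make the options concrete, here is a rough sketch of one possible rule: a file is eligible for deletion only if it is older than a retention window and all of its offsets have already been processed. Everything here is hypothetical and for discussion only: the `HdfsFile` record, the `expired_files` helper, and the idea that each file carries its highest Kafka offset are assumptions, not part of the HDFS-Restructure app. The 7-day window in the usage example mirrors Kafka's default `log.retention.hours=168`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class HdfsFile:
    """Hypothetical metadata for one intermediate file on HDFS."""
    path: str
    modified: datetime   # last modification time of the file
    last_offset: int     # assumed: highest Kafka offset stored in the file


def expired_files(files, now, retention, processed_offset):
    """Return paths that are older than `retention` AND fully processed.

    `processed_offset` is assumed to be the highest offset the
    restructuring step has already handled.
    """
    cutoff = now - retention
    return [
        f.path
        for f in files
        if f.modified < cutoff and f.last_offset <= processed_offset
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    files = [
        HdfsFile("/a", now - timedelta(days=8), 100),  # old and processed
        HdfsFile("/b", now - timedelta(days=8), 500),  # old, not yet processed
        HdfsFile("/c", now - timedelta(days=1), 50),   # processed, too recent
    ]
    print(expired_files(files, now, timedelta(days=7), processed_offset=200))
```

Combining both conditions avoids deleting data that is old but not yet restructured; whether the window should track the Kafka setting is still the open question above.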