Norconex / jef

Job Execution Framework.
Apache License 2.0

job status file is not a properties file #8

Closed danizen closed 6 years ago

danizen commented 6 years ago

So, my decision to write an orchestration layer in Python, using Spotify's luigi, has the unfortunate side effect that I must teach Python how to read properties files and JEF status files.

I have already solved the problem of locking the file for reading in a way compatible with RandomAccessFile, but now I've noticed that the status files are not actually in Java properties file format.

FileJobStatusStore calls writeUTF, so instead of being written in Latin-1, the JEF status file comes out as something other than a valid Java properties file.

Wikipedia suggests that a Java properties file will be in ISO-8859-1, aka Latin-1, and UTF-8, whether standard or Java's modified variant, doesn't seem to fit that.
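To illustrate the mismatch, here is a small sketch in Python. The sample strings are made up for illustration and are not taken from a real status file. It shows that the two encodings agree for plain ASCII but diverge as soon as a non-ASCII character appears, which is why a Latin-1 reader can misread UTF-8 output.

```python
# Hypothetical sample values, chosen only to show the encoding difference.
ascii_line = "status=RUNNING"
accented = "café"

# For pure ASCII text, ISO-8859-1 and UTF-8 produce identical bytes.
same_for_ascii = ascii_line.encode("iso-8859-1") == ascii_line.encode("utf-8")

# For non-ASCII text, the byte sequences differ.
latin1_bytes = accented.encode("iso-8859-1")  # one byte for the accented letter
utf8_bytes = accented.encode("utf-8")         # two bytes for the accented letter

# Decoding UTF-8 bytes as Latin-1 does not fail, it silently mangles the text.
mangled = utf8_bytes.decode("iso-8859-1")

print(same_for_ascii, latin1_bytes, utf8_bytes, mangled)
```

So a status file containing only ASCII keys and values would parse the same either way, apart from whatever bytes writeUTF puts in front of the text.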

Since it seems unlikely that non-Latin1 characters are needed for this simple job monitoring format, maybe a change can be made to read and write in ISO-8859-1. This is not an emergency - I will remove the BOM bytes from the stream before invoking my code to parse property files.
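A minimal sketch of that workaround in Python follows. It rests on several assumptions: that the whole status file is a single record written by Java's writeUTF, that the leading bytes are the two-byte big-endian length prefix writeUTF emits rather than a true BOM, and that the text contains no U+0000 or supplementary characters (the only cases where Java's modified UTF-8 differs from standard UTF-8). The function names and the sample keys are hypothetical, and the parser handles only plain `key=value` lines, not escapes, continuations, or multi-valued properties.

```python
import struct


def decode_write_utf(data: bytes) -> str:
    """Decode one record in the layout of Java's DataOutput.writeUTF.

    Layout: 2-byte big-endian length, then that many bytes of modified UTF-8.
    """
    if len(data) < 2:
        raise ValueError("record too short to hold a length prefix")
    (length,) = struct.unpack(">H", data[:2])
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("record is truncated")
    # Assumption: no NUL or supplementary characters, so plain UTF-8 decoding works.
    return payload.decode("utf-8")


def parse_simple_properties(text: str) -> dict:
    """Parse plain key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Usage with a fabricated record (hypothetical keys, not real JEF output).
body = "# sample\njobId=demo\nprogress=0.5\n".encode("utf-8")
record = struct.pack(">H", len(body)) + body
status = parse_simple_properties(decode_write_utf(record))
print(status)
```

Reading the length prefix instead of blindly dropping two bytes also gives a cheap sanity check that the file was read completely.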

essiembre commented 6 years ago

You are right, those are not standard Java properties files. This will remain UTF-8, as we try whenever possible to have every file produced by JEF or other Norconex libraries be UTF-8 by default. It uses this Properties implementation, which handles UTF-8 and supports multi-valued properties, something the standard Java Properties class does not do natively.

You could look at implementing your own IJobStatusStore to store progress differently, but I would probably just stick to your current solution.