mu-find: return timestamps in json output instead of elisp time objects, add --format tests

yannleretaille commented 1 month ago

I was really confused by the format of the dates in the json output until I looked into the code and realized that they're in the crazy Elisp time object format. Doing [ (.":date"[0] * 65536 + .":date"[1]) in jq does work but is not very intuitive.

This PR adds support for checking during json generation if a symbol matches a known field with the type TimeT while being of the correct Sexp structure to be an Elisp time object and converts it into a Unix timestamp. I also added some tests for the different output formats.

Changes:

Field::field_is_timestamp(string): Checks if a name matches a known field with the TimeT type.
Sexp::etimep and Sexp::etime: Check if Sexp is a valid Elisp time object and convert it to timestamp
Sexp::Symbol::asFieldName: Strips leading colon from symbol names for field comparison.
Sexp::to_json_string: Now returns Unix timestamps using the above functions
Tests: Added XML, JSON, and SEXP format tests (/cmd/find/xml|json|sexp).

djcb commented 1 month ago

Thank you for your work, even including tests!

However, I think it does a bit too much, e.g. mu-sexp.hh shouldn't depend on mu-fields.hh

It could indeed be useful to add some :date-unix and :changed-unix to the json (and perhaps xml) output,, but I do that as a simple extension to the e.g. to_json_string, e.g. something like:

    // special case: modified   lib/utils/mu-sexp.cc
@@ -222,11 +222,18 @@ Sexp::to_json_string(Format fopts) const
            auto it{list().begin()};
            bool first{true};
            while (it != list().end()) {
-               sstrm << (first ? "" : ",")  << quote(it->symbol().name) << ":";
+               const auto name{ it->symbol().name};
+               sstrm << (first ? "" : ",")  << quote(name) << ":";
                ++it;
-               sstrm << it->to_json_string();
+               const auto val{it->to_json_string()};
+               sstrm << val;
                ++it;
                first = false;
+               // special case: tstamn-fields also get a "-unix" value
+               if (name == ":date" || name == ":changed") {
+                   sstrm << "," << quote(name + "-unix") << " :"
+                         << val;  // TODO: here, convert the actual unix value from the emacs-time
+               }
            }

yannleretaille commented 1 month ago

Hey, thank you for reviewing this!

I added the indirection over mu-fields.hh to automatically detect fields that are originally of TimeT. Let me know if you want it changed to refer to the names directly, and I'll update it! I would still suggest to keep Sexp::etimep and Sexp::etime though to ensure the elisp time objects are actually valid and parsed correctly.

The XML output already returns unix timestamps, so there is no need to add a dedicated <field>-unix here. The inconsistency between the json and xml outputs arises from the fact that the json formatter converts to sexp first and then the sexp to json (I assume this was done to having to add a json library as dependency). This double conversion is also the reason the fields in json are prefixed by colons, which is again not the case for the xml output. I doubt many people have been using the json date fields due to its weird and undocumented format, but it is technically a breaking change, so adding a new -unix field would also be an option. It would still be inconsistent though as this would mean the json output would the only formatter that returns -unix prefixed fields.

Please let me know which options you prefer: 1. mu-fields.hh indirection: A. keep B. remove and use hard-coded field names instead

2. naming of the json unix timestamp field: A. original field name (i.e. date, changed) B. prefixed field name (i.e. date-unix, changed-unix)

3. colon prefixes in json field names A. remove to be consistent with other outputs (e.g. from, size) B. keep prefixed (e.g. :from, :size)

djcb commented 1 month ago

Ah, so I think all that's needed is that little snippet I posted + a helper function for the TODO; no further changes are needed at this point; adding some extra fields is a non-risky change I think. Thank you!

djcb / mu

mu-find: return timestamps in json output instead of elisp time objects, add --format tests #2770