RhetTbull / osxmetadata

Python package to read and write various MacOS extended attribute metadata such as tags/keywords and Finder comments from files. Includes CLI tool for reading/writing metadata.
MIT License
111 stars 2 forks

Backup File format: JSON compliant + Only includes set attributes + One line per record #75

Closed porg closed 1 year ago

porg commented 1 year ago

.osxmeta JSON v0.99.37

.osxmeta JSON v1.0.0 — This update brought 1 improvement but 2 disadvantages

Which together cause these downsides:

Proposed next JSON format

RhetTbull commented 1 year ago

JSON is a format for interoperability with computers, not humans. There are many tools, such as jq (brew install jq) and json_pp, for working with JSON files. I'm using the json library from the Python standard library to output the JSON using standard JSON indentation rules. The backup file is currently output with an indentation of 2 spaces to make it at least human inspectable (but the goal is machine readability, not human readability). There are already tools such as jd (brew install jd) for doing a structural diff of JSON. I do not plan to change the format. Similar to the discussion on #73, I believe that command line tools should make use of other specialized command line tools rather than implement all functionality in a single tool.

The suggestion to back up only attributes that are non-null is worth considering. Some file formats such as TOML do this by default. I'll take a look at adding a filter for null values before outputting the JSON, but I first need to ensure that doing so has no second-order effects on the restore process.

RhetTbull commented 1 year ago

Version v1.1.0 strips null values from the backup JSON file. This fixes the issue of lots of null fields and improves readability somewhat. As stated above, I don't intend to fix the other formatting issues, as I think there are other ways to address them (using 3rd party tools) and I'm content to use the json library from the Python standard library.
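For illustration, here is a minimal sketch of what dropping null values before serialization could look like. It is not the osxmetadata implementation: the helper name strip_nulls and the sample record are hypothetical, and only the standard json module is used.

```python
import json


def strip_nulls(record: dict) -> dict:
    """Return a copy of record without keys whose value is None (hypothetical helper)."""
    return {key: value for key, value in record.items() if value is not None}


# Hypothetical record; real backup records contain more attributes.
record = {
    "_filepath": "/tmp/example.txt",
    "_filename": "example.txt",
    "kMDItemFinderComment": "reviewed",
    "kMDItemCreator": None,
}

# Serialize as a JSON array with 2-space indentation, as the backup file does.
compact = json.dumps([strip_nulls(record)], indent=2)
print(compact)
```

In this sketch only None is dropped; empty strings and empty lists are kept. A restore step would then have to treat a missing key as "attribute not set", which is the kind of second-order effect mentioned above.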

porg commented 1 year ago
RhetTbull commented 1 year ago

On the issue of formatting as human readable as possible if it comes at almost no extra development/maintenance/performance cost: I don't agree, but I respect your view.

From your original issue, I think what you want is every record on a single line but this is not human readable. The only alternative is to use JSON indentation, which osxmetadata now does, but that also means that arrays are broken across lines -- this is standard formatting for JSON. So I don't really understand the alternative you want (other than writing a custom JSON parser that doesn't follow JSON conventions).

for every average user with no reliance on extra software

The average user shouldn't be inspecting the backup file. It's an internal thing used by osxmetadata and the format is subject to change. I considered using SQLite as the format, which would have been even more difficult for the average user to inspect. But my point is that the backup file is designed for the use of osxmetadata, not the user. The fact that it's JSON, and that in #57 I made it well-formed so it could be inspected easily, is a bonus, not a feature.

Ascertains a user "Did the backup work? Were all files included by my shell pattern?"

You can use osxmetadata with the --verbose flag which prints out the name of each file being processed.

To check which files ended up in the backup file, you could do one of the following:

grep _filepath .osxmetadata.json

or, slightly prettier:

jq '.[] | ._filepath' .osxmetadata.json

Both of these print the names (with full path) of the files in the backup file.

The .[] is needed to tell jq to operate on the elements of the array in the backup file (the backup file is actually one JSON value, an array, whose elements each represent one file). I changed this in #57 to make the JSON well formed (though the previous "malformed" format was much easier to work with in osxmetadata).

If you wanted to inspect the filename and a specific key, for example, kMDItemFinderComment which is the Finder comment, you could do this:

jq '.[] | ._filepath, .kMDItemFinderComment' .osxmetadata.json | paste -d, - -

This command extracts the _filepath and kMDItemFinderComment keys and then prints them as comma separated values (CSV) that could then easily be imported into some other tool. The - - in paste tells it to take two values at a time then join them with , (specified by -d = delimiter).
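The same report can be sketched in Python with only the standard library. This is an illustration, not part of osxmetadata: the function name filepath_and_comment and the sample records are hypothetical, and the key names are taken from the jq command above.

```python
import csv
import io
import json

# Hypothetical backup content: a JSON array with one object per file.
backup_text = json.dumps([
    {"_filepath": "/tmp/a.txt", "_filename": "a.txt", "kMDItemFinderComment": "first, draft"},
    {"_filepath": "/tmp/b.txt", "_filename": "b.txt"},
])


def filepath_and_comment(text: str) -> str:
    """Return CSV rows of (_filepath, kMDItemFinderComment), one per record."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for rec in json.loads(text):
        # .get() yields "" when the key is absent from a record.
        writer.writerow([rec["_filepath"], rec.get("kMDItemFinderComment", "")])
    return out.getvalue()


report = filepath_and_comment(backup_text)
print(report, end="")
```

To run this against a real backup file, read the file's text and pass it to the function. The csv module quotes any field that contains a comma, so comments with commas still import cleanly.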

most serializing frameworks have formatting/beautifying options,

The Python json library allows you to set the indentation or to use no indentation at all. I use an indentation of 2 for osxmetadata. The alternative would be using no indentation, and that results in one file record per line (this is what earlier versions of osxmetadata did), but that is completely unreadable without using a 3rd party tool. For my purposes, I use the bat command (a replacement for cat) to view JSON files and many other things because it uses colorized output and automatic paging. This makes it easy to quickly inspect the backup file, which would not be possible if no indentation were used.
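A small sketch of the two options the standard json module offers, using made-up records. With the default indent=None the whole array is emitted on a single line; with indent=2 every nested array and object is broken across lines.

```python
import json

# Hypothetical records for illustration only.
records = [
    {"_filepath": "/tmp/a.txt", "tags": ["Technology", "Done"]},
    {"_filepath": "/tmp/b.txt", "tags": ["Clothing"]},
]

compact = json.dumps(records)             # indent=None (default): a single line
indented = json.dumps(records, indent=2)  # nested arrays are broken across lines

print(len(compact.splitlines()), "line without indentation")
print(len(indented.splitlines()), "lines with indent=2")
print(indented)
```

Both strings parse back to the same data; only the layout differs. The module has no built-in mode that indents the outer array while keeping each record on one line.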

porg commented 1 year ago

Ascertain user that backup goes / went well

You convinced me that these are indeed sufficient:

Readability

Excerpt .osxmetadata v0.99.37

{"_version": "0.99.37", "_filepath": "/Volumes/Shared/Videos/Creative-Ads/Apple iPhone 4 (2010).mp4", "_filename": "Apple iPhone 4 (2010)", "com.apple.FinderInfo": {"color": 2, "stationarypad": false}, "com.apple.metadata:_kMDItemUserTags": [["Technology", 0], ["Entertainment", 0], ["\u2022Done", 2]], "com.apple.metadata:kMDItemFinderComment": ""}
{"_version": "0.99.37", "_filepath": "/Volumes/Shared/Videos/Creative-Ads/Benneton (1998).avi", "_filename": "Benneton (1998).avi", "com.apple.FinderInfo": {"color": 4, "stationarypad": false}, "com.apple.metadata:_kMDItemUserTags": [["Clothing", 0], ["Anti-Racism", 0], ["Peace", 0], ["Cooperation", 0], ["\u2022Done but unimportant", 4]], "com.apple.metadata:kMDItemFinderComment": ""}
{"_version": "0.99.37", "_filepath": "/Volumes/Shared/Videos/Creative-Ads/Hugo Boss (1995).mp4", "_filename": "Hugo Boss (1995).mp4", "com.apple.FinderInfo": {"color": 2, "stationarypad": false}, "com.apple.metadata:_kMDItemUserTags": [["Clothing", 0], ["Youth", 0], ["\u2022Done", 2]], "com.apple.metadata:kMDItemFinderComment": ""}

As I see it TSV would be quite efficient in terms of storage

RhetTbull commented 1 year ago

I personally don't find the excerpt from v0.99.37 very readable because I don't like horizontal scrolling. However, the point is moot. That format is not valid JSON (each line is valid JSON, but the file as a whole is not), and there was an issue (#57) requesting that the file be valid JSON, hence the current format. Valid JSON does not support one record per line in the form in which osxmetadata needs the data. It would be either one line with all the records or the current format.
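The difference can be checked with the standard json module. The sketch below uses two made-up records laid out one per line, in the style of the v0.99.37 excerpt: parsing the file as one document fails, while parsing it line by line works.

```python
import json

# Two records, one per line, in the style of the v0.99.37 excerpt (values are made up).
one_record_per_line = (
    '{"_filepath": "/tmp/a.txt", "_filename": "a.txt"}\n'
    '{"_filepath": "/tmp/b.txt", "_filename": "b.txt"}\n'
)

# Parsing the whole text as a single JSON document fails with "Extra data" ...
try:
    json.loads(one_record_per_line)
    whole_file_is_valid = True
except json.JSONDecodeError:
    whole_file_is_valid = False

# ... while parsing each line on its own works.
records = [json.loads(line) for line in one_record_per_line.splitlines() if line.strip()]

print(whole_file_is_valid, len(records))
```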

TSV might be more human readable, but the backup format is meant to be read by osxmetadata and TSV would require more work. As it is, I can read the backup file and get it into the format osxmetadata needs in 3 lines of code:

with open(backup_file, mode="r") as fp:
    backup_records = json.load(fp)
backup_data = {data["_filename"]: data for data in backup_records}

I think you're trying to use the backup file for something it's not intended for. A better solution would be to add an option to osxmetadata -list to output the data in json, tsv, or csv and then you could create whatever report you wanted in whatever format. I'll open a separate issue for this feature.

porg commented 1 year ago

osxmetadata -list output to json, tsv, or csv. Yes! Fine!