Old-Man-Programmer / tree

Tree for Unix/Linux
GNU General Public License v2.0

Malformed JSON output when using `--du` with `-J` flag #10

Open Alchemyst0x opened 2 weeks ago

Alchemyst0x commented 2 weeks ago

First, I want to say — thank you for this awesome little tool! It's a staple utility that has been incredibly helpful. Now, on to the report:

The tree command produces malformed JSON output when using the --du flag in combination with the -J (JSON output) flag. This issue occurs when attempting to access certain directories that cannot be opened, resulting in repeated or misplaced "error" entries in the JSON structure. It can be reproduced on a macOS system when running the command from a terminal application that does not have Full Disk Access permissions.

Steps to Reproduce

To clarify, this is not a filesystem-related issue. It is caused by macOS privacy (TCC) restrictions on this device: the above command output was produced in the VSCode terminal, an application without Full Disk Access permissions. Running the same commands from a terminal application with Full Disk Access (e.g., Kitty) produces valid JSON:

   $ tree -J --du -L2 ~ | jq empty
   # No output (valid JSON)

   $ tree -J --du -L2 ~/Documents
   [
     {"type":"directory","name":"/Users/anon/Documents","size":8141957,"contents":[
       # ...
     ]}
   ,
     {"type":"report","size":9981695,"directories":20,"files":40}
   ]
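
For anyone without jq installed, the same validity check can be sketched in Python. This is only an illustration: `is_valid_json` and `tree_json_is_valid` are hypothetical helper names, and the sketch assumes `tree` is on the PATH and that its stdout is the whole JSON document.

```python
import json
import subprocess


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def tree_json_is_valid(path: str, level: int = 2) -> bool:
    """Run `tree -J --du -L <level> <path>` and check that stdout is valid JSON."""
    result = subprocess.run(
        ["tree", "-J", "--du", "-L", str(level), path],
        capture_output=True,
        text=True,
        check=False,  # tree may exit non-zero when a directory cannot be opened
    )
    return is_valid_json(result.stdout)
```

Calling `tree_json_is_valid` on the home directory from a terminal with and without Full Disk Access should make the difference between the two environments easy to compare.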

Workaround

The issue can be avoided by adding the --prune flag, or by running tree from a terminal application with Full Disk Access.

Additional Context

The issue seems related to how --du calculates directory sizes and handles access errors. Adding the --prune flag avoids the problem by excluding directories that cannot be accessed. For further context, here are the permissions as reported by stat for the directory used in the example:

   $ stat ~/Documents
   16777232 12271426 drwx------ 34 anon staff 0 1088 "Oct  9 23:23:38 2024" "Oct  9 19:31:11 2024" "Oct  9 19:31:11 2024" "Dec 16 02:05:36 2022" 4096 0 0 /Users/anon/Documents

Alchemyst0x commented 2 days ago

Just wanted to update this issue with another discrepancy: the size calculation appears to be significantly off with the specific combination of flags I am using.

A bit of context: My initial interest in using the tree command (relative to this issue) was to create JSON outputs as indexes of files for routine backups, to be stored alongside ZIP archives of the encrypted data. Since encountering the malformed JSON bug referenced in this issue, I implemented my own Python-based file tree generator with comparable JSON output for this use case. During testing, I decided to compare its results with those from tree, particularly focusing on file sizes and the time taken to generate the output and serialize it to JSON.
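
The generator itself is not part of this report. Purely as an illustration of the approach, a minimal walker with comparable JSON output might look like the sketch below. `build_tree` and the exact keys are hypothetical, the sizes count apparent file sizes only (directory entries and symlinks count as 0, which may differ from what tree or du count), and an unreadable directory gets a single "error" key so the result always serializes to valid JSON.

```python
import json
import os
import stat


def build_tree(path: str) -> dict:
    """Return a tree-like dict for path; a directory's size is the sum of its files."""
    name = os.path.basename(os.path.normpath(path))
    info = os.lstat(path)  # lstat, so symlinks are not followed
    if stat.S_ISLNK(info.st_mode):
        return {"type": "link", "name": name, "size": 0}
    if not stat.S_ISDIR(info.st_mode):
        return {"type": "file", "name": name, "size": info.st_size}

    node = {"type": "directory", "name": name, "size": 0, "contents": []}
    try:
        entries = sorted(os.listdir(path))
    except OSError as exc:
        # Record the failure once, as an ordinary key on this directory.
        node["error"] = exc.strerror or str(exc)
        return node

    for entry in entries:
        child = build_tree(os.path.join(path, entry))
        node["contents"].append(child)
        node["size"] += child["size"]
    return node


if __name__ == "__main__":
    print(json.dumps(build_tree("."), indent=2))
```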

Interestingly, my Python implementation consistently reported sizes that were significantly different from tree's output, which led me to validate those size values using DaisyDisk and du directly.

From my analysis, it turns out that the tree command, when run with --du in this context, was significantly overestimating the total size. I verified that my Python code produced accurate results and found that tree reported roughly 1.8 times the actual size.

Specifically, tree reports a size of 15775465550 bytes (roughly 15.78 GB, or 14.69 GiB), while my code is accurately reporting 8719494480 bytes (around 8.72 GB, or 8.12 GiB). The Python results align with DaisyDisk's numbers, and du -sb reports nearly the same value for this directory: 8719418239 bytes.

Below, you can see the respective outputs from all three tools:

   # tree
   $ tree -afhplDJ --du --sort mtime --dirsfirst --metafirst --prune $HOME/Downloads
   # ...
   {"type":"report","size":15775465550,"directories":2436,"files":19297}
   # ...

   # Python code
   Time taken to generate and serialize file tree: 0.1827 seconds
   Total: 2444 directories, 19297 files, and 0 links.
   Total size: 8719494480 bytes (8.12 GB)
   JSON data size: 1982222 bytes

   # du
   $ du -sb $HOME/Downloads
   8719418239  /Users/anon/Downloads
   $ du --version
   du (GNU coreutils) 9.5
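
The three byte counts above can be cross-checked with a few lines of arithmetic. The sketch below only restates the reported numbers; `summarize` is a hypothetical helper, and it converts each count to decimal GB and binary GiB and expresses it as a ratio of the du figure.

```python
GB = 10**9   # decimal gigabyte
GIB = 2**30  # binary gibibyte

# Byte counts exactly as reported by each tool above.
sizes = {
    "tree --du": 15775465550,
    "Python": 8719494480,
    "du -sb": 8719418239,
}


def summarize(sizes: dict, baseline: str = "du -sb") -> dict:
    """Map each tool to (size in GB, size in GiB, ratio to the baseline tool)."""
    base = sizes[baseline]
    return {
        tool: (round(n / GB, 2), round(n / GIB, 2), round(n / base, 3))
        for tool, n in sizes.items()
    }


for tool, (gb, gib, ratio) in summarize(sizes).items():
    print(f"{tool:10s} {gb:6.2f} GB  {gib:6.2f} GiB  x{ratio}")
```

By this calculation the tree figure is about 1.809 times the du figure, and the "8.12 GB" in the Python output is a binary (GiB) value.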

If I were worth a damn with C, I'd take a look myself, but C is entirely foreign to me. I'd be happy to contribute any additional information if requested, though.

Thanks again.