RhetTbull / osxmetadata

Python package to read and write various macOS extended attribute metadata such as tags/keywords and Finder comments from files. Includes CLI tool for reading/writing metadata.
MIT License

Setting `kMDItemWhereFroms` doesn't show in Finder #61

Closed nk9 closed 1 year ago

nk9 commented 2 years ago

This doesn't cause the URL to show in Finder:

import plistlib
from osxmetadata import *

url = "https://apple.com"
out_path = "/tmp/test_md.txt"

with open(out_path, "w") as f:
    f.write("hi")

meta = OSXMetaData(out_path)
meta.update_attribute(kMDItemWhereFroms, [url])

The attribute also is not listed with mdls:

$ mdls /tmp/test_md.txt
kMDItemFSContentChangeDate = 2022-09-20 14:56:00 +0000
kMDItemFSCreationDate      = 2022-09-20 14:56:00 +0000
kMDItemFSCreatorCode       = ""
kMDItemFSFinderFlags       = 0
kMDItemFSHasCustomIcon     = 0
kMDItemFSInvisible         = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery      = 0
kMDItemFSLabel             = 0
kMDItemFSName              = "testmd.txt"
kMDItemFSNodeCount         = 2
kMDItemFSOwnerGroupID      = 0
kMDItemFSOwnerUserID       = 501
kMDItemFSSize              = 2
kMDItemFSTypeCode          = ""

But it IS there, and looks to be correctly formatted as a binary plist:

$ xattr -l /tmp/test_md.txt
com.apple.metadata:kMDItemWhereFroms: bplist00�_https://apple.com
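
As a sanity check on the "binary plist" reading above, the payload format can be reproduced and decoded with the standard library alone. This is a minimal sketch: `encode_where_froms` and `decode_where_froms` are hypothetical helper names (not part of osxmetadata), and the bytes are built in memory rather than read from the file's xattr.

```python
import plistlib
from typing import List


def encode_where_froms(urls: List[str]) -> bytes:
    """Serialize a list of URLs as a binary plist (hypothetical helper)."""
    return plistlib.dumps(urls, fmt=plistlib.FMT_BINARY)


def decode_where_froms(raw: bytes) -> List[str]:
    """Parse a binary plist payload back into a Python list (hypothetical helper)."""
    return plistlib.loads(raw)


raw = encode_where_froms(["https://apple.com"])
print(raw[:8])  # binary plists begin with the magic bytes b"bplist00"
print(decode_where_froms(raw))
```
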

I'm on Monterey 12.6 (21G115)

nk9 commented 2 years ago

I wonder if this is related to #10? Any idea how to force mdls/Finder to rebuild the cache? I tried relaunching Finder to no avail.

RhetTbull commented 2 years ago

I'll take a look and see if I can replicate this. I suspect you're right that this is related to #10.

I think the underlying issue is that Spotlight runs a metadata importer for each file type and this importer needs to be triggered to import any changes. For custom file types, the developer can write their own importer, but absent this, there's no way to programmatically trigger the import that I know of. MDItem, for example, only provides methods for reading metadata keys, not writing them. The only exceptions appear to be labels (colors) and tags, which can be set via NSURL setResourceValue:forKey:error:. For these, it would probably be best to rewrite osxmetadata to use setResourceValue rather than directly modifying the xattr. (I wasn't aware of this limitation on xattr and Spotlight importers when I first wrote osxmetadata.)

Just brainstorming....one idea might be to change the modification date of the file then change it back to the original date. Will need to do some testing to see if this triggers the importer (and if so, how long does it take the importer to trigger).
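
The "bump the modification date, then restore it" idea could be sketched as follows. This only shows the timestamp manipulation with `os.utime`; whether it actually triggers the Spotlight importer is untested, as noted above, and `bump_and_restore_mtime` is a hypothetical name.

```python
import os
import time


def bump_and_restore_mtime(path: str) -> int:
    """Set the file's mtime to now, then restore the original value.

    Returns the original mtime in nanoseconds. Whether this causes
    Spotlight to re-import the file is an untested assumption.
    """
    original = os.stat(path)
    # Step 1: bump the modification time to "now", keeping the access time.
    os.utime(path, ns=(original.st_atime_ns, time.time_ns()))
    # Step 2: put the original modification time back.
    os.utime(path, ns=(original.st_atime_ns, original.st_mtime_ns))
    return original.st_mtime_ns
```
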

nk9 commented 2 years ago

I noticed that if I change a URL on a file which is already correctly showing in the Finder Get Info panel, then the new URL will show up in Finder immediately. So it may not be a caching issue after all? Is it possible that something needs to change in .DS_Store?

nk9 commented 2 years ago

Interestingly, WebKit sets both kMDItemWhereFroms and kMDItemDownloadedDate. However, when I download a file from Safari, I don't actually see the DownloadedDate xattr. And Firefox succeeds in setting the URLs but doesn't even attempt to set the downloaded date. So maybe this is a red herring?

One other thing that both Safari and FF do have, though, is com.apple.quarantine. Setting that to a string from another downloaded file doesn't seem to do anything either…

RhetTbull commented 2 years ago

Interesting, WebKit uses something called MDItemSetAttribute and I can't figure out where it's coming from (appears to be an undocumented API in CoreServices). A little googling on that led me here, with this admonition:

MDItemSetAttribute will set attributes in the Spotlight database: BUT don't use it (or the setAttribute:forKey as it is almost certainly the same thing). There are two problems with MDItemSetAttribute - one is that the Spotlight database can be (should be if you like speedy searches) wiped out with a Spotlight rebuild. Then your tags will not go back in. Number two is a little more sneaky. It seems that when you call MDItemSetAttribute on a file for an attribute, then any subsequent mdimport on that file will NOT update the Spotlight DB for that key. Your MDItemSetAttribute call has somehow marked that field as 'not to be changed' by importers. The comment above where Finder comments were not picked up by Spotlight could have easily happened when using MDItemSetAttribute on that file. In short MDItemSetAttribute will hose the Spotlight DB on the computer - unless you use it (I would guess) to add your own custom fields.

So...will need to play around with this. Maybe using the undocumented API + setting the extended attribute will work. It would sure be nice if Apple provided a better way to do this!

RhetTbull commented 2 years ago

@nk9 I've figured out how to call the undocumented MDItemSetAttribute from python. The following snippet (also as a gist) will set kMDItemWhereFroms if called like this:

python setmd.py file.txt kMDItemWhereFroms array google.com

"""Set metadata on macOS files using undocumented function MDItemSetAttribute

Background: Apple provides MDItemCopyAttribute to get metadata from files:
https://developer.apple.com/documentation/coreservices/1427080-mditemcopyattribute?language=objc

but does not provide a documented way to set file metadata.

This script shows how to use the undocumented function MDItemSetAttribute to do so.

`pip install pyobjc` to install the required Python<-->Objective C bridge package.
"""

import sys
from typing import List, Union

import CoreFoundation
import CoreServices
import objc

# load undocumented function MDItemSetAttribute
# signature: Boolean MDItemSetAttribute(MDItemRef, CFStringRef name, CFTypeRef attr);
# references:
# https://github.com/WebKit/WebKit/blob/5b8ad34f804c64c944ebe43c02aba88482c2afa8/Source/WTF/wtf/mac/FileSystemMac.MDItemSetAttribute
# https://pyobjc.readthedocs.io/en/latest/metadata/manual.html#objc.loadBundleFunctions
# signature of B@@@ translates to returns BOOL, takes 3 arguments, all objects
# In reality, the function takes references (pointers) to the objects, but pyobjc barfs if
# the function signature is specified using pointers.
# Specifying generic objects allows the bridge to convert the Python objects to the
# appropriate Objective C object pointers.

def MDItemSetAttribute(mditem, name, attr):
    """dummy function definition"""
    ...

# This will load MDItemSetAttribute from the CoreServices framework into module globals
objc.loadBundleFunctions(
    CoreServices.__bundle__,
    globals(),
    [("MDItemSetAttribute", b"B@@@")],
)

def set_file_metadata(file: str, attribute: str, value: Union[str, List]) -> bool:
    """Set file metadata using undocumented function MDItemSetAttribute

    file: path to file
    attribute: metadata attribute to set
    value: value to set attribute to; must match the type expected by the attribute (e.g. str or list)

    Note: date attributes (e.g. kMDItemContentCreationDate) not yet handled.

    Returns True if successful, False otherwise.
    """
    mditem = CoreServices.MDItemCreate(None, file)
    if isinstance(value, list):
        value = CoreFoundation.CFArrayCreate(
            None, value, len(value), CoreFoundation.kCFTypeArrayCallBacks
        )
    return MDItemSetAttribute(
        mditem,
        attribute,
        value,
    )

def main():
    """Set metadata on macOS files using undocumented function MDItemSetAttribute

    Usage: setmd.py <file> <attribute> <type> <value> <value> ...

    <file>: path to file
    <attribute>: metadata attribute to set, e.g. kMDItemWhereFroms
    <type>: type of value to set, e.g. string or array; must match the type expected by the attribute (e.g. str or list)
    <value>: value(s) to set attribute to

    For example: setmd.py /tmp/test.txt kMDItemWhereFroms array http://example.com

    For metadata attributes and types, see https://developer.apple.com/documentation/coreservices/file_metadata/mditem/common_metadata_attribute_keys?language=objc
    """
    # super simple argument parsing just for demo purposes
    if len(sys.argv) < 5:
        print(main.__doc__)
        sys.exit(1)

    file = sys.argv[1]
    attribute = sys.argv[2]
    type_ = sys.argv[3]
    values = sys.argv[4:]

    if type_ == "string":
        values = values[0]

    try:
        attribute = getattr(CoreServices, attribute)
    except AttributeError:
        print(f"Invalid attribute: {attribute}")
        sys.exit(1)

    if not set_file_metadata(file, attribute, values):
        print(f"Failed to set metadata attribute {attribute} on {file}")
        sys.exit(1)
    else:
        print(f"Successfully set metadata attribute {attribute} on {file} to {values}")

if __name__ == "__main__":
    main()

It doesn't yet handle types other than string or array (need to reference here for full list of attributes/types) -- kMDItemWhereFroms is an array. Finder comments and Finder tags cannot be set this way. Finder comments must be set by AppleScript and Finder tags by xattr using com.apple.metadata:_kMDItemUserTags.

I verified that both mdls and Finder show the updated kMDItemWhereFroms when set this way.

More to come -- will look at adapting this for osxmetadata.

nk9 commented 2 years ago

This is incredible, thank you! I was thinking that I should try using MDItemSetAttribute as used in the browser code above, but hadn't gotten around to it since I thought it would have to be in ObjC or Swift. Kudos for working it out, and so quickly!

RhetTbull commented 2 years ago

Glad it's useful! Check out the gist where I've updated the code to handle all the different types that MDItems can have. I plan to rewrite osxmetadata to use MDItemSetAttribute and MDItemCopyAttribute wherever possible but have some other projects on the front burner at the moment.

RhetTbull commented 1 year ago

@all-contributors add @nk9 for bug

allcontributors[bot] commented 1 year ago

@RhetTbull

I've put up a pull request to add @nk9! :tada:

RhetTbull commented 1 year ago

@nk9 I've released version 1.0.0 of osxmetadata, which fixes this bug and several others. It's a complete rewrite to use the native macOS calls to get/set metadata. It does change the API in breaking ways, though, so check out the README.md.

nk9 commented 1 year ago

So Rhet, I am kind of in awe how much work you've done over the past two weeks on this. I'm just glad I could be the inspiration for the flurry of activity on this project! And thanks for updating the docs too.

However, I have some bad news… I'm still seeing the same behavior. 😬

🕙 15:29:07 ❯ jq '.default.osxmetadata' Pipfile.lock
{
  "hashes": [
    "sha256:4883539ae64d557f1a25b1b7ac7b6e30e735b9853bb3233913d206d323ac4cf9",
    "sha256:9adde4c63e727260d26a4917b0ff5388336f295372e3218e2759d526c3aedbbb"
  ],
  "index": "pypi",
  "version": "==1.0.0"
}

from osxmetadata import *

url = "https://apple.com"
out_path = "/tmp/test_md.txt"

with open(out_path, "w") as f:
    f.write("hi")

meta = OSXMetaData(out_path)
meta.kMDItemWhereFroms = [url]

🕙 15:25:35 ❯ mdls /tmp/test_md.txt
kMDItemFSContentChangeDate = 2022-10-08 14:23:00 +0000
kMDItemFSCreationDate      = 2022-10-08 14:23:00 +0000
kMDItemFSCreatorCode       = ""
kMDItemFSFinderFlags       = 0
kMDItemFSHasCustomIcon     = 0
kMDItemFSInvisible         = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery      = 0
kMDItemFSLabel             = 0
kMDItemFSName              = "test_md.txt"
kMDItemFSNodeCount         = 2
kMDItemFSOwnerGroupID      = 0
kMDItemFSOwnerUserID       = 501
kMDItemFSSize              = 2
kMDItemFSTypeCode          = ""

🕙 15:23:00 ❯ xattr -l /tmp/test_md.txt
com.apple.metadata:kMDItemWhereFroms:
0000   62 70 6C 69 73 74 30 30 A1 01 5F 10 11 68 74 74    bplist00.._..htt
0010   70 73 3A 2F 2F 61 70 70 6C 65 2E 63 6F 6D 08 0A    ps://apple.com..
0020   00 00 00 00 00 00 01 01 00 00 00 00 00 00 00 02    ................
0030   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1E    ................

Am I doing something wrong?

RhetTbull commented 1 year ago

Strange -- looks like you're doing everything right. This does work in my testing.

What version of macOS are you using? I'm on Catalina still so perhaps it's an issue with newer versions of macOS?

> touch test_url.txt
> python
Python 3.10.5 (main, Jul 17 2022, 07:22:36) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from osxmetadata import *
>>> md = OSXMetaData("test_url.txt")
>>> md.kMDItemWhereFroms = ["apple.com"]
>>> md.kMDItemWhereFroms
['apple.com']
>  mdls test_url.txt
_kMDItemDisplayNameWithExtensions      = "test_url.txt"
kMDItemContentCreationDate             = 2022-10-08 14:54:38 +0000
kMDItemContentModificationDate         = 2022-10-08 14:54:38 +0000
kMDItemContentType                     = "public.plain-text"
kMDItemContentTypeTree                 = (
    "public.plain-text",
    "public.text",
    "public.data",
    "public.item",
    "public.content"
)
...
kMDItemWhereFroms                      = (
    "apple.com"
)
> xattr -l test_url.txt
com.apple.metadata:kMDItemWhereFroms:
00000000  62 70 6C 69 73 74 30 30 A1 01 59 61 70 70 6C 65  |bplist00..Yapple|
00000010  2E 63 6F 6D 08 0A 00 00 00 00 00 00 01 01 00 00  |.com............|
00000020  00 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 14                                |......|
00000036
[Screenshot: Screen Shot 2022-10-08 at 7:58:28 AM]

RhetTbull commented 1 year ago

@nk9 would you mind cloning the repo then running the test suite? See instructions in README_DEV.md for how to install/build the package.

nk9 commented 1 year ago

I'm on Monterey 12.6. Hopefully it's not an OS version issue… but I'll give the tests a run and report back.

nk9 commented 1 year ago

The README_DEV.md instructions could benefit from a notice that you have to use poetry shell before doit test will work. 😄

But I got the tests running, and fortunately, they nearly all work! But there are three failures:

$ doit test
.  test
TaskFailed - taskid:test
Command failed: 'poetry run pytest --doctest-glob=README.md tests/' returned 1

########################################
test <stdout>:
============================= test session starts ==============================
platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/nick/Projects/osxmetadata
collected 557 items

tests/test_cli.py .......F.......F.F                                     [  3%]
tests/test_datetime_handling.py ..                                       [  3%]
tests/test_datetime_utils.py ..........                                  [  5%]
tests/test_finder_info.py ....                                           [  6%]
tests/test_finder_tags.py ..                                             [  6%]
tests/test_findercomment.py ...                                          [  7%]
tests/test_mditem_attributes.py ........................................ [ 14%]
........................................................................ [ 27%]
........................................................................ [ 40%]
........................................................................ [ 52%]
........................................................................ [ 65%]
...............................................................          [ 77%]
tests/test_nsurl_attributes.py ......................................... [ 84%]
........................................................................ [ 97%]
...                                                                      [ 98%]
tests/test_osxmetada_asdict.py ...                                       [ 98%]
tests/test_osxmetada_path.py .                                           [ 98%]
tests/test_osxmetadata_exceptions.py ......                              [ 99%]
tests/test_xattr.py .                                                    [100%]

=================================== FAILURES ===================================
_______________________________ test_cli_remove ________________________________

test_file = <tempfile._TemporaryFileWrapper object at 0x10a6cda50>

    def test_cli_remove(test_file):
        """Test --remove"""

        md = OSXMetaData(test_file.name)
        md.authors = ["John Doe", "Jane Doe"]
        md.tags = [Tag("test", 0)]

        runner = CliRunner()
        result = runner.invoke(
            cli,
            [
                "--remove",
                "authors",
                "John Doe",
                "--remove",
                "tags",
                "test,0",
                test_file.name,
            ],
        )
        snooze()
        assert result.exit_code == 0

        md = OSXMetaData(test_file.name)
>       assert md.authors == ["Jane Doe"]
E       AssertionError: assert ['John Doe', 'Jane Doe'] == ['Jane Doe']
E         At index 0 diff: 'John Doe' != 'Jane Doe'
E         Left contains one more item: 'Jane Doe'
E         Use -v to get more diff

tests/test_cli.py:195: AssertionError
___________________________ test_cli_backup_restore ____________________________

test_dir = '/Users/nick/Projects/osxmetadata/tmp_ma9znhab'

    def test_cli_backup_restore(test_dir):
        """Test --backup and --restore"""

        dirname = pathlib.Path(test_dir)
        test_file = dirname / "test_file.txt"
        test_file.touch()

        md = OSXMetaData(test_file)
        md.tags = [Tag("test", 0)]
        md.authors = ["John Doe", "Jane Doe"]
        md.wherefroms = ["http://www.apple.com"]
        md.downloadeddate = [datetime.datetime(2019, 1, 1, 0, 0, 0)]
        md.stationerypad = True

        runner = CliRunner()
        result = runner.invoke(cli, ["--backup", test_file.as_posix()])
        assert result.exit_code == 0

        # test the backup file was written and is readable
        backup_file = dirname / BACKUP_FILENAME
        assert backup_file.is_file()
        backup_data = load_backup_file(backup_file)
        assert backup_data[test_file.name]["stationerypad"] == True

        # wipe the data
        result = runner.invoke(cli, ["--wipe", test_file.as_posix()])
        snooze()
        md = OSXMetaData(test_file)
        assert not md.tags
>       assert not md.authors
E       AssertionError: assert not ['John Doe', 'Jane Doe']
E        +  where ['John Doe', 'Jane Doe'] = <osxmetadata.osxmetadata.OSXMetaData object at 0x10a753850>.authors

tests/test_cli.py:411: AssertionError
________________________________ test_cli_order ________________________________

test_dir = '/Users/nick/Projects/osxmetadata/tmp_a3hhb18m'

    def test_cli_order(test_dir):
        """Test order CLI options are executed

        Order of execution should be:
        restore, wipe, copyfrom, clear, set, append, remove, mirror, get, list, backup
        """

        dirname = pathlib.Path(test_dir)
        test_file = dirname / "test_file.txt"
        test_file.touch()
        test_file.write_text("test")

        md = OSXMetaData(test_file)
        md.tags = [Tag("test", 0)]
        md.authors = ["John Doe", "Jane Doe"]
        md.wherefroms = ["http://www.apple.com"]
        md.downloadeddate = [datetime.datetime(2019, 1, 1, 0, 0, 0)]
        md.findercomment = "Hello World"

        runner = CliRunner()

        # first, create backup file for --restore
        runner.invoke(cli, ["--backup", test_file.as_posix()])

        # wipe the data
        runner.invoke(cli, ["--wipe", test_file.as_posix()])
        snooze()

        # restore the data and check order of operations
        result = runner.invoke(
            cli,
            [
                "--get",
                "comment",
                "--set",
                "authors",
                "John Smith",
                "--restore",
                "--set",
                "title",
                "Test Title",
                "--clear",
                "title",
                "--append",
                "tags",
                "test2",
                "--set",
                "comment",
                "foo",
                "--remove",
                "authors",
                "Jane Doe",
                "--append",
                "authors",
                "Jane Smith",
                "--mirror",
                "comment",
                "findercomment",
                test_file.as_posix(),
            ],
        )
        output = parse_cli_output(result.output)
        assert output["comment"] == "Hello World"

        snooze()
        md = OSXMetaData(test_file)
>       assert md.authors == ["John Smith", "Jane Smith"]
E       AssertionError: assert ['John Doe', 'Jane Doe'] == ['John Smith', 'Jane Smith']
E         At index 0 diff: 'John Doe' != 'John Smith'
E         Use -v to get more diff

tests/test_cli.py:517: AssertionError
=========================== short test summary info ============================
FAILED tests/test_cli.py::test_cli_remove - AssertionError: assert ['John Doe...
FAILED tests/test_cli.py::test_cli_backup_restore - AssertionError: assert no...
FAILED tests/test_cli.py::test_cli_order - AssertionError: assert ['John Doe'...
======================== 3 failed, 554 passed in 14.14s ========================

Let me know if I can do anything else to help you narrow this down!

RhetTbull commented 1 year ago

> The README_DEV.md instructions could benefit from a notice that you have to use poetry shell before doit test will work.

Good point! I'll do so. I use the zsh-poetry plugin which activates/deactivates poetry shells automatically so I always forget that poetry shell is a thing.

Glad to see most of the tests are running. Interesting that all three failures appear to be with kMDItemAuthors. However, this doesn't shed light on the issue you are encountering with kMDItemWhereFroms. The test suite specifically tests writing and reading back all writable attributes, so it appears your code should successfully set kMDItemWhereFroms (as the fact that the xattr was set successfully indicates). I'll need to think some more about why the change isn't showing in Finder and mdls.

I could add a test that also checks the output of mdls after writing the attribute (and I've got an mdls parser written for another project) but I've noticed that it can take some time before the data is re-indexed and appears in mdls so this would be hard to incorporate in a test suite.
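
One way such a check might tolerate the indexing delay is to poll with a timeout rather than assert immediately. A rough sketch, assuming `mdls -name <attribute> -raw <file>` prints `(null)` when the attribute is absent; `wait_until` and `mdls_has_attribute` are hypothetical names and are not part of osxmetadata or its test suite.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def mdls_has_attribute(path: str, attribute: str) -> bool:
    """macOS only: ask mdls whether the attribute is present in the index.

    Assumes mdls prints "(null)" in -raw mode for a missing attribute.
    """
    result = subprocess.run(
        ["mdls", "-name", attribute, "-raw", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() != "(null)"
```

A test could then call something like `wait_until(lambda: mdls_has_attribute(path, "kMDItemWhereFroms"))`, with the timeout chosen by experiment.
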

RhetTbull commented 1 year ago

I just noticed something... you were writing your test file to `/tmp`. I've noticed that metadata doesn't "stick" in `/tmp` or `/private/var/tmp` as Spotlight doesn't index them. For that reason, the test methods in `/tests` use a custom fixture to create (and clean up) all necessary temp files in the current directory where the tests are run, not in `/private/var/tmp` as would be done with the usual temp file methods.

Try with a file that's not in a temporary directory and let me know if you get different results.
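
For a quick experiment along these lines, a scratch file can be created in the current working directory instead of the system temp location. This is only a sketch of the same idea as the fixture described above, not the fixture itself; `make_local_temp_file` is a hypothetical helper name.

```python
import os
import tempfile


def make_local_temp_file(suffix: str = ".txt") -> str:
    """Create an empty temp file in the current directory and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, dir=os.getcwd())
    os.close(fd)  # mkstemp returns an open descriptor; only the path is needed
    return path
```
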

nk9 commented 1 year ago

Whoa, what a bizarre quirk! Indeed, when I write something to my home directory instead, the Where froms are set as expected and are shown immediately in Finder. Still doesn't explain the test errors…

But maybe this can be closed after all? Probably a good idea to document this /tmp quirk as well. Thanks so much!

RhetTbull commented 1 year ago

I'll add a note to the docs about temporary files. I got the test suite running last night in GitHub Actions (via a Big Sur VM, the latest available in GitHub). Interestingly, the same three tests fail with the same result. Something about kMDItemAuthors isn't right for macOS > Catalina. I'll open a separate issue for this.

tests/test_cli.py:517: AssertionError
=========================== short test summary info ============================
FAILED tests/test_cli.py::test_cli_remove - AssertionError: assert ['John Doe...
FAILED tests/test_cli.py::test_cli_backup_restore - AssertionError: assert no...
FAILED tests/test_cli.py::test_cli_order - AssertionError: assert ['John Doe'...
=================== 3 failed, 551 passed, 3 skipped in 8.14s ===================

RhetTbull commented 1 year ago

I've opened a new issue (#68) for the kMDItemAuthors fails and added a section to the README.md regarding temporary files. If you have any other recommendations to make the README more useful, feel free to open an issue or send a PR.