nk9 closed this issue 1 year ago.
I wonder if this is related to #10? Any idea how to force mdls/Finder to rebuild the cache? I tried relaunching Finder to no avail.
I'll take a look and see if I can replicate this. I suspect you're right that this is related to #10.
I think the underlying issue is that Spotlight runs a metadata importer for each file type, and this importer needs to be triggered to import any changes. For custom file types, the developer can write their own importer, but absent one, there's no way to programmatically trigger the import that I know of. MDItem, for example, only provides methods for reading metadata keys, not writing them. The only exceptions appear to be labels (colors) and tags, which can be set via NSURL setResourceValue:forKey:error:. For these, it would probably be best to rewrite osxphotos to use setResourceValue rather than directly modifying the xattr. (I wasn't aware of this limitation on xattr and Spotlight importers when I first wrote osxmetadata.)
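For tags, the NSURL route mentioned above might look roughly like this from Python. It is a sketch, assuming macOS with pyobjc installed; the function name set_finder_tags is hypothetical. NSURLTagNamesKey is the Foundation resource key for Finder tags (NSURLLabelNumberKey plays the same role for the label color). Whether this gets the tags into Spotlight any sooner than writing the xattr was not tested in this thread.

```python
import sys


def set_finder_tags(path, tags):
    """Set Finder tags through NSURL instead of writing the xattr directly.

    Hypothetical helper; assumes macOS with pyobjc installed (pip install pyobjc).
    """
    if sys.platform != "darwin":
        raise OSError("NSURL resource values are only available on macOS")
    import Foundation  # imported lazily so this module loads on any platform

    url = Foundation.NSURL.fileURLWithPath_(path)
    # pyobjc returns the BOOL result and the NSError out-parameter as a tuple
    ok, error = url.setResourceValue_forKey_error_(
        list(tags), Foundation.NSURLTagNamesKey, None
    )
    if not ok:
        raise OSError(f"could not set tags on {path}: {error}")
```

Usage would be something like set_finder_tags("/Users/me/report.txt", ["Important"]).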
Just brainstorming... one idea might be to change the modification date of the file, then change it back to the original date. I'll need to do some testing to see if this triggers the importer (and if so, how long it takes for the importer to run).
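The modify-then-restore idea is easy to try with the standard library. This is only a hypothetical helper (nudge_mtime is not part of osxmetadata), and nothing here shows that it actually triggers the Spotlight importer; that is exactly what the testing would need to establish.

```python
import os
import time


def nudge_mtime(path, pause=0.0):
    """Bump a file's modification time to "now", then restore the original times.

    Hypothetical helper: whether this makes Spotlight re-run its importer is untested.
    """
    st = os.stat(path)
    # Step 1: set mtime to the current time, keeping the access time unchanged
    os.utime(path, ns=(st.st_atime_ns, time.time_ns()))
    if pause:
        # Optionally leave the new mtime in place for a moment before restoring
        time.sleep(pause)
    # Step 2: put back the original access and modification times
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

The experiment would then be nudge_mtime("report.txt", pause=1.0) followed by checking mdls to see whether (and when) the new value appears.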
I noticed that if I change a URL on a file which is already correctly showing in the Finder Get Info panel, then the new URL will show up in Finder immediately. So it may not be a caching issue after all? Is it possible that something needs to change in .DS_Store?
Interestingly, WebKit sets both kMDItemWhereFroms and kMDItemDownloadedDate. However, when I download a file from Safari, I don't actually see the DownloadedDate xattr. And Firefox succeeds in setting the URLs but doesn't even attempt to set the downloaded date. So maybe this is a red herring?
One other thing that both Safari and FF do have, though, is com.apple.quarantine. Setting that to a string from another downloaded file doesn't seem to do anything either…
Interestingly, WebKit uses something called MDItemSetAttribute, and I can't figure out where it's coming from (it appears to be an undocumented API in CoreServices). A little googling on that led me here, with this admonition:
MDItemSetAttribute will set attributes in the spotlight database: BUT don't use it (or the setAttribute:forKey as it is almost certainly the same thing). There are two problems with MDItemSetAttribute - one is that the spotlight datbase can be (should be if you like speedy searches) wiped out with a spotlight rebuild. Then your tags will not go back in. Number two is a little more sneaky. It seems that when you call MDItemSetAttribute on a file for an attribute, then any subsequent mdimport on that file will NOT update the spotlight DB for that key. Your MDItemSetAttribute call has somehow marked that field as 'not to be changed' by importers. The comment above where Finder comments were not picked up by spotlight could have easily happened when using MDItemSetAttribute on that file. In short MDItemSetAttribute will hose the spotlight DB on the computer - unless you use it (I would guess) to add your own custom fields.
So...will need to play around with this. Maybe using the undocumented API + setting the extended attribute will work. It would sure be nice if Apple provided a better way to do this!
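The extended-attribute half of that idea could be sketched as below. It assumes the third-party xattr package for writing on macOS (Python's os.setxattr is Linux-only) and encodes the URLs as a binary plist array, which matches the xattr -l dumps later in this thread. The helper names are hypothetical, and the MDItemSetAttribute half is not shown here.

```python
import plistlib
import sys

# Extended attribute name seen in the xattr -l dumps in this thread
WHEREFROMS_XATTR = "com.apple.metadata:kMDItemWhereFroms"


def wherefroms_payload(urls):
    """Encode a list of URLs as a binary plist array."""
    return plistlib.dumps(list(urls), fmt=plistlib.FMT_BINARY)


def write_wherefroms_xattr(path, urls):
    """Write the xattr only; Spotlight is not told about the change."""
    if sys.platform != "darwin":
        raise OSError("com.apple.metadata xattrs are only meaningful on macOS")
    import xattr  # third-party package: pip install xattr

    xattr.setxattr(path, WHEREFROMS_XATTR, wherefroms_payload(urls))
```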
@nk9 I've figured out how to call the undocumented MDItemSetAttribute from Python. The following snippet (also as a gist) will set kMDItemWhereFroms if called like this:
python setmd.py file.txt kMDItemWhereFroms array google.com
"""Set metadata on macOS files using undocumented function MDItemSetAttribute
Background: Apple provides MDItemCopyAttribute to get metadata from files:
https://developer.apple.com/documentation/coreservices/1427080-mditemcopyattribute?language=objc
but does not provide a documented way to set file metadata.
This script shows how to use the undocumented function MDItemSetAttribute to do so.
`pip install pyobjc` to install the required Python<-->Objective C bridge package.
"""
import sys
from typing import List, Union
import CoreFoundation
import CoreServices
import objc
# load undocumented function MDItemSetAttribute
# signature: Boolean MDItemSetAttribute(MDItemRef, CFStringRef name, CFTypeRef attr);
# references:
# https://github.com/WebKit/WebKit/blob/5b8ad34f804c64c944ebe43c02aba88482c2afa8/Source/WTF/wtf/mac/FileSystemMac.MDItemSetAttribute
# https://pyobjc.readthedocs.io/en/latest/metadata/manual.html#objc.loadBundleFunctions
# signature of B@@@ translates to returns BOOL, takes 3 arguments, all objects
# In reality, the function takes references (pointers) to the objects, but pyobjc barfs if
# the function signature is specified using pointers.
# Specifying generic objects allows the bridge to convert the Python objects to the
# appropriate Objective C object pointers.
def MDItemSetAttribute(mditem, name, attr):
"""dummy function definition"""
...
# This will load MDItemSetAttribute from the CoreServices framework into module globals
objc.loadBundleFunctions(
CoreServices.__bundle__,
globals(),
[("MDItemSetAttribute", b"B@@@")],
)
def set_file_metadata(file: str, attribute: str, value: Union[str, List]) -> bool:
"""Set file metadata using undocumented function MDItemSetAttribute
file: path to file
attribute: metadata attribute to set
value: value to set attribute to; must match the type expected by the attribute (e.g. str or list)
Note: date attributes (e.g. kMDItemContentCreationDate) not yet handled.
Returns True if successful, False otherwise.
"""
mditem = CoreServices.MDItemCreate(None, file)
if isinstance(value, list):
value = CoreFoundation.CFArrayCreate(
None, value, len(value), CoreFoundation.kCFTypeArrayCallBacks
)
return MDItemSetAttribute(
mditem,
attribute,
value,
)
def main():
"""Set metadata on macOS files using undocumented function MDItemSetAttribute
Usage: setmd.py <file> <attribute> <type> <value> <value> ...
<file>: path to file
<attribute>: metadata attribute to set, e.g. kMDItemWhereFroms
<type>: type of value to set, e.g. string or array; must match the type expected by the attribute (e.g. str or list)
<value>: value(s) to set attribute to
For example: setmd.py /tmp/test.txt kMDItemWhereFroms array http://example.com
For metadata attributes and types, see https://developer.apple.com/documentation/coreservices/file_metadata/mditem/common_metadata_attribute_keys?language=objc
"""
# super simple argument parsing just for demo purposes
if len(sys.argv) < 5:
print(main.__doc__)
sys.exit(1)
file = sys.argv[1]
attribute = sys.argv[2]
type_ = sys.argv[3]
values = sys.argv[4:]
if type_ == "string":
values = values[0]
try:
attribute = getattr(CoreServices, attribute)
except AttributeError:
print(f"Invalid attribute: {attribute}")
sys.exit(1)
if not set_file_metadata(file, attribute, values):
print(f"Failed to set metadata attribute {attribute} on {file}")
sys.exit(1)
else:
print(f"Successfully set metadata attribute {attribute} on {file} to {values}")
if __name__ == "__main__":
main()
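To confirm what Spotlight reports after running the script, the documented MDItemCopyAttribute can read the value back. A minimal sketch, assuming macOS and pyobjc; get_file_metadata is a hypothetical helper name, and a None result would mean Spotlight has no value for that attribute (roughly what a missing line in mdls output means).

```python
import sys


def get_file_metadata(path, attribute_name):
    """Read one metadata attribute with the documented MDItemCopyAttribute.

    attribute_name is a constant name such as "kMDItemWhereFroms".
    Hypothetical helper; assumes macOS with pyobjc installed.
    """
    if sys.platform != "darwin":
        raise OSError("MDItem functions are only available on macOS")
    import CoreServices  # imported lazily so this loads on any platform

    mditem = CoreServices.MDItemCreate(None, path)
    if mditem is None:
        # MDItemCreate returns NULL when the file does not exist
        raise FileNotFoundError(path)
    attribute = getattr(CoreServices, attribute_name)
    # None if Spotlight has no value for this attribute
    return CoreServices.MDItemCopyAttribute(mditem, attribute)
```

For example, after python setmd.py file.txt kMDItemWhereFroms array google.com, list(get_file_metadata("file.txt", "kMDItemWhereFroms")) would be expected to contain "google.com".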
It doesn't yet handle types other than string or array (need to reference here for the full list of attributes/types); kMDItemWhereFroms is an array. Finder comments and Finder tags cannot be set this way. Finder comments must be set via AppleScript and Finder tags via the xattr com.apple.metadata:_kMDItemUserTags.
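The Finder tags xattr can be built with plistlib in the same way. This sketch assumes each tag is stored as a string made of the tag name, a newline, then the Finder label color index; that is my understanding of the format rather than anything Apple documents. The helper names are hypothetical, and writing the bytes to com.apple.metadata:_kMDItemUserTags would still need an xattr library.

```python
import plistlib

# Extended attribute that holds Finder tags
TAGS_XATTR = "com.apple.metadata:_kMDItemUserTags"


def tags_payload(tags):
    """Encode (name, color) pairs as a binary plist array of name/newline/color strings."""
    entries = [f"{name}\n{color}" for name, color in tags]
    return plistlib.dumps(entries, fmt=plistlib.FMT_BINARY)


def parse_tags_payload(data):
    """Decode the xattr value back into (name, color) pairs; a missing color reads as 0."""
    tags = []
    for entry in plistlib.loads(data):
        name, _, color = entry.partition("\n")
        tags.append((name, int(color) if color else 0))
    return tags
```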
I verified that both mdls and Finder show the updated kMDItemWhereFroms when set this way.
More to come -- will look at adapting this for osxmetadata.
This is incredible, thank you! I was thinking that I should try using MDItemSetAttribute as used in the browser code above, but hadn't gotten around to it since I thought it would have to be in ObjC or Swift. Kudos for working it out, and so quickly!
Glad it's useful! Check out the gist where I've updated the code to handle all the different types that MDItems can have. I plan to rewrite osxmetadata to use MDItemSetAttribute and MDItemCopyAttribute wherever possible but have some other projects on the front burner at the moment.
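Extending the script's <type> argument beyond string and array could start with a small converter like this. Only "string" and "array" exist in the original script; "number", "bool" and "date" are hypothetical type names, and whether pyobjc hands a Python datetime to MDItemSetAttribute as the date type an attribute such as kMDItemContentCreationDate expects is not verified here.

```python
import datetime


def convert_cli_values(type_, values):
    """Turn setmd.py-style <type> <value>... arguments into Python values.

    Only "string" and "array" come from the original script; the other
    type names are hypothetical.
    """
    if type_ == "string":
        return values[0]
    if type_ == "array":
        return list(values)
    if type_ == "number":
        # Prefer int, fall back to float for values like "1.5"
        try:
            return int(values[0])
        except ValueError:
            return float(values[0])
    if type_ == "bool":
        return values[0].lower() in ("1", "true", "yes")
    if type_ == "date":
        # ISO 8601, e.g. 2019-01-01T00:00:00
        return datetime.datetime.fromisoformat(values[0])
    raise ValueError(f"unsupported type: {type_}")
```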
@all-contributors add @nk9 for bug
@RhetTbull
I've put up a pull request to add @nk9! :tada:
@nk9 I've released version 1.0.0 of osxmetadata, which fixes this bug and several others. It's a complete rewrite to use the native macOS calls to get/set metadata. It does change the API in breaking ways, though, so check out the README.md.
So Rhet, I am kind of in awe of how much work you've done over the past two weeks on this. I'm just glad I could be the inspiration for the flurry of activity on this project! And thanks for updating the docs too.
However, I have some bad news… I'm still seeing the same behavior. 😬
🕙 15:29:07 ❯ jq '.default.osxmetadata' Pipfile.lock
{
"hashes": [
"sha256:4883539ae64d557f1a25b1b7ac7b6e30e735b9853bb3233913d206d323ac4cf9",
"sha256:9adde4c63e727260d26a4917b0ff5388336f295372e3218e2759d526c3aedbbb"
],
"index": "pypi",
"version": "==1.0.0"
}
```python
from osxmetadata import *

url = "https://apple.com"
out_path = "/tmp/test_md.txt"
with open(out_path, "w") as f:
    f.write("hi")

meta = OSXMetaData(out_path)
meta.kMDItemWhereFroms = [url]
```
🕙 15:25:35 ❯ mdls /tmp/test_md.txt
kMDItemFSContentChangeDate = 2022-10-08 14:23:00 +0000
kMDItemFSCreationDate = 2022-10-08 14:23:00 +0000
kMDItemFSCreatorCode = ""
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = 0
kMDItemFSLabel = 0
kMDItemFSName = "test_md.txt"
kMDItemFSNodeCount = 2
kMDItemFSOwnerGroupID = 0
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 2
kMDItemFSTypeCode = ""
🕙 15:23:00 ❯ xattr -l /tmp/test_md.txt
com.apple.metadata:kMDItemWhereFroms:
0000 62 70 6C 69 73 74 30 30 A1 01 5F 10 11 68 74 74 bplist00.._..htt
0010 70 73 3A 2F 2F 61 70 70 6C 65 2E 63 6F 6D 08 0A ps://apple.com..
0020 00 00 00 00 00 00 01 01 00 00 00 00 00 00 00 02 ................
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1E ................
Am I doing something wrong?
Strange -- looks like you're doing everything right. This does work in my testing.
What version of macOS are you using? I'm on Catalina still so perhaps it's an issue with newer versions of macOS?
> touch test_url.txt
> python
Python 3.10.5 (main, Jul 17 2022, 07:22:36) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from osxmetadata import *
>>> md = OSXMetaData("test_url.txt")
>>> md.kMDItemWhereFroms = ["apple.com"]
>>> md.kMDItemWhereFroms
['apple.com']
> mdls test_url.txt
_kMDItemDisplayNameWithExtensions = "test_url.txt"
kMDItemContentCreationDate = 2022-10-08 14:54:38 +0000
kMDItemContentModificationDate = 2022-10-08 14:54:38 +0000
kMDItemContentType = "public.plain-text"
kMDItemContentTypeTree = (
"public.plain-text",
"public.text",
"public.data",
"public.item",
"public.content"
)
...
kMDItemWhereFroms = (
"apple.com"
)
> xattr -l test_url.txt
com.apple.metadata:kMDItemWhereFroms:
00000000 62 70 6C 69 73 74 30 30 A1 01 59 61 70 70 6C 65 |bplist00..Yapple|
00000010 2E 63 6F 6D 08 0A 00 00 00 00 00 00 01 01 00 00 |.com............|
00000020 00 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 14 |......|
00000036
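Both xattr dumps above decode as ordinary binary plists, which can be checked with the standard library. The hex strings below are copied from the two dumps with the offset and ASCII columns removed; decode_xattr_dump is a hypothetical helper. This only shows the xattr bytes are well formed; it says nothing about whether Spotlight has indexed them.

```python
import plistlib

# Bytes from nk9's dump of /tmp/test_md.txt
DUMP_HTTPS = (
    "62 70 6C 69 73 74 30 30 A1 01 5F 10 11 68 74 74 "
    "70 73 3A 2F 2F 61 70 70 6C 65 2E 63 6F 6D 08 0A "
    "00 00 00 00 00 00 01 01 00 00 00 00 00 00 00 02 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1E"
)

# Bytes from the dump of test_url.txt
DUMP_APPLE = (
    "62 70 6C 69 73 74 30 30 A1 01 59 61 70 70 6C 65 "
    "2E 63 6F 6D 08 0A 00 00 00 00 00 00 01 01 00 00 "
    "00 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 14"
)


def decode_xattr_dump(hex_text):
    """Parse space-separated hex bytes as a plist (binary format is auto-detected)."""
    return plistlib.loads(bytes.fromhex(hex_text))


print(decode_xattr_dump(DUMP_HTTPS))  # ['https://apple.com']
print(decode_xattr_dump(DUMP_APPLE))  # ['apple.com']
```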
@nk9 would you mind cloning the repo then running the test suite? See instructions in README_DEV.md for how to install/build the package.
I'm on Monterey 12.6. Hopefully it's not an OS version issue… but I'll give the tests a run and report back.
The README_DEV.md instructions could benefit from a notice that you have to use poetry shell before doit test will work. 😄
But I got the tests running, and fortunately, they nearly all work! But there are three failures:
$ doit test
. test
TaskFailed - taskid:test
Command failed: 'poetry run pytest --doctest-glob=README.md tests/' returned 1
########################################
test <stdout>:
============================= test session starts ==============================
platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/nick/Projects/osxmetadata
collected 557 items
tests/test_cli.py .......F.......F.F [ 3%]
tests/test_datetime_handling.py .. [ 3%]
tests/test_datetime_utils.py .......... [ 5%]
tests/test_finder_info.py .... [ 6%]
tests/test_finder_tags.py .. [ 6%]
tests/test_findercomment.py ... [ 7%]
tests/test_mditem_attributes.py ........................................ [ 14%]
........................................................................ [ 27%]
........................................................................ [ 40%]
........................................................................ [ 52%]
........................................................................ [ 65%]
............................................................... [ 77%]
tests/test_nsurl_attributes.py ......................................... [ 84%]
........................................................................ [ 97%]
... [ 98%]
tests/test_osxmetada_asdict.py ... [ 98%]
tests/test_osxmetada_path.py . [ 98%]
tests/test_osxmetadata_exceptions.py ...... [ 99%]
tests/test_xattr.py . [100%]
=================================== FAILURES ===================================
_______________________________ test_cli_remove ________________________________
test_file = <tempfile._TemporaryFileWrapper object at 0x10a6cda50>
def test_cli_remove(test_file):
"""Test --remove"""
md = OSXMetaData(test_file.name)
md.authors = ["John Doe", "Jane Doe"]
md.tags = [Tag("test", 0)]
runner = CliRunner()
result = runner.invoke(
cli,
[
"--remove",
"authors",
"John Doe",
"--remove",
"tags",
"test,0",
test_file.name,
],
)
snooze()
assert result.exit_code == 0
md = OSXMetaData(test_file.name)
> assert md.authors == ["Jane Doe"]
E AssertionError: assert ['John Doe', 'Jane Doe'] == ['Jane Doe']
E At index 0 diff: 'John Doe' != 'Jane Doe'
E Left contains one more item: 'Jane Doe'
E Use -v to get more diff
tests/test_cli.py:195: AssertionError
___________________________ test_cli_backup_restore ____________________________
test_dir = '/Users/nick/Projects/osxmetadata/tmp_ma9znhab'
def test_cli_backup_restore(test_dir):
"""Test --backup and --restore"""
dirname = pathlib.Path(test_dir)
test_file = dirname / "test_file.txt"
test_file.touch()
md = OSXMetaData(test_file)
md.tags = [Tag("test", 0)]
md.authors = ["John Doe", "Jane Doe"]
md.wherefroms = ["http://www.apple.com"]
md.downloadeddate = [datetime.datetime(2019, 1, 1, 0, 0, 0)]
md.stationerypad = True
runner = CliRunner()
result = runner.invoke(cli, ["--backup", test_file.as_posix()])
assert result.exit_code == 0
# test the backup file was written and is readable
backup_file = dirname / BACKUP_FILENAME
assert backup_file.is_file()
backup_data = load_backup_file(backup_file)
assert backup_data[test_file.name]["stationerypad"] == True
# wipe the data
result = runner.invoke(cli, ["--wipe", test_file.as_posix()])
snooze()
md = OSXMetaData(test_file)
assert not md.tags
> assert not md.authors
E AssertionError: assert not ['John Doe', 'Jane Doe']
E + where ['John Doe', 'Jane Doe'] = <osxmetadata.osxmetadata.OSXMetaData object at 0x10a753850>.authors
tests/test_cli.py:411: AssertionError
________________________________ test_cli_order ________________________________
test_dir = '/Users/nick/Projects/osxmetadata/tmp_a3hhb18m'
def test_cli_order(test_dir):
"""Test order CLI options are executed
Order of execution should be:
restore, wipe, copyfrom, clear, set, append, remove, mirror, get, list, backup
"""
dirname = pathlib.Path(test_dir)
test_file = dirname / "test_file.txt"
test_file.touch()
test_file.write_text("test")
md = OSXMetaData(test_file)
md.tags = [Tag("test", 0)]
md.authors = ["John Doe", "Jane Doe"]
md.wherefroms = ["http://www.apple.com"]
md.downloadeddate = [datetime.datetime(2019, 1, 1, 0, 0, 0)]
md.findercomment = "Hello World"
runner = CliRunner()
# first, create backup file for --restore
runner.invoke(cli, ["--backup", test_file.as_posix()])
# wipe the data
runner.invoke(cli, ["--wipe", test_file.as_posix()])
snooze()
# restore the data and check order of operations
result = runner.invoke(
cli,
[
"--get",
"comment",
"--set",
"authors",
"John Smith",
"--restore",
"--set",
"title",
"Test Title",
"--clear",
"title",
"--append",
"tags",
"test2",
"--set",
"comment",
"foo",
"--remove",
"authors",
"Jane Doe",
"--append",
"authors",
"Jane Smith",
"--mirror",
"comment",
"findercomment",
test_file.as_posix(),
],
)
output = parse_cli_output(result.output)
assert output["comment"] == "Hello World"
snooze()
md = OSXMetaData(test_file)
> assert md.authors == ["John Smith", "Jane Smith"]
E AssertionError: assert ['John Doe', 'Jane Doe'] == ['John Smith', 'Jane Smith']
E At index 0 diff: 'John Doe' != 'John Smith'
E Use -v to get more diff
tests/test_cli.py:517: AssertionError
=========================== short test summary info ============================
FAILED tests/test_cli.py::test_cli_remove - AssertionError: assert ['John Doe...
FAILED tests/test_cli.py::test_cli_backup_restore - AssertionError: assert no...
FAILED tests/test_cli.py::test_cli_order - AssertionError: assert ['John Doe'...
======================== 3 failed, 554 passed in 14.14s ========================
Let me know if I can do anything else to help you narrow this down!
> The README_DEV.md instructions could benefit from a notice that you have to use poetry shell before doit test will work.
Good point! I'll do so. I use the zsh-poetry plugin, which activates/deactivates poetry shells automatically, so I always forget that poetry shell is a thing.
Glad to see most of the tests are running. Interesting that all three failures appear to be with kMDItemAuthors. However, this doesn't shed light on the issue you are encountering with kMDItemWhereFroms. The test suite specifically tests writing and reading back all writable attributes, so it appears your code should successfully set kMDItemWhereFroms (as the fact that the xattr was set successfully indicates). I'll need to think some more about why the change isn't showing in Finder and mdls.
I could add a test that also checks the output of mdls after writing the attribute (and I've got an mdls parser written for another project), but I've noticed that it can take some time before the data is re-indexed and appears in mdls, so this would be hard to incorporate in a test suite.
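A test along those lines would have to poll rather than check once. The sketch below is hypothetical (it is not the mdls parser mentioned above): it assumes mdls -raw -name <attribute> <file> prints either (null) or a parenthesised list of quoted strings, handles only simple string arrays without escaped characters, and takes the reader as a parameter so the polling logic can be exercised without macOS.

```python
import subprocess
import time


def mdls_raw(path, attribute):
    """Return the raw mdls value for one attribute (macOS only)."""
    result = subprocess.run(
        ["mdls", "-raw", "-name", attribute, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_mdls_array(raw):
    """Parse '(null)' or '( "a", "b" )'-style mdls output into None or a list of str."""
    raw = raw.strip()
    if raw == "(null)":
        return None
    if not (raw.startswith("(") and raw.endswith(")")):
        raise ValueError(f"not an mdls array: {raw!r}")
    items = []
    for line in raw[1:-1].splitlines():
        line = line.strip().rstrip(",")
        if not line:
            continue
        if line.startswith('"') and line.endswith('"'):
            line = line[1:-1]  # drop surrounding quotes; escapes are not handled
        items.append(line)
    return items


def wait_for_mdls(path, attribute, expected, timeout=10.0, interval=0.5, reader=mdls_raw):
    """Poll until mdls reports the expected list, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if parse_mdls_array(reader(path, attribute)) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```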
I just noticed something... you were writing your test file to /tmp. I've noticed that metadata doesn't "stick" in /tmp or /private/var/tmp, as Spotlight doesn't index them. For that reason, the test methods in /tests use a custom fixture to create (and clean up) all necessary temp files in the current directory where the tests are run, not in /private/var/tmp as would be done with the usual temp file methods.
Try with a file that's not in a temporary directory and let me know if you get different results.
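A check for the directories mentioned above, plus a temp-dir helper in the spirit of that fixture, might look like this. Both helpers are hypothetical, not the fixture osxmetadata actually uses. The root list covers /tmp and /private/var/tmp from this thread plus their /private and /var aliases; treating tempfile.gettempdir() as unindexed is an added assumption, since that is where the usual temp-file helpers write.

```python
import contextlib
import os
import tempfile
from pathlib import Path

# Directories reported in this thread as not indexed by Spotlight, plus aliases
_UNINDEXED_ROOTS = ("/tmp", "/private/tmp", "/var/tmp", "/private/var/tmp")


def is_probably_unindexed(path):
    """True if path resolves to somewhere under a temp directory."""
    resolved = Path(path).resolve()
    roots = {Path(r) for r in _UNINDEXED_ROOTS}
    roots.add(Path(tempfile.gettempdir()).resolve())  # assumption, see lead-in
    return any(resolved == root or root in resolved.parents for root in roots)


@contextlib.contextmanager
def local_temp_dir(base=None):
    """Temporary directory under base (default: current directory), removed on exit."""
    with tempfile.TemporaryDirectory(prefix="tmp_", dir=base or os.getcwd()) as name:
        yield Path(name)
```

Used as `with local_temp_dir() as d: ...`, test files land next to the project (like the tmp_ma9znhab directory in the pytest output above) rather than in /tmp.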
Whoa, what a bizarre quirk! Indeed, when I write something to my home directory instead, the Where froms are set as expected and are shown immediately in Finder. Still doesn't explain the test errors…
But maybe this can be closed after all? Probably a good idea to document this /tmp quirk as well. Thanks so much!
I'll add a note to the docs about temporary files. I got the test suite running last night in GitHub Actions (via a Big Sur VM, the latest available on GitHub). Interestingly, the same three tests fail with the same result. Something about kMDItemAuthors isn't right for macOS > Catalina. I'll open a separate issue for this.
tests/test_cli.py:517: AssertionError
=========================== short test summary info ============================
FAILED tests/test_cli.py::test_cli_remove - AssertionError: assert ['John Doe...
FAILED tests/test_cli.py::test_cli_backup_restore - AssertionError: assert no...
FAILED tests/test_cli.py::test_cli_order - AssertionError: assert ['John Doe'...
=================== 3 failed, 551 passed, 3 skipped in 8.14s ===================
I've opened a new issue (#68) for the kMDItemAuthors failures and added a section to the README.md regarding temporary files. If you have any other recommendations to make the README more useful, feel free to open an issue or send a PR.
This doesn't cause the URL to show in Finder:
The attribute also is not listed with mdls:
But it IS there, and looks to be correctly formatted as a binary plist:
I'm on Monterey 12.6 (21G115)