biglocalnews / clean-scraper

Scraper library and CLI tool to harvest police bodycam footage and other files as part of the Community Law Enforcement Accountability Project (CLEAN)
https://biglocalnews.org/content/collaborations/clean.html
Apache License 2.0
7 stars 10 forks source link

cache.read_json and write_json not implementing relative path #70

Open stucka opened 2 months ago

stucka commented 2 months ago

cache.read_json and write_json are documented about wanting a filepath relative to the cache dir, but I don't think it's implemented that way.

Either the function documentation needs to get revised, or the existing scrapers that use it need to get patched up. Comparable functions do implement relative pathing, e.g.: this sets it up in some of the cache functions:

out = Path(self.path, name)

A scraper using absolute path is San Diego PD:

        metadata = self.cache.read_json(
            self.data_dir.joinpath(f"{self.agency_slug}.json")
        )

"A foolish consistency is the hobgoblin of little minds" -- Ralph Waldo Emerson

stucka commented 2 months ago

Something that wouldn't break existing things but might add flexibility but potentially cause confusion: We can tweak those sections to have a relative=False default, so anything already using them would be just fine. But passing along relative=True would bring behavior somewhat like the other routines. -That- said, none of the other routines have such a relative flag so it's inherently different from the expected (and incorrectly documented) behavior.

newsroomdev commented 2 months ago

@stucka I like that idea and have done similar work in the past with libraries that allow for additional parameters that are older code doesn't use yet. like you said, we'd just have to document it, but docstrings, types, and contributing.md is a great place to note this