darkdragn / party

A quick *.party downloader

cleanup/slugify of schema fields #24

Closed: HornyQT closed this issue 7 months ago

HornyQT commented 8 months ago

There are certain characters, such as "/" and "\" (and probably many more), that should be removed from the schema fields, because they can cause unexpected behaviour such as creating a new folder.

This would also make the schema fields compatible with the Windows file system, which does not allow certain characters in folder and file names (a similar problem to #22).
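A minimal sketch of the kind of cleanup being asked for, assuming each schema field ends up as a single path component. `sanitize_component` is a hypothetical helper, not part of party: it replaces the path separators and the other characters Windows rejects in names (`< > : " / \ | ? *` and control characters) and strips trailing dots and spaces, which Windows also disallows. It does not handle reserved device names such as `CON` or `NUL`.

```python
import re

# Characters Windows forbids in file and folder names, including the path
# separators "/" and "\" that would otherwise create new folders.
_RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_component(value: str, replacement: str = "_") -> str:
    """Make one schema field safe to use inside a single path component.

    Hypothetical helper: swaps reserved characters for `replacement` and
    strips the trailing dots and spaces that Windows also rejects.
    """
    cleaned = _RESERVED.sub(replacement, value)
    cleaned = cleaned.rstrip(". ")
    # Fall back to the replacement character if nothing usable is left.
    return cleaned or replacement


if __name__ == "__main__":
    print(sanitize_component("Part 1/2: intro?"))  # Part 1_2_ intro_
```

This keeps the text mostly readable, unlike a full slug, at the cost of leaving spaces and mixed case in place.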

darkdragn commented 8 months ago

I'll look into it. It looks like the base slugify package was abandoned, but python-slugify is active. I'm thinking the only fields that would need it are post_title, name and filename.

HornyQT commented 8 months ago

For the fields, it would probably be good to check all fields that can be used by the "--file-format" option.

darkdragn commented 8 months ago

> For the fields, it would probably be good to check all fields that can be used by the "--file-format" option.

Currently I'm not restricting it: anything in the Attachment class is open for use, so [filename, name, path, post_id, post_title, base_name, extension] are all valid. I've been playing with it and have it working on my dev setup, but I'm going to add it as an optional flag so people with existing folders don't wind up with a bunch of duplicate files.

I'm just messing with applying it to filename without borking the extension, which happened in an early test.
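One way to avoid mangling the extension is to slugify only the stem. The sketch below is an illustration, not the branch's code: `slugify_text` is a hypothetical stdlib stand-in that roughly approximates python-slugify's default output (ASCII-fold, lowercase, collapse runs of other characters into one hyphen), and the `"untitled"` fallback is likewise an assumption.

```python
import os
import re
import unicodedata


def slugify_text(value: str) -> str:
    """Rough stand-in for python-slugify's default output (an assumption).

    ASCII-folds accented characters, lowercases, and collapses each run of
    non-alphanumeric characters into a single hyphen.
    """
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def slugify_filename(filename: str) -> str:
    """Slugify only the stem so the extension keeps its dot.

    Slugifying the whole name would turn "photo.jpg" into "photo-jpg".
    """
    # splitext splits at the last dot and keeps it on the extension: ".jpg".
    stem, extension = os.path.splitext(filename)
    # Hypothetical fallback so a title made only of symbols does not
    # collapse into a bare ".jpg" hidden file.
    return (slugify_text(stem) or "untitled") + extension


if __name__ == "__main__":
    print(slugify_filename("October last hours!.jpg"))  # october-last-hours.jpg
```

Because `os.path.splitext` only splits at the last dot, a multi-part extension such as `.tar.gz` comes out as `archive-tar.gz`; that is usually acceptable for downloaded attachments.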

darkdragn commented 8 months ago

I have a branch up for it, if you want to give it a test drive. Run it through the wringer, and if it feels fine I'll merge it over once I add the switch to make sure others don't double their disk usage.

So far it's a pretty straightforward change: 4b303c83e05458ff1027db0006f49dbec5ee7c1e

darkdragn commented 8 months ago

Here's a quick preview; I just got the extra switch working. It was a little tricky, considering how far removed the posts code is from the CLI. I really need to rework the CLI with common options to reduce verbosity.

(party-py3.11) (base) darkdragn@DESKTOP-M35OH2B:/mnt/d/src/party$ party kemono patreon kajin --file-format "{ref.post_id}_{ref.post_title}_{ref.index}.{ref.extension}" -d Kajin --limit 5 -d temp
2023-11-05 15:40:51.030 | DEBUG    | party.cli:pull_user:112 - Excluded Extensions: []
⠹ User found: kajin; parsing posts...Duplicate files found, recommend using post_id
Downloading from user: kajin
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:16<00:00,  1.35s/it]
2023-11-05 15:41:11.034 | INFO     | party.cli:pull_user:217 - Output status: Counter({<StatusEnum.SUCCESS: 1>: 12})
(party-py3.11) (base) darkdragn@DESKTOP-M35OH2B:/mnt/d/src/party$ jq '.' temp/.info
{
  "user": {
    "directory": "temp",
    "id": "585637",
    "indexed": "Sun, 23 Aug 2020 10:03:20 ",
    "name": "kajin",
    "service": "patreon",
    "site": "https://kemono.party",
    "updated": "Fri, 03 Nov 2023 20:12:05 ",
    "url": "https://kemono.party/api/v1/patreon/user/585637"
  },
  "options": {
    "exclude_extensions": [],
    "files": true,
    "exclude_external": true,
    "base_url": "https://kemono.party",
    "directory": "temp",
    "ordered_short": false,
    "file_format": "{ref.post_id}_{ref.post_title}_{ref.index}.{ref.extension}",
    "sluglify": false
  }
}
(party-py3.11) (base) darkdragn@DESKTOP-M35OH2B:/mnt/d/src/party$ party kemono patreon kajin --file-format "{ref.post_id}_{ref.post_title}_{ref.index}.{ref.extension}" -d Kajin --limit 5 -d temp --sluglify
2023-11-05 15:41:25.620 | DEBUG    | party.cli:pull_user:112 - Excluded Extensions: []
⠇ User found: kajin; parsing posts...Duplicate files found, recommend using post_id
Downloading from user: kajin
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:14<00:00,  1.22s/it]
2023-11-05 15:41:43.372 | INFO     | party.cli:pull_user:217 - Output status: Counter({<StatusEnum.SUCCESS: 1>: 12})
(party-py3.11) (base) darkdragn@DESKTOP-M35OH2B:/mnt/d/src/party$ jq '.' temp/.info
{
  "user": {
    "directory": "temp",
    "id": "585637",
    "indexed": "Sun, 23 Aug 2020 10:03:20 ",
    "name": "kajin",
    "service": "patreon",
    "site": "https://kemono.party",
    "updated": "Fri, 03 Nov 2023 20:12:05 ",
    "url": "https://kemono.party/api/v1/patreon/user/585637"
  },
  "options": {
    "exclude_extensions": [],
    "files": true,
    "exclude_external": true,
    "base_url": "https://kemono.party",
    "directory": "temp",
    "ordered_short": false,
    "file_format": "{ref.post_id}_{ref.post_title}_{ref.index}.{ref.extension}",
    "sluglify": true
  }
}
(party-py3.11) (base) darkdragn@DESKTOP-M35OH2B:/mnt/d/src/party$ ls temp
'91889193_SummerLadydevimon or SummerAngewomon?_0.jpg'  '92026701_October last hours!_2.jpg'
'91889193_SummerLadydevimon or SummerAngewomon?_1.jpg'  '92026701_October last hours!_3.jpg'
'91889193_SummerLadydevimon or SummerAngewomon?_2.jpg'  '92026701_October last hours!_4.jpg'
 91889193_summerladydevimon-or-summerangewomon_0.jpg     92026701_october-last-hours_0.jpg
 91889193_summerladydevimon-or-summerangewomon_1.jpg     92026701_october-last-hours_1.jpg
 91889193_summerladydevimon-or-summerangewomon_2.jpg     92026701_october-last-hours_2.jpg
'91968217_October last days!_0.jpg'                      92026701_october-last-hours_3.jpg
'91968217_October last days!_1.jpg'                      92026701_october-last-hours_4.jpg
 91968217_october-last-days_0.jpg                       '92239087_My halloween cosplay_0.jpg'
 91968217_october-last-days_1.jpg                       '92239087_My halloween cosplay_1.jpg'
'92026701_October last hours!_0.jpg'                     92239087_my-halloween-cosplay_0.jpg
'92026701_October last hours!_1.jpg'                     92239087_my-halloween-cosplay_1.jpg
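
The listing shows how an opt-in switch plays out: the run without `--sluglify` and the run with it write the same attachments under two naming schemes. Below is a minimal sketch of that rendering step. The `Attachment` dataclass is a hypothetical stand-in exposing only the fields used by the format string above (it is not party's actual Attachment class), `slugify_text` roughly approximates python-slugify's default output, and the `sluglify` parameter merely mirrors the `"sluglify"` key stored in `.info`.

```python
import re
import unicodedata
from dataclasses import dataclass, replace


def slugify_text(value: str) -> str:
    """Rough stand-in for python-slugify's default output (an assumption)."""
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumerics into one hyphen, trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


@dataclass(frozen=True)
class Attachment:
    """Hypothetical stand-in exposing only the fields this format string uses."""

    post_id: str
    post_title: str
    index: int
    extension: str


def render_name(ref: Attachment, file_format: str, sluglify: bool = False) -> str:
    """Fill in file_format, slugifying the free-text title only when opted in."""
    if sluglify:
        # Build a copy with a cleaned title; post_id, index and extension
        # are left as-is in this sketch.
        ref = replace(ref, post_title=slugify_text(ref.post_title))
    # str.format resolves "{ref.post_title}" via attribute access on ref.
    return file_format.format(ref=ref)


if __name__ == "__main__":
    fmt = "{ref.post_id}_{ref.post_title}_{ref.index}.{ref.extension}"
    post = Attachment("92026701", "October last hours!", 0, "jpg")
    print(render_name(post, fmt))  # 92026701_October last hours!_0.jpg
    print(render_name(post, fmt, sluglify=True))  # 92026701_october-last-hours_0.jpg
```

Defaulting the switch to off keeps an existing folder's names stable, so re-running without the flag does not write a second copy of every file.
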
darkdragn commented 7 months ago

Merged in v0.6.7