Nandaka / PixivUtil2

Download images from Pixiv and more!
http://nandaka.devnull.zone/
BSD 2-Clause "Simplified" License
2.4k stars 254 forks

Fanbox: Filename is too long, image is saved into pixivutil directory #525

Closed NHOrus closed 4 years ago

NHOrus commented 5 years ago

Prerequisites

Description

On Linux, when downloading a work with:

filenameformat = %member_id%/%image_id% - %title%
filenamemangaformat = %member_id%/%urlFilename% - %title%
filenameinfoformat = %member_id%/%image_id% - %title%

Files from Fanbox are saved into the pixivutil folder instead of the artist folder.

The artist used an excessively long title:

Start downloading... Using Referer: https://www.pixiv.net/fanbox/creator/39182623/post/446430
Error at download_image(): Cannot save https://fanbox.pixiv.net/images/post/446430/O0x7ASU437evHkT61U7YsVW5.png to /home/nho/adata/pixiv/39182623/446430_p3_O0x7ASU437evHkT61U7YsVW5 - 4枚に+12枚=16カット。(カバーでは起きてますが行為中は起きません:ボテ絵)指で局部広げ・挿入に3カット・抽挿に4カット・射精に3カット・ペニス引き抜き溢れ精液に3カット・ボテ1カット になります。(+下書き一枚を挟んで文字なしver.).png: (<type 'exceptions.IOError'>, IOError(36, 'File name too long'), <traceback object at 0x7fe9e9f587a0>)
File is saved to O0x7ASU437evHkT61U7YsVW5.png

I expect the file to be saved into the correct folder, possibly without the title or with a truncated filename.

Versions

Current git, reported as 20190907b

photonometric commented 5 years ago

Ah yeah, the attempted filename/path length came out to 403 characters in that, where the limit is 255 on all common modern filesystems.

Same thing happened on a booru downloader I used to follow, if someone used a TAG variable and there were like 20-30 tags for the image. They implemented a hard character limit on all filenames.

So probably something like 250 (for buffer) - <root dir length> - <file ext> = max output length of the 3 filename format variables would solve this, as well as others that could come up without using %title%.
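The budget arithmetic proposed above can be sketched as follows. This is only an illustration of the suggestion, not PixivUtil2 code: the function name `max_name_length` is hypothetical, and it counts characters exactly as the comment does (the thread later points out that Linux counts bytes per filename instead).

```python
def max_name_length(root_dir, extension, limit=250):
    """Characters left for the formatted filename after the root dir and extension.

    Hypothetical helper: `limit` defaults to the 250 suggested in the comment
    (255 minus a small buffer).
    """
    return max(0, limit - len(root_dir) - len(extension))


# Root directory taken from the error message in the report.
budget = max_name_length("/home/nho/adata/pixiv/", ".png")
print(budget)
```

With the root directory from the report (22 characters) and a 4-character extension, this leaves 224 characters for the expanded format variables.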

NHOrus commented 5 years ago

Four filename extension characters and one for separator.

photonometric commented 5 years ago

Four filename extension characters and one for separator

Sure, I meant the whole path + filename + extension as the "output" in my example. Not sure what you mean by 1 char for the separator; filenames can have different numbers of separators (e.g. spaces) depending on the number of format variables, and in the case of %tags% that would depend on the number of tags on the image... so I assume that is already handled on the fly to some extent, and a hard truncate to x if the filename length is > 250 or on IOError 36 would be the easiest sort of thing to do. I doubt it comes up often enough to make it worth adding conditionals on which tags to include after an error.
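The "truncate on IOError 36" idea could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: `save_with_fallback` and its `write` callback are hypothetical names, and it is written for Python 3, where `IOError` is an alias of `OSError` and error 36 on Linux is `errno.ENAMETOOLONG`.

```python
import errno


def save_with_fallback(write, path, fallback_path):
    """Call write(path); if the OS reports 'File name too long', retry once.

    `write` is any callable that saves to the given path (hypothetical).
    Returns the path that was actually used.
    """
    try:
        write(path)
        return path
    except OSError as exc:
        # Only handle the "name too long" case; re-raise everything else.
        if exc.errno != errno.ENAMETOOLONG:
            raise
        write(fallback_path)
        return fallback_path
```

The caller decides what the fallback is, for example the same directory with a shortened or title-less filename, so the file still lands in the artist folder.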

But of course Nandaka will know the semantic details of filename/variable interaction much better than me x3 I just was giving this a bump with basic thoughts because I was testing a related filename error ^^

NHOrus commented 5 years ago

Ah, I misunderstood. Either way, the name produced by the template needs to be cut to the limit minus the path and the extension (e.g. ".jpeg"). On Linux, NAME_MAX is 255 and PATH_MAX is 4096. On Windows, it's 260 characters.

Except the unit is a single byte on Linux and a Unicode character on Windows, so the same name uses up more of the limit on Linux than on Windows.

It's a mess, honestly.
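Rather than hard-coding NAME_MAX, the limit can be queried from the filesystem on POSIX systems. A minimal sketch, assuming Python 3; the helper name `name_max` and the fallback value of 255 are choices made for this example. `os.pathconf` is not available on Windows, so the function falls back to the default there.

```python
import os


def name_max(directory=".", default=255):
    """Return the per-filename limit (in bytes) for `directory`.

    Falls back to `default` when the limit cannot be queried.
    """
    pathconf = getattr(os, "pathconf", None)
    if pathconf is None:  # platform without os.pathconf, e.g. Windows
        return default
    try:
        value = pathconf(directory, "PC_NAME_MAX")
    except (OSError, ValueError):
        # Directory missing, or the name is unknown on this platform.
        return default
    return value if value > 0 else default
```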

Nandaka commented 5 years ago

The filename is already cut to 255 in https://github.com/Nandaka/PixivUtil2/blob/9db6153e624e76143b188a83685b6321a23b5327/PixivHelper.py#L113

/home/nho/adata/pixiv/39182623/446430_p3_O0x7ASU437evHkT61U7YsVW5 - 4枚に+12枚=16カット。(カバーでは起きてますが行為中は起きません:ボテ絵)指で局部広げ・挿入に3カット・抽挿に4カット・射精に3カット・ペニス引き抜き溢れ精液に3カット・ボテ1カット になります。(+下書き一枚を挟んで文字なしver.).png

That should be counted as 193 chars, right? Unless on Linux the kanji/kana are counted as double-width chars.
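The difference between the two ways of counting can be checked directly. A small sketch, assuming Python 3 and UTF-8 as the filesystem encoding; `name_lengths` and the sample path are made up for this example (a shortened stand-in for the real title).

```python
import os


def name_lengths(path):
    """Return (characters, UTF-8 bytes) for the last component of `path`."""
    basename = os.path.basename(path)
    return len(basename), len(basename.encode("utf-8"))


chars, nbytes = name_lengths("/home/user/pixiv/39182623/446430_p3 - ボテ絵.png")
print(chars, nbytes)  # 19 25
```

Each of the three Japanese characters takes 3 bytes in UTF-8, so a name well under 255 characters can still exceed 255 bytes.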

Related call https://github.com/Nandaka/PixivUtil2/blob/master/PixivUtil2.py#L1906 https://github.com/Nandaka/PixivUtil2/blob/9db6153e624e76143b188a83685b6321a23b5327/PixivHelper.py#L71

split-n commented 4 years ago

FYI:

For Windows, 255 CHARS is usually the maximum for the FULL PATH. (There's a way to lift the limitation, but it is not commonly used: https://docs.python.org/3/using/windows.html#removing-the-max-path-limitation )

For Linux, 255 BYTES is the maximum for FILENAME. (Usually, UTF-8 is used for encoding; most Japanese chars are 3 bytes, but there are exceptions.) So if a directory path is long, Linux may be able to save longer FILENAME compared to Windows.

Nandaka commented 4 years ago

So if a directory path is long, Linux may be able to save longer FILENAME compared to Windows.

Shouldn't it be the other way around if the Linux limit is based on bytes? E.g. assuming the worst-case scenario (3 bytes per character), the maximum filename length would be 255/3 characters, wouldn't it?

split-n commented 4 years ago

In the worst case, there are chars that are represented by 4 bytes, but they are rare (a few kanji like 𠮷, and most emoji).

I came up with this code (not tested yet).
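The worst-case arithmetic from the last two comments, worked through as a sketch (Python 3; `guaranteed_chars` is a name invented for this example):

```python
NAME_MAX_BYTES = 255  # Linux per-filename limit discussed in the thread


def guaranteed_chars(bytes_per_char, limit=NAME_MAX_BYTES):
    """How many characters always fit if each one takes `bytes_per_char` bytes."""
    return limit // bytes_per_char


# UTF-8 width of a few sample characters: ASCII, Latin-1 accent, kanji, rare kanji.
for ch in ("a", "\u00e9", "絵", "𠮷"):
    print(repr(ch), len(ch.encode("utf-8")))

print(guaranteed_chars(3))  # 85
print(guaranteed_chars(4))  # 63
```

So a character-based cut is only safe on Linux at 85 characters if all characters are at most 3 bytes, and at 63 characters if 4-byte characters can appear.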

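    # Needs `import os` and `import platform` at the top of the module.
    if platform.system() == 'Linux':
        # Linux: cut the filename (not the whole path) to <= 255 bytes of UTF-8
        dirname, basename = os.path.split(name)
        filename, extname = os.path.splitext(basename)
        # drop one character at a time so a multi-byte character is never split
        while filename and len((filename + extname).encode('utf-8')) > 255:
            filename = filename[:-1]
        name = os.path.join(dirname, filename + extname)
    else:
        # other platforms: cut the whole path to 250 chars if it exceeds 255
        # (note: this also cuts off the extension)
        if len(name) > 255:
            newLen = 250
            name = name[:newLen]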
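An alternative to trimming one character per loop iteration is to slice the encoded bytes once and drop any partial character at the cut. This is a sketch under stated assumptions, not the fix that went into PixivUtil2: it targets Python 3, assumes UTF-8 filenames, and `truncate_basename_utf8` is a hypothetical name.

```python
import os


def truncate_basename_utf8(path, max_bytes=255):
    """Shorten the last path component to at most `max_bytes` UTF-8 bytes.

    The extension is preserved; only the stem is cut.
    """
    dirname, basename = os.path.split(path)
    stem, ext = os.path.splitext(basename)
    budget = max(0, max_bytes - len(ext.encode("utf-8")))
    # Slice the bytes, then let errors="ignore" discard a trailing
    # incomplete multi-byte sequence left by the cut.
    stem = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return os.path.join(dirname, stem + ext)
```

Names that already fit are returned unchanged, and the directory part is never touched, so the file stays in the artist folder.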