AlphaSlayer1964 / kemono-dl

A simple kemono.party downloader using python.

"Merging" html with json and links file #87

Closed: sybarix closed this issue 2 years ago

sybarix commented 2 years ago

Description

When --content is used, the resulting html references files as weblinks. However, the json and links files seem to contain the relevant data needed for the html file to use locally downloaded resources (I say this because the json file seems to contain the local file paths, which could be used to replace the weblinks, but please correct me if I'm wrong since I'm really not familiar with this field). Is there a way to "merge" the html, json, and links files so they display as a single continuous file, or is there some kind of file browser that can do this?

Additionally, it would also be great if there was an option to merge all the posts by a certain creator into a single html file so there would only be a single creator.html containing all the posts instead of multiple posts.html.

Thanks as always.

AlphaSlayer1964 commented 2 years ago

> When --content is used, the resulting html references files as weblinks.

Yes, unless the --inline option is used, the kemono.party-hosted images in the post content are not downloaded and can't be viewed in the content html file. Also, I only download inline images that are hosted on kemono.party; sometimes they are hosted elsewhere.

> However, the json and links file seem to contain the relevant data needed for the html file to use locally downloaded resources

The json file does contain all local file paths, but inline images in the post content are already changed to the local path if they are downloaded. Post attachments are not displayed in the post content. Also, the links file is just a list of all the href links in the content, so that data is already in the content file. Maybe I'm misunderstanding, but if you could point to the specific data you are referring to, that would be helpful.
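To illustrate what "a list of all the href links in the content" amounts to, here is a minimal sketch using only the Python standard library. It is not taken from kemono-dl; the names `HrefCollector` and `extract_links` and the sample HTML are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(content_html):
    """Return every href found in the given post content HTML."""
    collector = HrefCollector()
    collector.feed(content_html)
    collector.close()
    return collector.links


# Hypothetical post content, used only to demonstrate the function.
sample = (
    '<p>See <a href="https://example.com/a">this</a> and '
    '<a href="https://example.com/b">that</a>.</p>'
)
print("\n".join(extract_links(sample)))
```

Because every link comes straight out of the content HTML, a file built this way carries no information that the content file does not already have.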

> Additionally, it would also be great if there was an option to merge all the posts by a certain creator into a single html file so there would only be a single creator.html containing all the posts instead of multiple posts.html.

This has been brought up before. Right now I'm not really focusing on it, for two reasons. One, I don't know html formatting well; if you look at the content html, it's kind of just thrown in there. Two, since I don't know html well, I don't have a good method for appending post content in the correct order. So this feature may come eventually.
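As an illustration of the ordering problem only, and not a kemono-dl feature, the following sketch concatenates per-post HTML files into one page. It rests on several assumptions: each post's content is an HTML fragment in its own .html file, all of those files sit in one folder, and each file name begins with a sortable date (for example a `({published})` prefix formatted as `%y%m%d`), so that sorting by name gives chronological order. The function name `merge_posts` and the default `creator.html` output name are hypothetical.

```python
import html
from pathlib import Path


def merge_posts(post_dir, output_name="creator.html"):
    """Concatenate every post .html file in post_dir into a single page."""
    post_dir = Path(post_dir)
    output_path = post_dir / output_name

    # Assumption: file names start with a sortable date, so name order
    # equals publication order. The output file itself is skipped so the
    # function can be rerun safely.
    post_files = sorted(
        p for p in post_dir.glob("*.html") if p.name != output_name
    )

    sections = []
    for path in post_files:
        body = path.read_text(encoding="utf-8")
        title = html.escape(path.stem)
        sections.append(f"<section>\n<h2>{title}</h2>\n{body}\n</section>")

    page = (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"></head>\n<body>\n"
        + "\n<hr>\n".join(sections)
        + "\n</body>\n</html>\n"
    )
    output_path.write_text(page, encoding="utf-8")
    return output_path
```

Writing the merged page into the same folder as the posts keeps any relative paths inside the fragments valid. If the posts live in separate folders, those paths would need to be rewritten, which this sketch does not attempt.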

sybarix commented 2 years ago

> Yes, unless the --inline option is used, the kemono.party-hosted images in the post content are not downloaded and can't be viewed in the content html file. Also, I only download inline images that are hosted on kemono.party; sometimes they are hosted elsewhere.

> The json file does contain all local file paths, but inline images in the post content are already changed to the local path if they are downloaded. Post attachments are not displayed in the post content. Also, the links file is just a list of all the href links in the content, so that data is already in the content file. Maybe I'm misunderstanding, but if you could point to the specific data you are referring to, that would be helpful.

I see, then using this post https://kemono.party/fanbox/user/10125263/post/3696077 as an example, running this command


@echo off
py kemono-dl.py --cookies "kemono.party_cookies.txt,coomer.party_cookies.txt" --skip-filetypes PSD --inline --json --extract-links --yt-dlp --links https://kemono.party/fanbox/user/10125263/post/3696077 --retry 3 --date-strf-pattern "%%y%%m%%d" --dms --content --dirname-pattern "Downloads\{username} [{user_id}] {service}" --filename-pattern "({published}) {title} {index} - {filename} [{id}].{ext}" --other-filename-pattern "({published}) {title} - {filename} [{id}].{ext}"
pause

produces this folder: gin00 [10125263] fanbox.zip

However, the resulting html contains only fanbox weblinks, while the kemono.party page does not have any fanbox links, so I'm a little confused about how it works.

> This has been brought up before. Right now I'm not really focusing on it, for two reasons. One, I don't know html formatting well; if you look at the content html, it's kind of just thrown in there. Two, since I don't know html well, I don't have a good method for appending post content in the correct order. So this feature may come eventually.

Alright, no worries thanks

AlphaSlayer1964 commented 2 years ago

So with that link in particular, there seems to be some sort of bug where the kemono.party page is not showing the content. If you go to the API for that page, https://kemono.party/api/fanbox/user/10125263/post/3696077, you can see that under "content" there is html. That is what gets written to the content file.
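A rough sketch of pulling the "content" field out of such an API response, offered as an illustration and not as the code kemono-dl actually runs. The response shape is an assumption: the sketch accepts either a single post object or a list holding one. The helper name `extract_content` and the sample payload are hypothetical.

```python
import json


def extract_content(payload):
    """Return the "content" html from a decoded post API response."""
    # Assumption: the API returns either one post object or a list
    # containing it; both shapes are handled here.
    if isinstance(payload, list):
        if not payload:
            return ""
        payload = payload[0]
    return payload.get("content", "")


# Hypothetical payload standing in for a real API response.
raw = '[{"id": "3696077", "service": "fanbox", "content": "<p>Hello</p>"}]'
print(extract_content(json.loads(raw)))
```

The returned string is what would end up in the content file, which is why that file can contain links the rendered kemono.party page does not show.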

sybarix commented 2 years ago

Oh I see, thanks for clarifying. Cheers mate

AlphaSlayer1964 commented 2 years ago

I will add that to my list of broken pages so thanks for bringing it up.