4pr0n / ripme

Downloads albums in bulk
MIT License

DeviantArt: Rip Literature #525

Open: metaprime opened 7 years ago

metaprime commented 7 years ago

Follow up from #496 (original request: https://github.com/4pr0n/ripme/issues/496#issuecomment-299345268)

rautamiekka commented 7 years ago

EDIT: Pending a largely complete rewrite after re-reading the source code and losing all the changes ...

1) The literature side makes extensive use of div tags with class="..." to mark the different parts of the page.

2) On each page, the story text itself is stored inside <div class="text">...</div>, and that tag doesn't appear anywhere else. The text is HTML-escaped, and the newlines are HTML markup as well. However, I'm seeing JavaScript at the end of the story, before the closing tag (code indentation by the Firefox built-in developer tools):

<script type="text/javascript">
            if (!window.__meta_cache) {
                window.__meta_cache = [];
            }
            window.__meta_cache['XXXXX']=[]
</script>

I replaced the actual content with XXXXX just in case. It doesn't matter anyway, since that code isn't needed for ripping the story.

3) https://gist.github.com/rautamiekka/0b2e2aeb53a4f77fa20c5890c7b910b8
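To illustrate point 2, here is a rough sketch of pulling the story text out of the div and skipping the trailing script. It is not ripme code and not taken from the gist: it uses Python's standard html.parser, the names StoryTextExtractor and extract_story are made up for the example, and it assumes that the HTML newlines are br tags, which still needs checking against real pages.

```python
from html.parser import HTMLParser


class StoryTextExtractor(HTMLParser):
    """Collects text inside <div class="text">, skipping any <script> content."""

    def __init__(self):
        # convert_charrefs=True unescapes entities such as &amp; for us
        super().__init__(convert_charrefs=True)
        self.div_depth = 0      # 0 = outside the story div; >0 = nesting depth inside it
        self.in_script = False  # True while inside a <script> within the story div
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested div inside the story
            elif "text" in (dict(attrs).get("class") or "").split():
                self.div_depth = 1   # entering the story container
        elif self.div_depth > 0:
            if tag == "script":
                self.in_script = True
            elif tag == "br":
                # Assumption: line breaks in the story are <br> tags
                self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        # Keep text only while inside the story div and outside the script
        if self.div_depth > 0 and not self.in_script:
            self.parts.append(data)


def extract_story(html):
    """Returns the plain text of the first-level <div class="text"> content."""
    parser = StoryTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Calling extract_story(page_html) on a downloaded page would return the story as plain text with the __meta_cache script left out. An actual ripper in ripme would more likely do the same thing with its own HTML parsing in Java; the point here is only the selection logic.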

More info to come ...