mrkranston opened 2 weeks ago
wow… I didn’t know that error existed. I feel like I should warn you: as far as I can tell, 1000 posts seem to correspond to something around 1 GB of downloaded size, meaning your download would have a size on the order of 100 GB.
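The scaling behind that estimate is just linear; as a throwaway sketch (the ~1 GB per 1000 posts figure is the observation above, the helper function is made up and not part of cohost-dl):

```typescript
// Back-of-the-envelope download size: scale the observed ~1 GB per
// 1000 posts linearly up to the total post count. Both the ratio and
// the helper name are illustrative.
function estimateDownloadGB(posts: number, gbPerThousandPosts = 1): number {
  return (posts / 1000) * gbPerThousandPosts;
}

console.log(estimateDownloadGB(100_000)); // → 100
```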
I am seeing several resolutions. One of them: create a liked.json file inside the out directory and put [] in it, which skips downloading liked posts entirely.

Option 1 or 2 sounds good :3 maybe something else to consider for option 1 would be some kind of rate limit for large downloads such as this. I don't want to crash any servers by making requests like "give me 100,000 posts ASAP please". Perhaps option 2 is more feasible, but i will leave that up to your judgement
Alright, I’ve decided to just improve the file format that likes are stored in. It should theoretically work now with the latest commit (63d1070)… I can’t test this because I don’t have that many posts. I hope it doesn’t crash afterwards?
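I can't see what commit 63d1070 actually changed, but a common way to make a likes file robust against huge collections is to avoid one giant JSON.stringify call, e.g. by storing newline-delimited JSON with one post per line. A sketch of that idea only — the Post shape and helper names are made up, not the real cohost-dl code:

```typescript
// Sketch: store liked posts as newline-delimited JSON (NDJSON) instead
// of one giant JSON array. Each line gets its own small JSON.stringify
// call, so no single string ever approaches V8's maximum string length.
import { appendFileSync, readFileSync, writeFileSync } from "node:fs";

interface Post {
  id: number;
  body: string;
}

function writeLikes(path: string, posts: Iterable<Post>): void {
  writeFileSync(path, ""); // truncate any previous file
  for (const post of posts) {
    appendFileSync(path, JSON.stringify(post) + "\n");
  }
}

function readLikes(path: string): Post[] {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Post);
}
```

A format like this also makes resuming easier: a crashed run leaves behind complete lines that can be read back without re-serializing everything.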
You can also slow down your requests with a new REQUEST_DELAY_SECS option.
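I haven't checked how REQUEST_DELAY_SECS is wired up internally; conceptually it just sleeps between successive requests. A minimal sketch of that pattern — fetchAllPages and its parameters are illustrative, not the actual cohost-dl API:

```typescript
// Sketch: a per-request delay in the spirit of REQUEST_DELAY_SECS, so a
// 145k-post download doesn't hammer the server. The real wiring in
// cohost-dl may differ.
const REQUEST_DELAY_SECS = 2;

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function fetchAllPages(
  urls: string[],
  fetchOne: (url: string) => Promise<string>,
  delaySecs: number = REQUEST_DELAY_SECS,
): Promise<string[]> {
  const results: string[] = [];
  for (const url of urls) {
    results.push(await fetchOne(url));
    // Wait between requests instead of firing them all at once.
    await sleep(delaySecs * 1000);
  }
  return results;
}
```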
Note: I am honestly expecting something to crash if you reach “generating index for all posts,” because that’s going to try to create a full-text search index of every post you’ve downloaded, which would probably be around 1 GB in size if it succeeds at all, and kill every browser you open it in. If it crashes before that, though, then I will consider that a fixable problem.
I'll let you know how it works out when i get home :3 thank you for the fast response (and for making the tool in the first place)
Ran it overnight, and it managed to generate a ~1 GB file full of the liked posts. It ran out of memory in certain spots after that, but i was able to just restart it and it picked up where it left off. After a few tries it's clear that there's one particular point where it runs out of memory, but i'm not entirely sure whether it's the point where it starts building the full-text search index:
    ~~ cohost source version 3c7903d6
    compiling Javascript: post-index
    compiling Javascript: post-page

    <--- Last few GCs --->

    [16320:000001FE32A64000] 71201 ms: Mark-Compact (reduce) 1360.1 (1390.8) -> 1347.3 (1350.6) MB, pooled: 0 MB, 14.61 / 0.00 ms (+ 118.4 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 143 ms) ...

Full log: cohostdlerror3.txt
No, it’s definitely not supposed to be crashing there. The simplest solution would be to just give it more memory, I think? When doing deno run ..., add the V8 flag, i.e. deno run --v8-flags=--max-old-space-size=8192 ..., for an 8 GB heap limit, for example.
it seems to have gotten to the point where it's downloading posts, which is a good sign i think! we'll have to see if 12 GB of memory is enough to get it to compile the index. i'll let you know how it turns out when it's finished
good luck downloading (checks notes) 2% of cohost.org! lol
it might take… a few days
When i tried to run cohost-dl, i eventually got this error:
    GET https://cohost.org/rc/liked-posts?refTimestamp=1727056241374&skipPosts=145240
    GET https://cohost.org/rc/liked-posts?refTimestamp=1727056241374&skipPosts=145260
    error: Uncaught (in promise) RangeError: Invalid string length
    await ctx.write("liked.json", JSON.stringify(liked));
                                  ^
        at JSON.stringify (<anonymous>)
        at file:///F:/downloads/cohost-dl-main/main.ts:48:44
        at eventLoopTick (ext:core/01_core.js:175:7)
    PS F:\downloads\cohost-dl-main>
I'm inclined to believe that i do actually have ~140k liked posts, since i just loaded one of the URLs it printed, and it shows some posts from 2022. I'm not sure if it matters for this particular case, but i am using Windows for this.
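For context on the error itself: "RangeError: Invalid string length" is V8 refusing to build a string longer than its internal maximum (on the order of 2^29–2^30 characters, depending on version), which JSON.stringify hits when it tries to serialize ~140k posts into a single string. The same error can be reproduced cheaply, since V8 checks the requested length before allocating anything; the function name below is made up for illustration:

```typescript
// Reproduce V8's "Invalid string length" RangeError without 140k posts:
// requesting any string longer than V8's maximum fails the same way
// JSON.stringify does on a huge array, and the length check happens
// before allocation, so this is cheap to run.
function overlongStringError(): string {
  try {
    "a".repeat(2 ** 32); // far beyond any V8 build's maximum string length
    return "no error";
  } catch (e) {
    return e instanceof RangeError ? e.message : "unexpected error";
  }
}

console.log(overlongStringError()); // → "Invalid string length"
```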