Completely getting rid of recentlyClosed sounds like a good idea. I got carried away when implementing the project in the early stages of development; that, plus the viciously tolerant nature of js, resulted in the bloated program we have now. In short, I'm for taking things away rather than adding more to this mess, especially with the memory leak issue (maybe, maybe not) still lurking around.
k, I will remove it and do some testing. Will let you know when it's ready.
It seems editing a comment doesn't trigger an email notification... so I am making a new post. Please see the first post (the [Edit] part) for changes. Thanks.
As for the memory leak, I don't think I have seen it, but then again, I don't run mine all the time, since I need the network bandwidth during the day. This weekend I can try to run it for more than a day and observe memory usage. One thing I think could be a factor is fixed rooms. Even when a fixed room goes inactive for more than 3 days (or the configured value), it is not removed from the controller, and the only way to get rid of such rooms is to restart the app. I doubt they consume much memory since they are essentially zombies, but they do waste precious connections, so it would be good to get rid of them dynamically. Maybe get an update from the database when rooms are removed?
Thanks, looks good to me.
What I'm considering is having a sync(db: Database) method in FixedController to filter out expired connections that were removed from the db. This method can be invoked along with the dynamicRefreshTask from App. Tearing those down is a legitimate idea; otherwise they stay around indefinitely.
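Roughly something like this (just a sketch; this._connections and db.getRooms() are placeholder names, not necessarily the real members):

```ts
// Sketch only - member and method names here are placeholders, not the actual implementation.
public sync(db: Database): void {
    db.getRooms().then((roomids: number[]): void => {
        const valid: Set<number> = new Set(roomids);
        this._connections.forEach((conn: any, roomid: number): void => {
            if (valid.has(roomid) === false) {
                conn.close();                       // tear down the zombie connection
                this._connections.delete(roomid);   // and stop tracking it
            }
        });
    }).catch((error: any): void => {
        cprint(`FixedController.sync - ${error.message}`, chalk.red);
    });
}
```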
As for the memory usage, I do observe a slow but steady increase as the program runs. It seems to level off at around 500MB and doesn't continue to grow after that point. I don't think I'm keeping anything that would cost extra memory allocation and retention (other than fixed rooms, that is).
It's time to move on from this project. bilibili just limited the number of connections from a single IP. There are no viable options for monitoring raffles from this point on.
Oh, I saw the connection issues this morning but didn't know the cause. What did they limit it to?
My guess is around 20; going beyond that would be risky. Dynamic and Fixed rooms need to be abandoned, which means commenting out this part:
I see. So essentially we can only run the area based raffle monitor. That's sad...
It is indeed
Oh, you will also want to disable the newly added block that adds rooms to dynamic: https://github.com/Billyzou0741326/bilibili-live-monitor-ts/blob/3751b491b2e7617fff8c82378d913b434e8b75c2/src/app/App.ts#L110-L114
nvm, just saw your new commit.
Thanks for the reminder, I overlooked that part before seeing your comment.
Don't give up yet! I have an idea. We can utilize the existing dynamic room query and roomid-handler. Let's assume we can only use 20 connections; 6 will be used for areas, which gives us good coverage of gifts/storm, but not guard/anchor/pk. That leaves 14 connections. What we can do is query all dynamic rooms, then have 14 workers work through the queue, each claiming one room at a time, using roomid-handler to check gift/guard/pk (and potentially storm/anchor); once done, close the connection and process the next room in the queue until it's drained. Then rinse and repeat.
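A rough sketch of what I mean (checkRoom() here is just a stand-in for whatever the roomid-handler check ends up being, and the numbers are the assumptions above):

```ts
// Rough sketch of the queue idea - checkRoom() is a placeholder, not a real function in the repo.
declare function checkRoom(roomid: number): Promise<void>;

async function crawl(roomids: number[], workers: number = 14): Promise<void> {
    const queue: number[] = roomids.slice();    // rooms from the dynamic room query

    const work = async (): Promise<void> => {
        while (queue.length > 0) {
            const roomid: number = queue.shift()!;
            try {
                await checkRoom(roomid);        // check gift/guard/pk (and maybe storm/anchor), then close
            }
            catch (error) {
                // ignore failures and move on to the next room
            }
        }
    };

    // 14 workers each claim one room at a time until the queue is drained
    await Promise.all(Array.from({ length: workers }, () => work()));
    // ...then rinse and repeat on the next query cycle
}
```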
Obviously this could cause duplicate broadcasts. We could reuse the idea of the recent-room cache I introduced in the client code (though that could slow down the server), or just enable it on the client side for the single-server case (right now it is only enabled for multi-server setups).
I am busy during the day but I can try the idea this weekend, unless you want to give it a try.
To clarify, we are turning to http requests then?
Actually you are right, if we do it this way (http requests), we are not even subject to the socket connection limit... Any downside?
The only downside is that it's super slow. 50 requests per second is the limit I set to prevent http 412 (precondition failed) errors. It will take us 200 seconds to query 10000 rooms, assuming no http failures.
50 requests/second was the cap for Connection: 'close'. It might be higher for Connection: 'keep-alive', but I haven't run any test cases against that.
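If we want to test that at some point, something like this would do (plain Node https.Agent with keepAlive, nothing project-specific; the host and path below are just for illustration):

```ts
import * as https from 'https';

// Reuse sockets across requests instead of opening a new TCP connection each time.
const agent: https.Agent = new https.Agent({ keepAlive: true, maxSockets: 16 });

function probe(path: string): Promise<number> {
    return new Promise<number>((resolve, reject): void => {
        https.get({ host: 'api.live.bilibili.com', path: path, agent: agent }, (res): void => {
            res.resume();       // drain the body, we only care about the status code here
            res.on('end', (): void => resolve(res.statusCode || 0));
        }).on('error', reject);
    });
}
```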
I see... Guards are not very time sensitive, as the expiry is 20 mins/2 hours/??? for the 3 tiers. If we only do it for rooms saved in the database, that should give us enough time to catch all guards. We could also query the top N (3000?) live rooms based on "online" count, and add them to the mix.
Sounds like a plan.
Running at 50 requests per second now gives me the 412 status (which brought down two servers). They seem to have lowered the bar once again. Further testing shows that 30, 35, and 40 work fine.
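For context, the cap is enforced by spacing requests out, roughly like this (a simplified sketch, not the exact code in the repo):

```ts
// Simplified sketch of a request-rate cap - not the exact implementation in the repo.
class RateLimiter {

    private _queue: (() => void)[] = [];

    constructor(requestsPerSecond: number) {
        // Dispatch at most one queued task per interval
        setInterval((): void => {
            const task: (() => void) | undefined = this._queue.shift();
            if (typeof task !== 'undefined') {
                task();
            }
        }, 1000 / requestsPerSecond);
    }

    public add(task: () => void): void {
        this._queue.push(task);
    }
}

// 30-40 req/s stays under the new threshold; 50 now triggers 412
const limiter: RateLimiter = new RateLimiter(35);
```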
They certainly did not make our lives easy... I will try to get it coded this weekend.
I rolled out part of the implementation, it's sitting in the dev branch. In case I don't have the time to finish it, you can start from there. I appreciate the idea and the help.
oh, I didn't see your post and I made my own POC, which I pushed: https://github.com/blu3mania/bilibili-live-monitor-ts/commit/79e36fcd64f30ed55f51a0e54d30a575f5c6f826
I took a brief look at yours and we took different approaches. One thing you can copy from my POC is the addition of storm and anchor handling in roomid-handler. Anchor raffles were quite common when I tested, but storm was only encountered very occasionally...
And what a coincidence, we both named the new class RoomCrawler :)
A pleasant coincidence indeed. The design is largely similar too, it's amusing.
I'll take your anchor/storm handling and the emit('done') strategy, which turned out to be much cleaner than mine.
One thing I am not sure about in the room collector is the recent change of querying each area on top of querying all areas. The current code uses the live count across all areas when querying each individual area, so lots of pages come back empty. To test, I added an areaid parameter to getLiveCount and called it from getRoomsInArea:
In Bilibili.getLiveCount:

```ts
public static getLiveCount(areaid: number = 0): Promise<number> {
    const params: any = {
        'parent_area_id': areaid,
        // ...remaining params and request logic unchanged
    };
    // ...
}
```

And in RoomCollector.getRoomsInArea:

```ts
const promise = Bilibili.getLiveCount(areaid).catch((error: any): Promise<number> => {
    cprint(`Bilibili.getLiveCount - ${error.message}`, chalk.red);
    return Promise.resolve(5000);    // on error, fall back to 5000 rooms
}).then((room_count: number): Promise<any> => {
    room_count = Math.min(count, room_count);
    // If querying all rooms, add one extra page; otherwise add 2 pages to cover
    // potential duplicate rooms between pages
    const PAGES: number = Math.ceil(room_count / page_size) + (count === Infinity ? 1 : 2);
    cprint(`Querying ${room_count} rooms in ${PAGES} pages in area ${areaid}`, chalk.gray);
    // ...
});
```
The sum of the counts from all areas is very close to the count from area 0, and I'd attribute the slight difference to the asynchronous nature of the queries. Did the current code actually find more rooms? If not, using the real live count in each area, or only querying all rooms like in the past, would reduce the number of queries before crawling.
And lol:
[2020-04-17 23:55:59] 2363869534231 @ 176190 storm 节奏风暴
[2020-04-17 23:56:02] 2363868 @ 22104365 guard 舰长
[2020-04-17 23:56:06] 2363869534231 亿圆+1
[2020-04-17 23:56:06] Executed 87 times
[2020-04-18 00:31:48] 2363933872275 @ 21335815 storm 节奏风暴
[2020-04-18 00:31:51] 2363933872275 亿圆+1
[2020-04-18 00:31:51] Executed 20 times
[2020-04-18 00:37:14] 2363948884336 @ 21398753 storm 节奏风暴
[2020-04-18 00:37:16] 2363948884336 亿圆+1
[2020-04-18 00:37:16] Executed 7 times
The new limit really has a big effect on storm :D
Damn, that's hilarious
[Edit] As per the comment from @Billyzou0741326 below, the handling of recently closed rooms has been removed. The other refactoring is still applicable, including the new start() method, which is empty for now but kept for consistency.
Assuming the reason recently closed rooms are tracked is that the dynamic room query could return rooms that were just closed, using array size to cap the list of tracked rooms could have 2 issues:
Thus it is now changed to be time-based. A closed room gets an expiry (for now, 150 seconds) in the tracking map and is no longer tracked after that (the cleanup interval is set to 60 seconds). I am not sure whether 150 seconds is a good value; it was chosen only to make sure that if a room comes back in the next dynamic room query it will be skipped. You may know a better value, as it really depends on the original issue this tracking mechanism addresses.
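For reference, the time-based tracking looks roughly like this (simplified sketch; the actual names in the code differ):

```ts
// Simplified sketch of the time-based tracking - names are illustrative only.
const CLOSED_ROOM_EXPIRY: number = 150 * 1000;    // how long a room stays "recently closed" (ms)
const CLEANUP_INTERVAL: number = 60 * 1000;       // how often expired entries are purged (ms)

const recentlyClosed: Map<number, number> = new Map();    // roomid -> expiry timestamp

function markClosed(roomid: number): void {
    recentlyClosed.set(roomid, Date.now() + CLOSED_ROOM_EXPIRY);
}

function shouldSkip(roomid: number): boolean {
    const expiry: number | undefined = recentlyClosed.get(roomid);
    return typeof expiry !== 'undefined' && expiry > Date.now();
}

setInterval((): void => {
    const now: number = Date.now();
    recentlyClosed.forEach((expiry: number, roomid: number): void => {
        if (expiry <= now) {
            recentlyClosed.delete(roomid);
        }
    });
}, CLEANUP_INTERVAL);
```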
There is also some refactoring in the room controllers: