This is a known downside of the current implementation; it has been discussed in https://github.com/Tribler/tribler/issues/3615#issuecomment-651069322
@ichorid as you made this community, do you feel #3615 sufficiently captures this issue or do you want to keep this one open?
With recursive SQL queries, the select + sort time of 700K torrents takes about 25 seconds.
https://www.sqlite.org/lang_with.html
The problem is that the collection node queries a big dataset, does recursive sorts in Python (factorial penalty!), puts the results in temporary data structures (messing up the RAM), and sorts them several times (CPU penalty). This is wrong in so many dimensions, and I am not even talking about the ORM overhead.
Also, the data should be traversed with generators.
I did a recursive SQL implementation of commits half a year ago. Indeed, the performance becomes radically better. The ORM does not hurt the recursive SQL implementation, as it is just that: a single custom SQL query. I also have an alternative, pure SQLite->SQLite channel torrent format (based on SQLite ATTACH) in the works. However, because of the lack of manpower during the last half year, I had to defer these changes.
Hopefully, I'll be able to finish the stuff for 7.6.
@hbiyik , the metadata DB cannot be used with unmodified Tribler source to re-commit the private channel w/o the corresponding private key, BTW.
Please don't close this issue; mark it as an enhancement or something. The issue https://github.com/Tribler/tribler/issues/3615 is discussing the difference between good and perfect.
This issue is about the difference between terrible and OK. Currently it is not usable at all, and you don't even need to go to 100K torrents to see this. I am working on a way to improve that; below are some SQL snippets to improve the recursive selection.
Recursive selection: 23 seconds with an index on the id_ column (which is currently missing).
```sql
WITH RECURSIVE Nodes(metadata_type, origin_id, id_, title, status, lvl) AS (
    SELECT ChannelNode.metadata_type, ChannelNode.origin_id, ChannelNode.id_,
           ChannelNode.title, ChannelNode.status, 0
    FROM ChannelNode
    WHERE ChannelNode.public_key = x'pk'
      AND ChannelNode.metadata_type = 300
      AND ChannelNode.status IN (0, 1, 6)
    UNION
    SELECT ChannelNode.metadata_type, ChannelNode.origin_id, ChannelNode.id_,
           ChannelNode.title, ChannelNode.status, Nodes.lvl + 1
    FROM ChannelNode, Nodes
    WHERE ChannelNode.id_ = Nodes.origin_id
)
SELECT * FROM Nodes ORDER BY lvl DESC;
```
Orphan clean-up recursive SQL: 3 seconds.
```sql
WITH RECURSIVE Nodes(id_) AS (
    SELECT ChannelNode.id_
    FROM ChannelNode
    WHERE ChannelNode.metadata_type = 400
      AND ChannelNode.public_key = x'pk'
    UNION
    SELECT ChannelNode.id_
    FROM ChannelNode, Nodes
    WHERE ChannelNode.origin_id = Nodes.id_
)
DELETE FROM ChannelNode
WHERE ChannelNode.public_key = x'pk'
  AND ChannelNode.id_ NOT IN (SELECT id_ FROM Nodes);
```
Of course, before cleaning up, that data should be dumped to lz4 blobs as well.
I am currently working on it and may provide irregular feedback, since I am not very available on weekdays.
Unless it is something completely fool-proof and really simple, like a 1-to-1 replacement of ORM queries with recursive SQL, this is going to go into 7.6. If you want it in 7.5.2: we expect to release that on July 20.
btw @arvidn @hbiyik 3-5 November or 17-19 November we would like to organise "TriblerCon". Small-scale physical meeting in the countryside with high-end Github access. Gathering of you and the 12 Delft people working on Tribler for detailed alignment, nerd talk, :beer: and roadmap for coming year. My dream is to get VLC-founder and Kodi developers to attend future editions and merge it with our scientific workshop, https://dicg2020.github.io
With recursive SQL queries, the select + sort time of 700K torrents takes about 25 seconds.
Because of your high-quality work we would like to refund your airfare if you could possibly attend TriblerCon during these Corona times.
this is going to go into 7.6.
Looking forward to seeing "Recursive SQL" that reduces insert time. Note that the channel format will change at some point in the future. It needs to become easier to parse in real time; each MByte downloaded to disk is parsed, inserted, and displayed. That is not easy to realise: it requires adding a logical delete field to magnet links and aligning the ~4 MByte torrent block/piece (piece picking) with a ~4 MByte metadata storage block with padding. In the even further future we will support storing HD preview pictures of the content inside the metadata swarm for additional GUI appeal. Channel owners can then optionally fill in this content. It enables Tribler to move towards https://trontv.com/ thumbnail-based content navigation (we have been trying to get this working since the Tribler 4.0 days).
@hbiyik , we will be really glad to see you at Delft in person! :+1:
@synctext it is a pleasure for me to join such an event and get to know the members more closely. It seems KLM is already willing to fly me in from Germany despite the COVID crisis, and a free lecture from Prof. Pouwelse is a nice addition (even though I lose track after the 20th or 30th minute or so :) when he starts talking about ethics, freedom and high moral grounds; I hope several beers can increase my mileage in this context), so count me in!
@ichorid I am sure that replacing the SQL queries won't help by itself; there will be blood for sure. All the database iteration should stay in lockstep with the rest of the operations, using generator iterators, to prevent RAM torture. So I am working on a fork in my repo; when I build the confidence that this works without any side effects or regressions, we can discuss which branch to merge. Meanwhile you can keep working on an actual solution with the ATTACH addition and multiple DBs. You never know what you will get with the lifecycle management of a piece of software anyway :).
But I will need some support, especially since I am not very confident about some of the legacy definitions and old database formats.
@synctext I forgot to mention: I was also going to bring up that metadata possibility for each channel item and for the channel itself, like the one in Kodi. So I am glad that it is somewhere in the back of your head.
I don't want to sidetrack, but the one critical design clue could be providing a media ID for the content, where the metadata is maintained by a third party. For further details I would prefer to discuss offline, for the reasons you might guess :)
btw @arvidn @hbiyik 3-5 November or 17-19 November we would like to organise "TriblerCon". Small-scale physical meeting in the countryside with high-end Github access. Gathering of you and the 12 Delft people working on Tribler for detailed alignment, nerd talk, beer and roadmap for coming year. My dream is to get VLC-founder and Kodi developers to attend future editions and merge it with our scientific workshop, https://dicg2020.github.io
count me in! (assuming the netherlands will accept a swede :) )
@hbiyik , I should warn you about the biggest problem with Channels stuff so far: reactor hogging. To circumvent the problem, I had to come up with a kind of a congestion control mechanism. It is not enough to just make the code fast. You need to make it asynchronous as well. For 7.6 I planned to create a separate asynchronous priority queue to handle DB requests. Nonetheless, you are welcome to come up with a better solution!
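Purely as an illustration of that idea (this is not the planned 7.6 design, just a sketch of what a separate asynchronous priority queue for DB requests could look like):

```python
import asyncio

class DBRequestQueue:
    """Toy priority queue: coroutines submit DB work as plain callables and
    await the result; a single consumer runs them one at a time off the reactor."""

    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        self._seq = 0  # tie-breaker so equal priorities stay FIFO

    async def submit(self, priority, func, *args):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._seq += 1
        await self._queue.put((priority, self._seq, func, args, future))
        return await future

    async def worker(self):
        loop = asyncio.get_running_loop()
        while True:
            _priority, _, func, args, future = await self._queue.get()
            try:
                # Run the (blocking) DB callable in a thread, one request at a time.
                future.set_result(await loop.run_in_executor(None, func, *args))
            except Exception as exc:
                future.set_exception(exc)
```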
Update: by 15 Sep we hope to have the next version of channels ready in RC1. Key focus is faster performance, while not blocking the reactor, plus stability (this touches lots of low-level code, with lots of optimisation directions to explore; complex).
Is this new concept based on the ATTACH statement of sqlite? The optimization I am working on is somehow becoming something else, since I am looking for a way to make the overall process an async task; I think ATTACH is the cleanest and most ORM-friendly way of achieving it.
@hbiyik , @synctext was talking about a Tribler RC combining different planned optimisations for Channels, not about some single "ultimate optimisation to rule them all". I will add a table with proposed/planned optimisations later today in #3615
hello again after a short silence. @ichorid, @qstokkink
https://github.com/hbiyik/tribler/commit/47ac1b8a895760a79b4998d2849ad8b9e97bd225
You can find the proof-of-concept work in the above commit. Please note that it is not a working example at all, but an experiment to see the boundaries in terms of performance. I chose the consolidate_channel_torrent method to work with, in order to trigger CPU/memory-heavy operations.
This is not complete because I am not using ORM entities. Yes, I have tried to make it work with Pony, but I could not find a way to implement my design choices; if there is a way to do it in Pony as well, that is also OK.
Ideas under test: 1) Outsource all recursive and costly data-set operations to sqlite, including sorting, timestamping, recursive selection, etc.; don't do such stuff in Python. The rationale is that no pure-Python implementation can be faster than sqlite in terms of data operations.
**Result:** SQL recursive selection is quite accurate and fast, maybe 100x or more; however, sqlite operations are blocking, and I think thread utilization should be the solution to overcome this.
Test data is around 300K recursive nodes to commit; selection time, including timestamp correction, recursive member counting, and sorting, is around 20 s.
2) Use iterators and generators to keep RAM usage as small as possible. This method also keeps data traversal, signing, compression, and writing of the mdblob in lockstep, row by row. Since it is a single synchronous pass, it can easily be made asynchronous in the reactor with the correct task priorities (a minimal sketch follows after the results below).
**Result:** TL;DR: the above commit shows a PoC where it is possible to dump 300K recursive nodes into one single 65 MB compressed and signed mdblob in 1.5 minutes, where ~70% of this process can be made fairly async.
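A minimal sketch of idea 2, assuming a plain sqlite3 connection; the anchor condition and column list are simplified placeholders, not the real channel schema logic:

```python
import sqlite3

def iter_channel_nodes(con, public_key):
    # The recursive selection happens inside SQLite; Python only walks the
    # cursor one row at a time, so memory stays constant regardless of size.
    cursor = con.execute(
        "WITH RECURSIVE Nodes(id_, origin_id, title) AS ("
        "  SELECT id_, origin_id, title FROM ChannelNode"
        "  WHERE public_key = ? AND origin_id = 0"   # simplified anchor condition
        "  UNION"
        "  SELECT ChannelNode.id_, ChannelNode.origin_id, ChannelNode.title"
        "  FROM ChannelNode, Nodes WHERE ChannelNode.origin_id = Nodes.id_"
        ") SELECT * FROM Nodes",
        (public_key,),
    )
    row = cursor.fetchone()
    while row is not None:
        yield row          # caller serializes/signs/compresses this row
        row = cursor.fetchone()
```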
Impressive job, @hbiyik ! We can almost put it into our codebase as it is. However, your implementation lacks two important details:
- After running `sign` on a serialized entry, there must be a command to update the corresponding DB entries. Is that what you meant in your words about "Further update of the database"?
- The `timestamp` field should not be updated from the system time. I agree the name is misleading. Instead, timestamps must use an increasing counter. In the current implementation the timestamp is only initialized from the clock, as an optimization to avoid maintaining a counter in the DB. Using the system clock instead of an integer counter leads to the risk of creating multiple entries with the same timestamp. The Channels system expects timestamps to be unique and increasing (for a given public key).
EDIT:
Further update of the database is not implemented because this evolves to complete re-writing of the class.
You don't have to rewrite the class (if you mean the ORM class) at all. You just need to be sure the SQL <-> Pony objects mapping/cache is flushed before running your direct SQL stuff. Then Pony can proceed to work as normal.
Also, to test the correctness of your code you can try replacing `consolidate_channel_torrent` with your implementation and run the following two tests:
https://github.com/Tribler/tribler/blob/43a4cb070a86ba3c7be0e903a1398e1179f6cbb4/src/tribler-core/tribler_core/modules/metadata_store/tests/test_channel_metadata.py#L374
These two tests thoroughly check all possible situations and content types that can be put into a channel. To visually control the results you can use the `pprint_tree` method: https://github.com/Tribler/tribler/blob/43a4cb070a86ba3c7be0e903a1398e1179f6cbb4/src/tribler-core/tribler_core/modules/metadata_store/tests/test_channel_metadata.py#L491
You don't have to rewrite the class (if you mean the ORM class) at all. You just need to be sure the SQL <-> Pony objects mapping/cache is flushed before running your direct SQL stuff. Then Pony can proceed to work as normal.
So the only way to do it is to instantiate the entity from_dict with the row, directly from the result of cursor.fetchone(), because under the hood Pony always uses fetchall() or fetchmany(). This will introduce a little overhead (CPU + memory) and hopefully should be OK, or not; got to test it. But if I understand your point about flushing the ORM cache correctly, on each iteration we need to run an ORM cache flush and maybe a manual GC collection after we are done with the entity instance. The concern I have is that GC in CPython only works when there is no further reference to that particular object; if the instance is somehow still referenced somewhere in Pony (through some caching mechanism), the RAM will still keep stacking up.
Is that what you meant in your words about "Further update of the database"?
Yes, that part is rather easy, I think, and the method here is similar to the selection but simpler: create a pivot table in the sqlite memory space, and apply this pivot table as an update onto the actual table. Thus all the dirty work is handled on the sqlite side again. I expect this would not require more resources than the selection (sketched below).
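A minimal illustration of that pivot-table idea with a plain sqlite3 connection; the table and column names come from the snippets above, and the status values are placeholders rather than the real commit semantics:

```python
import sqlite3

con = sqlite3.connect("metadata.db")
with con:  # one transaction: build the pivot, then apply it in a single UPDATE
    con.execute("CREATE TEMP TABLE pivot (id_ INTEGER PRIMARY KEY, new_status INTEGER)")
    con.execute(
        "INSERT INTO pivot (id_, new_status) "
        "SELECT id_, 6 FROM ChannelNode WHERE status IN (0, 1)"  # placeholder status codes
    )
    con.execute(
        "UPDATE ChannelNode SET status ="
        " (SELECT new_status FROM pivot WHERE pivot.id_ = ChannelNode.id_)"
        " WHERE id_ IN (SELECT id_ FROM pivot)"
    )
con.close()
```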
timestamp field should not be updated from the system time. I agree the name is misleading. Instead, timestamps must use
Yes, I think that part is a misconception on my end; I need to check in detail how this timestamping works in both the writer and reader directions, and the tests you mentioned will also help a lot.
So the only way to do is, instantiate the entity from_dict, with the row, directly with the result of cursor.fetchone
No, what I meant is you should not rely on Pony to fetch stuff at all, but instead access SQLite directly as you do. It should work like this (in pseudocode):
- get the public key and id of the channel through the ORM
- finish the ORM session
- invalidate the ORM cache/mapping
- do your SQL stuff directly with a cursor, as a single transaction
- continue to work with the ORM as normal
You don't need to flush the ORM cache on every iteration, since you do not use Pony objects in your code. Instead, you must ensure that all ORM-induced changes are flushed to the DB before you start manipulating it directly.
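A rough sketch of that recipe, assuming a Pony-mapped metadata store `mds` with a `ChannelMetadata` entity and the `ChannelNode` table used elsewhere in this thread (all names here are illustrative):

```python
import sqlite3
from pony.orm import db_session

def commit_channel_raw(mds, channel_rowid, db_path):
    # Use the ORM only to fetch the key material, then leave the db_session
    # so all pending ORM changes are flushed/committed to the DB file.
    with db_session:
        channel = mds.ChannelMetadata[channel_rowid]
        public_key, channel_id = channel.public_key, channel.id_

    # Do the heavy recursive work directly on SQLite, in a single transaction.
    con = sqlite3.connect(db_path)
    try:
        with con:
            cursor = con.execute(
                "WITH RECURSIVE Nodes(id_) AS ("
                " SELECT id_ FROM ChannelNode WHERE public_key = ? AND origin_id = ?"
                " UNION"
                " SELECT ChannelNode.id_ FROM ChannelNode, Nodes"
                " WHERE ChannelNode.origin_id = Nodes.id_"
                ") SELECT id_ FROM Nodes",
                (public_key, channel_id),
            )
            for (node_id,) in cursor:
                pass  # serialize/sign/write the node to the mdblob here
    finally:
        con.close()
    # Afterwards, continue to work with the ORM as normal.
```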
Hi @hbiyik! I'm the author of PonyORM. @ichorid asked me to look at the code that you suggested regarding its compatibility with PonyORM.
I don't see any problems regarding interaction with PonyORM. It is a good practice to rewrite heavy queries in raw SQL to avoid ORM overhead.
Some small comments:
- There is no need to wrap `consolidate_channel_torrent` or `consolidate_channel_torrent2` with the `@db_session` decorator, as these methods are called inside another db_session, and nested db_sessions are ignored by the ORM.
- There is no need to call `flush()` before executing raw SQL queries. When you call `db._exec_sql(...)`, PonyORM flushes all previous unsaved changes automatically.
- It is better to use the `db.execute(...)` method instead of `db._exec_sql()` to issue raw SQL queries. `db.execute` is a part of the public API, while `db._exec_sql()` is a part of the implementation details (`db.execute` calls `db._exec_sql` internally). Note that `db.execute` automatically creates query parameters for embedded Python expressions like `$var_name` or `$(some expression)`, so the result is safe from SQL injections.
- On the other hand, it may be better to keep using `db._exec_sql(...)` as you do. The reason is that `db.execute()` starts a transaction, and this locks the SQLite database. We probably don't want to lock the database during this long query execution and the subsequent file creation. If concurrent transactions never update rows involved in the channel commit process, this should be safe.

So, the correct integration with PonyORM should look like this:
- run the channel commit as its own separate step (as `GigaChannelManager.updated_my_channel` currently does).

To me, the current code looks correct regarding the SQL <-> ORM interaction. It already does what I have described above.
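For reference, a tiny example of the `db.execute` parameter substitution described above (the query itself is only an illustration on the `ChannelNode` table from this thread):

```python
from pony.orm import Database, db_session

db = Database()
# entity definitions, db.bind(...) and db.generate_mapping(...) are assumed
# to happen elsewhere, as in the real metadata store setup

@db_session
def count_children(origin_id):
    # $origin_id is turned into a bound query parameter by PonyORM, so the
    # value is never interpolated into the SQL text itself.
    cursor = db.execute("SELECT COUNT(*) FROM ChannelNode WHERE origin_id = $origin_id")
    return cursor.fetchone()[0]
```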
@ichorid thanks for the tips. @kozlovsky thanks for taking the time to review and putting the suggestions together in such a composed way. I think I now have the overall picture of how to do it without hurting the ORM.
It is necessary to use ORM objects only if they have some application-specific logic implemented in Python which we don't want to duplicate in raw SQL updates. In current case of channel commit implementation there is no such logic it seems.
Only regarding the above statement: serialization/deserialization is a member of the entity classes. However, I think reimplementing it as an alternate path on top of row tuples/dicts is the only solution; that should not be a very difficult task, but it needs to be tested thoroughly against weird pitfalls.
I.e., in the PoC commit there is no delete serialization, because it is just a PoC, and it is based on tuples; a dict-like object is a cleaner option, with the sqlite3.Row factory defined in the connection statement, since this is a connection-level configuration in _sqlite.
https://github.com/python/cpython/blob/b146568dfcbcd7409c724f8917e4f77433dd56e4/Modules/_sqlite/cursor.c#L122 https://docs.python.org/3/library/sqlite3.html#sqlite3.Row
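For example, the standard-library row factory gives name-based access to each row once it is set on the connection:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # configured on the connection, as noted above

con.execute("CREATE TABLE ChannelNode (id_ INTEGER, title TEXT)")
con.execute("INSERT INTO ChannelNode VALUES (1, 'example')")

row = con.execute("SELECT id_, title FROM ChannelNode").fetchone()
print(row["title"], list(row.keys()))  # columns are addressable by name
con.close()
```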
@ichorid
Since working with direct rows is the direction, I designed an MDBlob class that serializes, signs, compresses and writes the mdblobs from rows, and also reads, decompresses, deserializes and generates rows back from the mdblobs. Benefits:
1) There are no congestion or async issues; the whole class is async.
2) There is practically no RAM consumption; literally the whole class holds a few bytes at a time, since the reader and writer are implemented in a walker-like design.
3) Quite fast: a 1 million record write in about 2 minutes, a 1 million record read in under 1 minute.
4) Should also work with the entities' from_dict & to_dict methods (not tested though).
Cons:
1) It uses an alternative implementation of serialization rather than what is available in the serialization module of the metadata_store package. This needs to be DRYed, but I think this serialization module needs some refactoring anyhow to optimize memory consumption, since every entity creates a new instance of the payload classes.
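As a rough, illustrative sketch of the walker-style write path (this is not the actual MDBlob class; `serialize_row` and `sign` are hypothetical stand-ins for the real serializer and signing key):

```python
import lz4.frame

def write_mdblob(path, rows, serialize_row, sign):
    # Stream rows through serialize -> sign -> compress -> write, holding only
    # one row in memory at a time; `rows` can be a DB cursor or a generator.
    compressor = lz4.frame.LZ4FrameCompressor()
    with open(path, "wb") as out:
        out.write(compressor.begin())
        for row in rows:
            payload = serialize_row(row)                    # hypothetical serializer
            out.write(compressor.compress(sign(payload)))   # hypothetical signing step
        out.write(compressor.flush())
```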
You can check the implementation in
https://github.com/hbiyik/tribler/commit/67d93f303b954d6c6e0344646130518038d6d20b
And run some tests at:
Here are some results on my end; the numbers depend on the host CPU and your disk type.
My setup: CPU: 8-core AMD FX8320 overclocked to 4.2 GHz; RAM: 12 GB DDR3 (not really relevant); disks: Disk1: 5400 RPM, Disk2: 7200 RPM, Disk3: Samsung EVO 970 NVMe SSD.
First of all, bravo for the performance :clap:
Now, some food for thought:
- `aiofile`: "Async file access is always slower than sync one (because of the thread pool overhead)". Except for Linux, where `aiofile` uses the native Linux filesystem AIO interface.
What you did is optimize the file part of mdblob processing for memory usage (with generators). However, the biggest bottleneck is not the file reading performance, but entry insertion into the main DB. The whole process looks like this:
"file stuff": [read file bytes into memory] -> [decompress bytes] -> [check signature (optional?)] -> [parse bytes (deserialize)] ->
"db stuff": [check DB invariants (e.g. duplicates and/or older entries)] -> [insert entry into the main DB]
You optimized the "file stuff"
part. The biggest bottleneck is in the "db stuff"
, namely the checks and DB WAL, indexes creation, supporting entry creation (like torrent checker table), etc.
Actually, there is a much simpler and faster way to solve the problem you tried to solve in your latest performance test - instead of serialized entries blobs, use SQLite DBs as a storage format. That will allow us to skip Python completely (except for signature checks - but that's a different story). As I have stated above, using mdblobs for storage was mostly a way to simplify development by unifying the routines and format for torrent-based storage and network transmission.
Anyways, it would be great if, using your new routines, you will measure processing/committing some big channel that is already in the system.
Actually, there is a much simpler and faster way to solve the problem you tried to solve in your latest performance test - instead of serialized entries blobs, use SQLite DBs as a storage format.
That is most definitely the best way to do this, but it also requires a redesign of the database, maybe multiple tables, maybe even multiple databases, and almost a rewrite of the channels. So I will be watching that approach closely as well. But again, my target here is to make things "good", not "perfect"; and generally "perfect" is the best enemy of "good", which is quite "bad".
Let me elaborate on the discussion a little bit.
aiofile: "Async file access is always slower than sync one (because of the thread pool overhead)". Except for Linux, where aiofile uses native Linux filesystem AIO interface.
The bottleneck when interfacing with the mdblob is on the CPU side, so making the IO calls to the disk async would have some merit, but not that much.
The idea here is slicing the overall routine (by routine I mean DB -> some stuff -> FILE, or vice versa, as a whole) into sub-coroutines, so that the long operation time does not block the reactor and can be cancelled on demand, while also balancing the resources (CPU - RAM - IO) in the meantime. On the DB side I am using generators (with recursive selection), which are semi-coroutines whose scope can be switched by their nature; on the file side this class uses coroutines directly, so on each row iteration the reactor can switch to another coro or cancel the task.
Now the next and final step is updating the DB. For this I have several ideas, and it should be fine as long as I can access the correct locks/semaphores in Pony; let's see how that will work. I am expecting performance similar to the recursive selection.
Anyways, it would be great if, using your new routines, you will measure processing/committing some big channel that is already in the system.
If I had such a thing I would have put up a PR already :)
Generators in Python always run on the same thread where they were called. They do not use threading implicitly. Ergo, stuff in a generator will still block the reactor. Essentially, generators are just syntactic sugar to make up for Python's lack of pointers and a `goto` command. To make your generator stuff truly async, you have to use the asynchronous generator protocol.
Even more important, the reactor does not know when it can switch to another coroutine. The programmer must tell it when to do that - by using the `await` keyword. However, it only makes sense to `await` on `async` methods. Yep, that means that the thing should be `async` all the way down! Well, you may ask then, how does the chain of async calls terminate? At the top, it terminates by wrapping it in an `asyncio.Task`. And at the bottom, it terminates either by a call to a library function that adheres to the `async` protocol (e.g. `aiofile.Reader`) or by wrapping your (synchronous) code into a thread by using `loop.run_in_executor(your_sync_callable)`.
Task -> await-async -> run_in_executor
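A minimal sketch of that chain (the query, table name and file path are just placeholders):

```python
import asyncio
import sqlite3

def run_blocking_query(db_path):
    # Plain synchronous code: the bottom of the chain.
    con = sqlite3.connect(db_path)
    try:
        return con.execute("SELECT COUNT(*) FROM ChannelNode").fetchone()[0]
    finally:
        con.close()

async def count_nodes(db_path):
    loop = asyncio.get_running_loop()
    # run_in_executor moves the blocking call to a thread; awaiting it yields
    # control back to the reactor instead of hogging it.
    return await loop.run_in_executor(None, run_blocking_query, db_path)

async def main():
    task = asyncio.create_task(count_nodes("metadata.db"))  # the Task at the top
    print(await task)
```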
Correct me if I'm wrong, but I see neither `run_in_executor` calls in your generator code, nor bottom `await`s for calls to external library functions.
There is a simple way to test if the reactor is hogged: create a `Task` at the top level of your app that loops endlessly with `asyncio.sleep` and prints something to the console. That is your "heartbeat". If the heartbeat is steady during the performance tests, the reactor is free enough.
Correct me if I'm wrong, but I see neither run_in_executor calls in your generator code, nor bottom awaits for calls to external library functions.
Yep, I wasn't very clear on that; I shouldn't write messages at midnight. The reason is that I have not made them async yet. My point about generators is that, in terms of synchronous programming, generators are the closest thing to coroutines in async: they are a kind of semi-coroutine running synchronously.
The reason I have not done it yet is DB execution and cursor movement: those calls should, at the end of the day, run in a thread (at least the selection part, db.execute), because I don't want to block the spinning reactor with those costly DB calls (around 20-30 s). But the real problem here is:
The Python _sqlite module is not thread safe at all (even though the C-level sqlite3 library is). The possible solutions I see are: (1) use a mutex to serialize access, (2) queue the SQL statements and run them sequentially, or (3) do the same thing inside the sqlite library itself. Since I am using the fetchone method, in either of the first two cases the cursor will be blocked, because to my knowledge SQLite has no concept of cell/row/column/table locks (https://sqlite.org/threadsafe.html), and even if it did, no Python library exposes them (maybe option 2 does, but I am not sure; option 3 is essentially the same concept as the others, just pushed down into the sqlite library).
So, bottom line: when the cursor is moved across the DB, or a costly statement is executed in sqlite, the rest of the app's DB accesses must be blocked. The rationale is, again, that the costly operations we push into sqlite itself can never be done faster in Python, and doing them in Python also introduces a bunch of memory issues. What I am targeting is to leave the reactor free during that time, so it can handle the rest of the tasks related to IPv8, or things not related to the DB in general.
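One illustrative way to get that serialization without blocking the reactor is to funnel every statement through a single worker thread that owns the connection (roughly option 2 above; all names here are made up):

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor

class SerializedDB:
    """All SQLite access goes through one worker thread, so statements are
    queued and run sequentially and the connection is never used concurrently."""

    def __init__(self, db_path):
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Created on the caller's thread but only ever *used* from the single
        # worker thread, hence check_same_thread=False.
        self._con = sqlite3.connect(db_path, check_same_thread=False)

    async def fetchall(self, sql, params=()):
        loop = asyncio.get_running_loop()
        # Every call is queued into the same single-threaded executor, so the
        # reactor stays free while the statement runs.
        return await loop.run_in_executor(
            self._executor,
            lambda: self._con.execute(sql, params).fetchall(),
        )
```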
At the end of the day, I am targeting a design like the one below.
```python
async def commitnodes():
    with Pony.getcon() as con:
        # heavy recursive SELECT runs off the reactor and returns an iterator
        iterator = await costlydbselect(con)
        upd_cur = con.cursor()
        for row in iterator:
            # write the serialized/signed row to the mdblob, yielding to the reactor
            await MDBlob.write(row)
            # mark the row as committed (pseudo SQL)
            upd_cur.execute("sql statement to update stuff")
```
Currently I am focusing on slicing the iteration at the fewest and most effective points; for now it is synchronous, with generators, and once I deal with the SQLite issues I can easily convert them to coroutines and tasks.
By the way, the above is just pseudocode; with context managers this stuff can be made prettier without exposing cursors and connections all around.
Also, even with the new design of multiple DBs and the ATTACH method, there will still be an option to receive old mdblobs from older DBs, right? So if this works it will be useful for the future as well, regardless of whatever the next redesign is.
To be practical: something like this https://github.com/hbiyik/tribler/commit/ea449281e1ddbc55e9c8d0c18abe6f31085ba4e0, plus transaction control on the updates.
Well, the `DBExecutor` class you provided should work nicely for direct interactions with the database. Basically, you combined `ThreadPoolExecutor` with `sqlite3.Row` to create a thing that produces sqlite rows from a background thread asynchronously. However, I don't see how we can use it in its current form, aside from adding some syntactic sugar to your recursive SQL channel commit procedure or running some performance tests. This is a step in the right direction, though. Last year I was thinking about subclassing the executor myself, until I realized that we would still have transactions blocking each other. Then we discussed it with @kozlovsky and he formulated the only viable architecture:
an asynchronous "fetcher" for Pony objects that should internally run two queues:
It should work similarly as the thing you developed: you put the transaction in, you get Pony objects out, asynchronously. If we want to be able to do direct SQL transactions, we want them to use the same two queues. Otherwise, there will be deadlocks.
One of the reasons why I am not doing the thing at the moment is that I wanted to wait for @kozlovsky to get here in September and do it together. Nonetheless, your work on recursive commits is very appreciated because it does not touch this topic directly and can be later integrated easily in the queue-based framework.
quite fast: 1 Million record writer in about 2 minutes, 1 million record reader under 1 minute.
@hbiyik impressive work! It has been a month since there was activity in this thread. @kozlovsky has also joined the team now. Can this work be turned into production code which does not block the reactor? Please get "Good Code" into production before re-doing everything from the ground up to make it perfect. The current channel implementation is too broken for users to use. It's not about insert time; downloads should start instantaneously.
@synctext I was on vacation for the last month, so I couldn't really focus on this one because of sand, summer sun, and beaches. :)
However, from my POV and understanding, I can provide an executive summary as below:
The best way to manage this is to use multiple DBs/tables, plus ATTACHed DBs; I think this is the consensus.
Nevertheless, the old DB design will still be out in the wild, so this redesign would still benefit from a way to commit or merge legacy mdblobs from/to the torrent transport and from/to the sqlite DB.
To answer the question precisely: the async part discussed here is better implemented in Pony itself, with a proper queue or locking methodology. An iterative way of traversing the cursor should also be provided by Pony, to prevent memory issues; I think this part is as vital as ever. So the snippets here are not close to PR quality, because they introduce a redesign of several components (even Pony itself). But the recursive SQL code is quite solid, at production level, even though this SQL snippet is just the tip of the iceberg.
There are some great insights in this issue, like deep understanding of IPv8 perfection:
i am amazed that serialization performance from ipv8 is quite near perfect in terms python performance
Even though I would like to keep the IPv8 appreciation going, I believe this issue is now mostly out of date and should be closed.
Hello
I have added around 300-400K torrents to my channel, and when I issue a commit to the channel, the commit takes too long, and after some time I got a crash. Too bad that I did not capture any logs of it.
Now when I reopen Tribler, it tries to regenerate the channel from the metadata DB, but the process takes literally days just to commit. Funny enough, when the restoration is successful and you are patient enough, the core gets stuck for some reason, but can still close gracefully. After closing/reopening Tribler, the restoration starts again, and it takes a lot of time again.
You can find the metadata.db at the link below. Drop it into your Tribler directory and you will see the problem.
https://ufile.io/7gqv5ump
Tested with an AMD 8320 CPU and 12 GB of RAM with an NVMe disk, so the setup is quite fast.