> bytes sent to remote / bytes read from remote
I actually wrote that a few days ago. #2210
=> Separate borg.output. logger + cmdline option for these?
Magic ;)
I was thinking this would be a new option similar to --summary, but with performance-focused information: still more like a report than just some logging lines. Just using a logger might be easier, though. I expect some of the data collection could have a performance impact. (Or maybe not? E.g. querying the current time on Linux is just a memory read.)
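(If someone wants to verify that clock-read cost, a quick stdlib measurement like the following gives the per-call overhead; this sketch is mine, not from this thread:)

```python
import timeit

# Rough check of the per-call cost of reading the clock.
# On Linux, time.monotonic() is served via the vDSO, so expect tens of ns.
n = 1_000_000
total = timeit.timeit("time.monotonic()", setup="import time", number=n)
print(f"~{total / n * 1e9:.0f} ns per clock read")
```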
Direct tracing (via sys.setprofile) is really quite expensive. Adding ~two extra function calls to encrypt+decrypt+id_hash has maybe not such a big impact, but perhaps it would be better to implement it directly (not via @decorator magic or extra methods).
I didn't consider sys.setprofile a valid path for lightweight stats. I agree that such performance code should be kept lean and not too magic. But using an extra method would ease implementation:
Something like this would be fairly maintainable and easy to check for correct usage.
```
performance_stats.set_section("chunking")
...
performance_stats.set_section("id_hash")
```
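A minimal sketch of what such a performance_stats object could look like; only set_section appears above, so the class and its internals here are my assumption. Each set_section call closes the previous section and charges the elapsed time to it:

```python
import time
from collections import defaultdict

class PerformanceStats:
    """Hypothetical helper: accumulates wall-clock time per named section."""

    def __init__(self):
        self.totals = defaultdict(float)   # section name -> seconds
        self._section = None
        self._started = None

    def set_section(self, name):
        # Close the currently open section and open a new one.
        now = time.perf_counter()
        if self._section is not None:
            self.totals[self._section] += now - self._started
        self._section = name
        self._started = now

    def finish(self):
        if self._section is not None:
            self.totals[self._section] += time.perf_counter() - self._started
            self._section = None

performance_stats = PerformanceStats()
performance_stats.set_section("chunking")
# ... chunking work ...
performance_stats.set_section("id_hash")
# ... hashing work ...
performance_stats.finish()
print(dict(performance_stats.totals))
```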
More synthetic tests are covered by #1788, I think.
```
RemoteRepository: 139 bytes sent, 46427 bytes received, 5 messages sent
Resource Usage (relative): 5% CPU, 0% input, 94% network
Resource Usage (absolute): 0.18 CPU, 0.0 input, 3.2 network, 3.5 wall clock
```
Inspired by a screenshot I saw of Veeam that tells you similar numbers and even has a "Bottleneck" item in its summary.
(However, the "network" part includes the startup time of the remote server -- including me entering an SSH password -- but that's easy to remove.)
The "input" isn't populated yet. For the remote code it's very simple and just a few spots, but "input" is something done at very many places, so I suspect even adding ~two function calls per place might impact performance. Perhaps do it rather coarsely grained, via the existing backup_io blocks.
```
Resource Usage (relative): 33% CPU, 68% input, 9% repository, 0% cache sync
Resource Usage (absolute): 20.87s CPU, 42.24s input, 5.92s repository, 62.37s wall clock, 0.00s cache sync
Resource Usage: bottleneck: input
```
(Actual "borg create"... CPU is inclusive, the other's aren't -- it's from getrusage. So cache sync runs -> 99 % cache sync but also 99 % CPU)
```
Resource Usage (relative): 34% CPU, 53% input, 32% repository, 0% cache sync
Resource Usage (absolute): 6775.28s CPU, 10520.41s input, 6427.79s repository, 19934.58s wall clock, 0.00s cache sync
Resource Usage: bottleneck: input
```
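For reference, the relative line follows from the absolute one: each component's seconds divided by wall-clock seconds, with CPU taken from getrusage as noted above. A sketch of that arithmetic (the section values are placeholders, not real measurements):

```python
import resource

# CPU time of the whole process, inclusive of everything (see the note above):
ru = resource.getrusage(resource.RUSAGE_SELF)
cpu = ru.ru_utime + ru.ru_stime

# placeholder absolute section timings, as accumulated during a real run:
wall = 62.37
sections = {"CPU": cpu, "input": 42.24, "repository": 5.92, "cache sync": 0.0}

relative = {name: f"{100 * secs / wall:.0f}%" for name, secs in sections.items()}
bottleneck = max(sections, key=sections.get)
print("Resource Usage (relative):", relative, "bottleneck:", bottleneck)
```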
Rename "Resource Usage" to "Time used in" or something like that? Currently i think this sounds like some kind of magic.
Yeah, it's just the class name for now (the diff is messy, so no PR yet). Naming is hard :)
@ThomasWaldmann I would like to give it a shot, can you kindly assign it to me?
@fakedreamz you're assigned (you need to "start working on this" on bountysource yourself).
After you had a look at the relevant code, maybe update the ticket with your plans.
@ThomasWaldmann thanks, will do. Is there any other documentation apart from the docs folder to understand the high-level design?
There might be some more insights scattered over the issues here and on the mailing list, but the docs, esp. the internals docs, are maybe the best read.
If you search for a starting point: they are in src/borg/archiver.py, e.g. do_create implements borg create.
@ThomasWaldmann I would like to confirm my findings with you.
1. The process_file function has all the statuses for a file that is being processed: U (unchanged), M (modified); not sure if we need to also include the C status.
2. The chunk_processor function can be instrumented for the following: chunk_iter is the iterator that reads the file, so we can add instrumentation there to gather statistics; the id_hash function calculates the hash of a chunk, so we can add instrumentation before and after the call (see the sketch after this list).
3. For the following, I still haven't explored them (but maybe they are in RemoteRepository).
4. No comment yet on
5. I haven't looked into process_dir/pipe/dev/dir_with_fd/fifo/symlink in FilesystemObjectProcessors. Should they also be considered for statistics?
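To make point 2 concrete, here is a sketch of what that instrumentation could look like; the timing wrapper and the stand-ins for chunk_iter/id_hash are mine, for illustration only:

```python
import hashlib
import time

def timed_iter(iterable, stats, key):
    """Charge the time spent pulling items (i.e. chunking) to stats[key]."""
    it = iter(iterable)
    while True:
        t0 = time.perf_counter()
        try:
            chunk = next(it)
        except StopIteration:
            return
        finally:
            stats[key] = stats.get(key, 0.0) + (time.perf_counter() - t0)
        yield chunk

# stand-ins for borg's chunker output and id_hash, just for illustration:
chunk_iter = (bytes(1024) for _ in range(4))
id_hash = lambda data: hashlib.sha256(data).digest()

stats = {}
for chunk in timed_iter(chunk_iter, stats, "chunking_time"):
    t0 = time.perf_counter()
    digest = id_hash(chunk)
    stats["hashing_time"] = stats.get("hashing_time", 0.0) + (time.perf_counter() - t0)
print(stats)
```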
Let me know your thoughts/comments on the findings. TIA
1. Yes, but maybe increase the respective counter at the same place where these status chars are set, rather than deriving it from the status char.
2. Sounds right.
3/4. Remote stuff is in RemoteRepository, yes. Crypto and compression are in one nested call, IIRC (and not in remote.py).
5. Maybe stats about file types are not too interesting. Usually most stuff will be regular files and dirs, and if people are backing up block devices, they know it already.
In general, it would be good to add as little code as possible for this (some code parts are already rather complex and we should try not to make readability even worse). We already have stats - can we add some attributes in there?
Please also check the comments in this ticket, some work has already been done / some ideas were evaluated.
For a start, maybe just do the stuff that uses simple counters.
Hello. I've been thinking about how to implement this, so I forked the repository and pushed three short commits.
In the first one, I added a dictionary to fso.stats and added a line to FileSystemObjectProcessor.process_file to update the dictionary once a file has been processed. This dictionary has keys corresponding to the file statuses A, U and M; its values are counters.
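The idea in code, roughly (a sketch of the approach, not the actual commit):

```python
from collections import Counter

file_stats = Counter()  # e.g. living on fso.stats

def file_processed(status):
    """Bump the counter right where the status char ('A', 'U' or 'M') is set."""
    file_stats[status] += 1

for s in "AAMU":  # simulate four processed files
    file_processed(s)
print(file_stats)  # Counter({'A': 2, 'M': 1, 'U': 1})
```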
For the chunking, I added a variable to FileSystemObjectProcessor.stats called chunking_time to accumulate the time spent in chunking.
Finally, I measure the time spent in hashing by calling time.time before and after the id_hash method.
At the moment, the printing of these new variables is done at the end of the Archive.create_inner method, but I'm thinking about extending Statistics.__add__ to aggregate these new variables too.
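Extending Statistics.__add__ could look roughly like this (a sketch; the class body is illustrative, only the field names follow the description above):

```python
class Statistics:
    """Illustrative stand-in: only the fields relevant to this feature."""

    def __init__(self, chunking_time=0.0, hashing_time=0.0, files=None):
        self.chunking_time = chunking_time
        self.hashing_time = hashing_time
        self.files = files or {"A": 0, "U": 0, "M": 0}

    def __add__(self, other):
        return Statistics(
            self.chunking_time + other.chunking_time,
            self.hashing_time + other.hashing_time,
            {k: self.files[k] + other.files[k] for k in self.files},
        )

total = Statistics(1.5, 0.3, {"A": 2, "U": 0, "M": 1}) + Statistics(0.5, 0.1)
print(total.chunking_time, total.files)  # 2.0 {'A': 2, 'U': 0, 'M': 1}
```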
Now I am working on calculating the bytes sent to and read from the remote. For this, I think it is enough to use the tx_bytes and rx_bytes attributes of RemoteRepository.
@francoo98 sounds good, but code review is easier if you just open a pull request against master branch.
Ok, I opened a PR. Can I get this assigned?
Fixed by #6991. Thanks to @francoo98!
@francoo98 looks like bountysource has technical troubles (again). :-(
I clicked on "accept", but the site does not register my acceptance.
But there is also a timed mechanism, so if no one rejects, it will pay out the bounty to you anyway. Alternatively, we could contact bountysource support if you do not want to wait for that.
I'd like to contact support; otherwise it will take 2 weeks for the website to pay me.
@francoo98 ok, then just write an email to their support and point them to the bounty and to this github page.
Maybe they could finally fix that one can't accept bounties; that has happened before.
@francoo98 the chunking_time test sometimes fails on github CI.
@hexagonrecursion can you try if you can accept, so @francoo98 gets this bounty?
I can't find the failing workflow. Can you link it?
It was the chunking_time > 0.0 test.
I understand, but I don't see it failing anywhere. I checked GitHub Actions and everything seems fine, and my pull request also says everything is fine.
Well, I've seen it, so please fix.
@francoo98 bounty "paid out", thanks to @bountysource support! \o/
@bountysource btw, I found out why I could not accept the bounty: it seems that your site has compatibility issues with Safari.
Just a thought from an IRC discussion about finding backup performance problems: if we had some option that outputs more detailed statistics about potentially performance-relevant details, it might be a lot easier to diagnose performance problems.
Ideas:
:moneybag: there is a bounty for this