Question: how can one figure out why the particular binary log is much larger than it is supposed to be?

MarkKharitonov commented 2 years ago

I have a project for which the binary log should be about 22M. However, on some of our build servers the file is almost 20 times larger. Something is terribly wrong. Knowing that binlog is just a zip file I peeked inside. In the past one could find a collection of files there, but today it is a single file in a proprietary format. In this case the file is over 3G in size! My immediate desire is extract all the files captured by the log in hope of finding an outlier. But I do not know how to do it, except for reverse engineering the binary log viewer from the source. Which I can, but before doing so I would like to make sure there are no better ways. Also, what if this way reveals nothing? In short I am looking for an advice on how to troubleshoot this situation.

KirillOsenkov commented 2 years ago

Do File -> Statistics on both binlogs and compare the versions, number of events, number of strings.

I’m guessing the larger binlog was built with an older version of MSBuild, and will have an older FileFormatVersion.

KirillOsenkov commented 2 years ago

Always try to build with the latest MSBuild for best results. The latest FileFormatVersion is 14.

MarkKharitonov commented 2 years ago

Hmm, I clicked the Statistics option, the mouse cursor went busy but then nothing happened. Repeating does not yield any result... The viewer version is 2.1.627

I tried on another, much smaller log and the result is the same - nothing happens.

KirillOsenkov commented 2 years ago

After Statistics is done (it may take some time), scroll the tree to the end, there will be a new Statistics node

MarkKharitonov commented 2 years ago

Got it. It is not obvious. At least to me. Thanks. Will check.

MarkKharitonov commented 2 years ago

I would like to propose a feature - the ability to jump to the actual target invocation. Indeed, as we know the same target within the same build is executed only once, all the rest are either immediately skipped or wait for the completion of the one instance that is running and then skipped. For example, consider the GetNativeManifest target. When searching for $target GetNativeManifest project(SomeProject.csproj) one can get tens of hits. I guess this depends on the number of other projects referencing the SomeProject: Now I need to locate the one node which represents the actual work. I do it by clicking each node and checking in the right pane whether the node contains any children. This is 2 clicks and scroll up (because there is no auto scroll). Maybe there is an easier way, I do not know, but this way is excruciatingly slow and painful.

It could be improve in several possible ways. For example, there could be a sync button. When pressed it instructs the right pane to automatically select the node matching the selected node in the left pane - so far it already happens. But in addition to selecting the node, it would expand it (one level) and auto scroll up, so that the children are visible. This way we can just move from node to node on the left pane and the right pane would be automatically updated to show the necessary context. Would be very helpful.

Another option is to have a button or menu option to jump to the node with the actual work. That would be fantastic.

MarkKharitonov commented 2 years ago

Another useful feature would be to show all the fields from the respective XyzStartedEventArgs and XyzFinishedEventArgs. Could be an option in the context menu when open on a Project/Target/Task node.

KirillOsenkov commented 2 years ago

We can't show all the fields from the event args because that would explode the memory consumption (we'd need to hold on to all event args). Right now we're throwing them away. The viewer is very tightly optimized to reduce memory consumption so it can open huge binlogs. While I think this would be neat in theory, it'll be hard to implement in such a way that doesn't cause any more increase of memory consumption.

Good point about finding the "real" target - I'll think about it.

MarkKharitonov commented 2 years ago

May I suggest an idea? Do not hold onto the real events. Rather open the binlog file again and locate the respective events from scratch. It could be done quite efficiently if the viewer just remembers the index of the respective events. Of course, it is possible that the original binlog has been deleted. Fine, in this case the command would say "sorry, no can do". It would be known that running this command has the aforementioned disclaimers, but I think it is worth it.

KirillOsenkov commented 2 years ago

OK, maybe one day.

MarkKharitonov commented 2 years ago

I would like to return back to my question. Here is the Statistics result in the binlog from the server: And here it is from a full local build: I can also say that the local build uses msbuild 16.11.0 while the server uses msbuild 16.8.3

Can this version difference explain what is going on? In other words are there known issues that were fixed between 16.8.3 and 16.11.0 that are relevant to my problem?

KirillOsenkov commented 2 years ago

yes absolutely. There were huge and very significant improvements to the binary logger data compression, performance etc. between the versions 9 and 14. Version 13 is almost current, but version 14 is even better.

The difference you're seeing is fully explained by the versions of MSBuild and the binlog file format.

I tweeted about this here: https://twitter.com/KirillOsenkov/status/1395510072348868609

MarkKharitonov commented 2 years ago

I guess version 14 corresponds to msbuild 17. Anyway, I understand we must update the build tools on our servers. So far it is a PITA, because installation is interactive. Will have to research into unattended install. Thank you for all the help.

KirillOsenkov / MSBuildStructuredLog

Question: how can one figure out why the particular binary log is much larger than it is supposed to be? #565