Verbose GC, missing contextid in concurrent-collection-start

amicic commented 3 years ago

<concurrent-kickoff id="795" timestamp="2020-10-19T07:18:51.326">
  <kickoff reason="threshold reached" targetBytes="717044792" thresholdFreeBytes="99074768"  remainingFree="98966736" tenureFreeBytes="175532784" nurseryFreeBytes="98966736" />
</concurrent-kickoff>
<exclusive-start id="796" timestamp="2020-10-19T07:18:51.334" intervalms="432.374">
  <response-info timems="0.252" idlems="0.249" threads="1" lastid="0000000000785500" lastname="Group1.TxInjector.CompositeTxInjector.I.driver.service.35" />
</exclusive-start>
<cycle-start id="797" type="global" contextid="0" timestamp="2020-10-19T07:18:51.334" intervalms="48768.123" />
<exclusive-end id="798" timestamp="2020-10-19T07:18:51.334" durationms="0.177" />

....

<concurrent-collection-start id="800" timestamp="2020-10-19T07:18:51.676" intervalms="55891.125" >
....
<cycle-end id="810" type="global" contextid="797" timestamp="2020-10-19T07:18:51.766" />
<concurrent-collection-end id="811" timestamp="2020-10-19T07:18:51.766" />

While concurrent-collection-start/end are attached to cycle-end that does have contextid, for completeness it's probably wise to add it to concurrent-collection-start, as well.

amicic commented 3 years ago

On a more general note, I have a feeling that concurrent-collection-start/end is overly complicated.

concurrent-collection-end is simply redundant and brings no value.

concurrent-collection-start (which actually is a 3 liner - see below) could probably be collapsed to one line. we don't have anything similar to report in GMP (in Balanced) and Concurrent Scavenger final increments.

Additionally 'start' from concurrent-collection-start is somewhat confusing. Someone might think of it as a concurrent cycle start, but in fact it's just the start of the final STW increment of a concurrent cycle. We already have a couple of other indicators it's a final STW (exclusive-start and gc-start) reported nearby.

<concurrent-collection-start id="800" timestamp="2020-10-19T07:18:51.676" intervalms="55891.125" >
  <concurrent-trace-info reason="card cleaning threshold reached" tracedByMutators="656526016" tracedByHelpers="45082100" cardsCleaned="32133" workStackOverflowCount="0" />
</concurrent-collection-start>

MelHopper commented 3 years ago

Please can you add an ID label, and state that the following topic may need an update to the log snippet(s):

vgclog.md

Many Thanks

RSalman commented 3 years ago

I’m in favour of removing concurrent-collection-end as it doesn’t give any useful info. Exclusive is released right after reporting this, the exclusive stanza is sufficient for knowing the order of events. I also think concurrent-collection-start can be collapsed, the only concern here is the stanza can get fairly long with all the attributes on the same line, especially with “reason” string

RSalman commented 3 years ago

I also think element name concurrent-collection-start is misleading, it hints at "concurrent" activity (with mutators), whereas it's pure STW

amicic commented 3 years ago

Good point about the long name. we could leave it as a multi-liner.

As for the rename of concurrent-collection-start, I'm struggling a bit, but how about concurrent-final, or better concurrent-mark-final or concurrent-global-final (since we may have concurrent scavenge as well)?

RSalman commented 3 years ago

I think the name should have global in it, so I like conurrent-global-final. It shows a relationship to cycle-end that follows it:

<concurrent-global-final id="800" contextid="797" timestamp="2020-10-19T07:18:51.676" intervalms="55891.125" >
....
<cycle-end id="810" type="global" contextid="797" timestamp="2020-10-19T07:18:51.766" />

RSalman commented 3 years ago

We don't mention "mark" in any global concurrent stanzas (e.g concurrent-kickoff), so I'm not for concurrent-mark-final. I think it's best to leave it out.

MelHopper commented 3 years ago

Hi both, adding in from a user experience perspective...(@RSalman FYI I'm currently writing the OpenJ9 topic on verbose GC logs)

Are we able to use tag names for concurrent STW events that mean the user can immediately know what STW event is being referred to, without having to trawl through documentation? e.g. It is not obvious to a user that, when looking at the logs, the information being reported are STW events that exist within a concurrent cycle / a certain amount of background knowledge is assumed.

From what I understand, the way a user would analyse the logs for STW events in a concurrent cycle, would be to search for the word 'concurrent', so they can differentiate STW events of a STW cycle easily from the STW events of a concurrent cycle.

For the rest of the tag name, the actual tag related to the concurrent STW event in the concurrent cycle changes from policy to policy. The question is, what does the user want to know when she searches for 'concurrent', and how does this depend on the policy type?

For gencon, user needs to know whether this is the first STW event, or the final collection. What else?
For balanced, user needs to know whether this is a tag that simply records an intermediate snapshot of the heap, or whether the tag is recording a concurrent STW event.
For other policies, what does the user need to know? What concurrent events are logged?

As I see it, there are 2 things we can get across in the tag name to improve UX and to also ensure consistency across different tags and different policies: i) Include 'STW' explicitly in the name, so that user knows immediately what part of the concurrent cycle they are looking at (also vs. any other tags that do NOT label a STW event).
ii) include a word that links the STW event to the concurrent collection type. Which part of the collection definition is most important? that it is global or local, or that it is a mark & sweep etc.?

For this tag modification, given the above, is there a problem with a tag being this long:

<concurrent-global-final-STW-collection>

Thoughts?

After researching for writing the vgc log topic, I would like to discuss the user experience of the logs, to determine if we could improve this experience by renaming the concurrent tags, which would also lead to less and clearer documentation. I would like to do a bit of back-of-the-envelope user experience testing my end to get this right. Perhaps I can arrange a call with you both in the next couple of weeks? @SueChaplain @pmhayward FYI

MelHopper commented 3 years ago

The other question I had which users could find confusing: why is it that the <exclusive-start> tag sometimes precedes the <concurrent...> tag, and other times follows the <concurrent...> tag? Is there a reason for this?

RSalman commented 3 years ago

Are we able to use tag names for concurrent STW events that mean the user can immediately know what STW event is being referred to, without having to trawl through documentation? e.g. It is not obvious to a user that, when looking at the logs, the information being reported are STW events that exist within a concurrent cycle / a certain amount of background knowledge is assumed.

This is true, a certain amount of background knowledge is assumed. Anything between tags is a STW event, it's assumed that the user knows this.

From what I understand, the way a user would analyse the logs for STW events in a concurrent cycle, would be to search for the word 'concurrent', so they can differentiate STW events of a STW cycle easily from the STW events of a concurrent cycle.

I would say that it's not a straight forward search for "concurrent", it depends on what the user is looking for. A concurrent tag can show up in exclusive/STW (e.g concurrent-collection-start)

Just some background info to clarify To better understand the logs, we should look at cycles in terms of phases:

A cycle can be broken down into multiple phases.
- Each phase has different gc operations.
- Cycles can overlap (i.e phases of different cycles)

A cycle may have STW phases or Concurrent phases. For instance, with concurrent standard collectors/policies (gencon, optavgpause) global cycles are broken down into the following phases:

Initial STW
Concurrent
Final STW

A verbose log would roughly look something like this for this cycle:

Concurrent kickoff tag //Cycle is about to start, some initialization work is being done concurrently

Exclusive Start //STW 1
   Cycle Start 
Exclusive End

Exclusive Start //STW 2
   Concurrent Collection Start //This tag name is being changed here to concurrent-global-final
   Cycle End
Exclusive End

RSalman commented 3 years ago

For the rest of the tag name, the actual tag related to the concurrent STW event in the concurrent cycle changes from policy to policy. The question is, what does the user want to know when she searches for 'concurrent', and how does this depend on the policy type?

For gencon, user needs to know whether this is the first STW event, or the final collection. What else?

For balanced, user needs to know whether this is a tag that simply records an intermediate snapshot of the heap, or whether the tag is recording a concurrent STW event.

For other policies, what does the user need to know? What concurrent events are logged?

@amicic could answer this better. Perhaps a call would be better to go over this and some of the other points you've mentioned.

RSalman commented 3 years ago

The other question I had which users could find confusing: why is it that the tag sometimes precedes the tag, and other times follows the tag? Is there a reason for this?

Not sure if I understand the question, which tag are we referring to?

MelHopper commented 3 years ago

The other question I had which users could find confusing: why is it that the tag sometimes precedes the tag, and other times follows the tag? Is there a reason for this?

Not sure if I understand the question, which tag are we referring to?

My bad Salman, I was referring to the <exclusive-start> tag sometimes preceding a tag, and other times following a tag.

amicic commented 3 years ago

As I see it, there are 2 things we can get across in the tag name to improve UX and to also ensure consistency across different tags and different policies: i) Include 'STW' explicitly in the name, so that user knows immediately what part of the concurrent cycle they are looking at (also vs. any other tags that do NOT label a STW event). ii) include a word that links the STW event to the concurrent collection type. Which part of the collection definition is most important? that it is global or local, or that it is a mark & sweep etc.?

For this tag modification, given the above, is there a problem with a tag being this long:

Indeed, some inconstancy can be observed when reporting STW and concurrent phases of a cycle

we don't ever mention STW acronym and we do mention concurrent word or
concurrent phases may be reported differently in different type of GC policies or cycle types.

As Salman already said, STW event can be found from exclusive-start/end. And yes, we must assume some general GC background knowledge and reader getting familiar with our log specifics. So, I don't see much need to add that STW in more tags name. concurrent-global-final is very good compromiseto me. It's just saying why we triggered this STW phase: to end the ongoing cocurrent global cycle (as opposed to hiting allocation failure), but it's already clear it's a STW.

Generally, we have very little room from changing existing tag names, since we could break tools that parse the logs (such as Eclipse GCMV). concurrent-collection-start/end are be some of very rare ones

MelHopper commented 3 years ago

Hi @amicic @RSalman has this defect been put into an iteration plan yet? - I am wondering when the following changes will be actioned, so I can determine whether to keep the current log contents in the openJ9 docs for the next release, or to use the changed log content? - will the following be actioned by 15th Jan?

'' element name changed to ''
addition of contextid to ''
removal of ' element'.

Many thanks.

RSalman commented 3 years ago

Sorry @MelHopper I missed your question, changes went in earlier this month to address this issue. @amicic could you please close this.

eclipse-openj9 / openj9

Verbose GC, missing contextid in concurrent-collection-start #11237