Some questions about profiler API

dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

https://docs.microsoft.com/dotnet/core/

MIT License

15.15k stars 4.71k forks source link

Some questions about profiler API #9685

Closed ww898 closed 3 years ago

ww898 commented 6 years ago

Our profiler excavates static variable values during the GC finished event and everything works perfectly till the GC where the first collectible assembly was silently unloaded. And we get crash after that. Now we disabled static variables for collectible assemblies. How to detect the moment during GC after that I can understand that the module was collected?
Our profiler gathers field names information in every object and assign it to incoming references. It's a bit complicated algorithm which depends on some current CLR undocumented behavior. The most important is: I expect that you process arrays from lower index to high one during GC. I did it because the alghoritm is O(x^2) and it's very very slow for large arrays.

The last one trouble is virtual references in objects from collectible assemblies. CLR add the virtual reference to System.Reflection.LoadAllocator object as first element of array of references in profiling API. It works for array types and most object types except following types and its inheritors (because there is the reflection in native for manages fields):

System.RuntimeType see m_keepalive
System.Reflection.RuntimeAssembly see m_syncRoot
System.Delegate see _methodBase

Could you please make our live more simple - extend profiler API to get ability to get additional information for references such as:

the array index
the virtual collectible reference flag
the real offset in object?

Add ability to resolve ModuileID+mdTypeRef to ModuleID+mdTypeDef.

lt72 commented 6 years ago

@noahfalk FYI

noahfalk commented 6 years ago

Hi @ww898 , very sorry I hadn't noticed this one earlier.

How to detect the moment during GC after that I can understand that the module was collected?

As an official .NET API, the solutions appear limited. You could determine that the address of the static was itself on the GC heap and then use the GC heap callbacks to detect when the containing object was collected. Although enumerating the GC heap can be fairly heavyweight it sounds like you may already be collecting this information. Unofficially looking into the runtime implementation details, the managed type LoaderAllocator has a m_slots array. I believe at least two of the objects in the m_slots array are object[]/byte[] that hold the static data. By specifically observing when the LoaderAllocator is collected you could infer it is no longer safe to access the static data. If you want to create a feature request to improve the capabilities of .Net Core lets file each request as its own item.

The most important is: I expect that you process arrays from lower index to high one during GC

Are you asking whether this is true, or requesting us to guarantee it remains true as part of the runtime contract with profilers, something else? I don't know this particular detail though @Maoni0 probably does.

CLR add the virtual reference to System.Reflection.LoadAllocator object as first element of array of references in profiling API

Are you referring to the ICorProfilerCallback::ObjectReferences API or something else?

Could you please make our live more simple - extend profiler API to get ability to get additional information for references such as ...

Have you looked at the layout related APIs such as: ICorProfilerInfo2::GetClassLayout ICorProfilerInfo2::GetArrayObjectInfo ICorProfilerInfo2::GetBoxClassLayout ICorProfilerInfo2::GetStringLayout

These APIs should already let you determine the offsets where individual fields are stored, or the array indices for array elements. If you don't feel these APIs are sufficient to accomplish your goals we should chat a little more about what you are trying to do and why these APIs didn't work for you.

Similar to your first question I don't think there is good support specifically for the reference to the LoaderAllocator used in collectible assemblies. By process of elimination you could determine that there is a reference reported from ObjectReferences() that doesn't exist in any of the defined slots in the layout and then infer the meaning of this extra runtime added field, but certainly its not an ideal API design.

Add ability to resolve ModuileID+mdTypeRef to ModuleID+mdTypeDef

We can create a feature request. Two questions that come to mind: a) Do you also care about mdTypeSpec, or mdTypeRef alone is sufficient? b) If the type you are trying to resolve has never been loaded by the runtime is it acceptable to return failure? The alternative is the runtime tries to load the type on-demand but this may cause lots of side-effects such as loading assemblies from disk or running more managed code.

To set expectations, creating the feature requests doesn't necessarily mean the runtime will start working on them, we still have to prioritize with many other requests that we get. However even if we don't build them, we can also offer advice for anyone in the community who is interested in working on it.

Maoni0 commented 6 years ago

The most important is: I expect that you process arrays from lower index to high one during GC

this is a very vague statement. if you could explain why you expect this and what kind of guarantee you are looking at it would be helpful.

ww898 commented 6 years ago

Hi @noahfalk, thank you for so full answers

As an official .NET API, the solutions appear limited. You could determine that the address of the static was itself on the GC heap and then use the GC heap callbacks to detect when the containing object was collected. Although enumerating the GC heap can be fairly heavyweight it sounds like you may already be collecting this information. Unofficially looking into the runtime implementation details, the managed type LoaderAllocator has a m_slots array. I believe at least two of the objects in the m_slots array are object[]/byte[] that hold the static data. By specifically observing when the LoaderAllocator is collected you could infer it is no longer safe to access the static data. If you want to create a feature request to improve the capabilities of .Net Core lets file each request as its own item.

Oh, great. The unofficial solution looks like a bit complicated but it should work. Here is the feature request to make the solution much simpler: https://github.com/dotnet/coreclr/issues/17228

Are you asking whether this is true, or requesting us to guarantee it remains true as part of the runtime contract with profilers, something else? I don't know this particular detail though @Maoni0 probably does.

Yes, I expect that you can officially grant that for nearest future. :-) On my opinion, the main problem is a bit common. We only have SetEventMask / GetEventMask now where a profiler can request required features, but I think, it isn't enough. I expect the list of supported features which current runtime supports. Something like the result of x86 cpuid instruction, but for profiling API. For example: enumerate_arrays_in_gc_from_low_to_high_indexes, executed_under_partly_trusted_context, module_id_in_etw_events_fixed or ...

Are you referring to the ICorProfilerCallback::ObjectReferences API or something else?

Yes, I'm using ICorProfilerCallback::ObjectReferences. The profiler should collect field values for every heap object and field names (including field names for nested structures). Could you please imagine following situation which our profiler should resolve:

struct A<T> { T t1; T t2; }
struct C<T> { A<T> a; object o; }

var arr0 = new C<object>[VeryLargeValue];
var arr1 = new C<int>[VeryLargeValue];

Let's the type A<> wasn't loaded during execution because ... some optimizations (only base type and generic argument types are granted to load). It means that the profiler doesn't know anything about t1 and t2. Is it a reference or not? What offset every field has? Which are field values?
On the other side the profiler should assign the array index for every reference and assign every reference to specific field in structure in situation when C<> can't be completely investigated.
The algorithm will be O(VeryLargeValue^2) except ... when I know the order for indexes. It last case it will be O(CountOfReferencesInStructure^2).

So, this is the reason to ask to extend profiling API: add indexes for arrays in ICorProfilerCallback::ObjectReferences or add ICorProfilerCallback::ArrayReferences for arrays.

ww898 commented 6 years ago

@Maoni0 Could you please look at the last part of my answer to @noahfalk? I tried to describe why I need array indexes in ICorProfilerCallback::ObjectReferences or grand the order of indexes...

noahfalk commented 6 years ago

Are you asking whether this is true, or requesting us to guarantee it remains true as part of the runtime contract with profilers, something else? I don't know this particular detail though @Maoni0 probably does.

Yes, I expect that you can officially grant that for nearest future. :-)

I think there was (and may still be?) some confusion about what property of the runtime you are asking us to guarantee. I originally misunderstood which lead to me pulling in Maoni. My current understanding is this is property you are hoping for: "the reference reported in the objectRefIds[x] of the ICorProfilerCallback::ObjectReferences call is the reference that is stored in array index X of the managed object." That's probably true when dealing with an array of a reference type, but I'm going to suggest there is a different and more complete invariant you could rely on that is already part of the official documentation of GetArrayObjectInfo: https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerinfo2-getarrayobjectinfo-method This method gives you a ppData buffer, which in the case of an array of references will be an array of ObjectID. Unlike ObjectReferences, this function is also giving you the number of ranks, and the upper and lower bound of each rank, and then a guarantee that ppData is layed out in "C++ convention". The C++ convention could be better defined, but I think the real promise here is that the data is in row major layout. It also works just as well on valuetypes which ObjectReferences won't give you any information about. If you had some multi-dimensional array, example rank 2, indices spanning from 2-6 and 3-9 respectively, then could calculate that the reference at index [5, 7] is at ppData offset sizeof(ObjectID)*(elements_per_row*row + column) = sizeof(ObjectID)*((9-3)*(5-2) + (7-3)). If instead it was an array of System.UInt64 or struct Foo, you can swap sizeof(ObjectID) for sizeof(UInt64) or sizeof(Foo).

Does that seem like a workable scheme to match up references (or other values) to the array indicies which hold the data without needing to make guarantees about ObjectReferences() behavior?

ww898 commented 6 years ago

@noahfalk @Maoni0 I think we are speaking about a bit different things. I'm speaking about the order in outgoing references in ICorProfilerCallback::ObjectReferences. GetArrayObjectInfo tell me the size of each dimensions. But I can't use it in ObjectReferences because pointers to null objects aren't reported. For example: I can get only 1234 references from ObjectReferences for array with 1234567890 elements of structures with 12 reference fields inside. I cannot resolve this case fast and simple now. I need to scan whole array (surely I use GetArrayObjectInfo here) to gather all not null object references and store the array index and field name for them. When the order of references in ObjectReferences isn't granted I need to match two sets: from ObjectReferences and which the profiler gathered. The good question: why I need to match two sets? Why I can't use my set and ignore references from ObjectReferences? The only reason is the hard cases when the profiler cannot investigate 100% of array element fields because some types weren't loaded. When the order is granted I can optimize the match algorithm.

However, It would be great the separate method, like ObjectReferences, but with index array as additional parameter.

noahfalk commented 6 years ago

Why I can't use my set and ignore references from ObjectReferences? The only reason is the hard cases when the profiler cannot investigate 100% of array element fields because some types weren't loaded.

I'd suggest if we want to make improvements, lets make improvements that will solve these type loading problems. It sounds like there are potentially two issues that forcing you to use ObjectReferences: a) Some references (ie LoaderAllocator) have no corresponding field in metadata, and thus don't show up in the type field layouts at all. b) You are having difficulty from unloaded types - I'd be curious to get more info from you on this one, such as a repro case where it occurs and the particular profiler APIs you call to determine that the type isn't loaded. I can speculate about what you are seeing, but we'd want to be certain before embarking on any work.

When the order is granted I can optimize the match algorithm.

Although I was hoping to find a path that avoids the dependency, I think its safe to rely on at least for the forseeable future. There is a risk that we might add references you didn't expect to the beginning or end of the list (such as the LoaderAllocator for collectible types), but I can't see any reason why we would ever need to change the relative order of references from different array indicies.

However, It would be great the separate method, like ObjectReferences, but with index array as additional parameter.

My concern going down this route is that it doesn't appear array index would be sufficient to fully satisfy the scenario, and that it would degenerate into wanting ObjectReferences to give you all of the qualifiers necessary to distinguish one reference from another. For example if we had:

 struct Foo { object a; Bar b; }
 struct Bar { object c; }
 Foo[,] arr1 = new Foo[12,17];

If I understand you correctly, you ideally want to know that a particular reference represents arr1[5,6].b.c, but the API wouldn't be able to represent [5,6].b.c with a single number. As a partial solution/workaround for now its probably easier to simply rely on the ordering of ObjectReferences.

ww898 commented 6 years ago

b) You are having difficulty from unloaded types - I'd be curious to get more info from you on this one, such as a repro case where it occurs and the particular profiler APIs you call to determine that the type isn't loaded. I can speculate about what you are seeing, but we'd want to be certain before embarking on any work.

We collect statistics about field resolving. I promise to make the report about that.

Although I was hoping to find a path that avoids the dependency, I think its safe to rely on at least for the forseeable future. There is a risk that we might add references you didn't expect to the beginning or end of the list (such as the LoaderAllocator for collectible types), but I can't see any reason why we would ever need to change the relative order of references from different array indicies.

Well, could you please add the markers for virtual references in ICorProfilerCallback? I need to know that the reference to LoaderAllocator is virtual. I have the list of classes where the reference to LoaderAllocator isn't virtual. How can I understand after CLR/CoreCLR update that I have to add the new type in the list?

My concern going down this route is that it doesn't appear array index would be sufficient to fully satisfy the scenario, and that it would degenerate into wanting ObjectReferences to give you all of the qualifiers necessary to distinguish one reference from another. For example if we had:

If I understand you correctly, you ideally want to know that a particular reference represents arr1[5,6].b.c, but the API wouldn't be able to represent [5,6].b.c with a single number. As a partial solution/workaround for now its probably easier to simply rely on the ordering of ObjectReferences.

Yes, you are right. It would be great when I can easily associate reference from array or object with specific field. This way surely creates some additional problems. For example: how can I get parent fields for three nested structures (f.f.p)? How can get ClassId for three nested structures?

struct f0 { public object p; }
struct f1 { public f0 f; }
struct f2 { public f1 f; }
var f = new f2[45, 23, 56];
f[11, 13, 17].f.f.p = new object();

I think that the array element index will realy help, because there are a lot large arrays with references in applications, but structures or classes with hundreds or thousands fields with references are very rare (except generators).

On my opinion, there is the simple way to convert the element index from array's memory to the logical single or multidimensional indexes surely with GetArrayObjectInfo. So, I don't see any troubles with it. For example:

var f0 = new Foo[17]
f0[11] = new Foo(); // the index in memory is 11

var f1 = new Foo[12,17]
f1[7,11] = new Foo(); // the index in memory is 17*7+11=130

var f2 = new Foo[21,12,17]
f2[8,7,11] = new Foo(); // the index in memory is 17*12*8+17*7+11=1762

On the other side our profiler doesn't translate element memory indexes to logical indexes - this is the UI work.

noahfalk commented 6 years ago

Just continuing to think about this a bit, what if we put the memory offset in the object where we read the reference from into the callback information (or a sentinel value -1 if the reference was implicitly computed and has no backing memory). For example the callback might return:

offset              reference
-------            -----------
-1                  0x81238310
8                   0x37123788
16                  0x12338128
32                  0x123098f0

Using GetArrayObjectInfo, IsArrayClass, and GetClassLayout you should be able to determine the memory ranges where all the top-level array elements reside, even if you are unable to decipher the layout of their inner fields within that range. Together with the offsets you could now quickly bucket all the references that correspond to each array element using a single pass through the results.

A few reasons I suggest offsets vs. other ways of describing the references:

The GC, which is doing this enumeration for us, doesn't know the type information. It is working from a very minimal descrription of a type that only encodes which offsets are references and which offsets are not. It doesn't know what fields/array indicies any given offset corresponds to.
The offset grows in strength as an identifier the more type layout information you have. When you only know the type is an array with some top level element type of a certain size then you can calculate which array index you are within. However as you learn more (lets assume we resolve the issues in the other layout APIs so you can properly disect nested generic structures) then you would be able to calculate that a particular offset means exactly the reference backing [8,7,11].f.f.p.
The offset is easy to pass and manipulate. We can create a parallel array of the same size as references, or use a single array of <reference, offset> tuples. Other options (that aren't limited in precision) would presumably be variable length sequences such as a string "[8,7,11].f.f.p" or a list API that returned elements 8,7,11,f,f,p. These require variable sized memory allocations per-reference and building additional object models or data format parsers.

ww898 commented 6 years ago

@noahfalk Cool! That's what I need!

davmason commented 3 years ago

Closing since it seems the question was answered