dotnet / csharplang

The official repo for the design of the C# programming language

Proposal: temp allocation #757

Closed ygc369 closed 7 years ago

ygc369 commented 7 years ago

C# should allow people to allocate temp objects which would be freed soon. For example:

static void foo()
{
    string str;
    TempAllocationBlock // may use other syntax
    { // all objects allocated within this block are temp objects
        string s = new string(' ', 1000); // s is a temp string
        ...... //do something with s, but s can't escape out of this block
        str = s; // compile error or exception: a temp object is trying to escape!
    } // the memory of s is freed here
    /* When execution leaves this block, all the memory allocated within it should be freed at once, without waiting for the GC. */
}

Joe4evr commented 7 years ago

And how would you expect the runtime to clear that memory without doing a GC?

ygc369 commented 7 years ago

@Joe4evr The .Net managed heap has a free memory list too, just as C/C++ allocators do, although it is used less often. For example, if memory is pinned during a GC, the free heap memory after the GC might be non-contiguous, so a free memory list is needed to manage it. So the answer to your question is to use the free memory list.

svick commented 7 years ago

@Joe4evr I think that's the easy part. If you know for sure that a group of objects can be safely freed at a specific point of time, you can use an arena to manage their memory.
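To make the arena idea concrete, here is a minimal sketch. It assumes a hypothetical `ByteArena` type (not an existing .NET API) that hands out slices of one pre-allocated byte buffer and releases all of them at once. It manages raw bytes only; the CLR offers no public way to place ordinary managed objects in such an arena.

```csharp
using System;

// Hypothetical toy arena: hands out slices of one pre-allocated buffer and
// releases all of them at once by resetting an offset.
public sealed class ByteArena
{
    private readonly byte[] _buffer;
    private int _offset;

    public ByteArena(int capacity) => _buffer = new byte[capacity];

    // Number of bytes currently handed out.
    public int Used => _offset;

    // Bump allocation: reserve the next `size` bytes of the buffer.
    public ArraySegment<byte> Allocate(int size)
    {
        if (size < 0 || size > _buffer.Length - _offset)
            throw new InvalidOperationException("Arena exhausted.");
        var segment = new ArraySegment<byte>(_buffer, _offset, size);
        _offset += size;
        return segment;
    }

    // Frees every allocation in one step. Segments handed out earlier
    // must not be used after this call; nothing here enforces that.
    public void Reset() => _offset = 0;
}
```

The sketch also shows where the hard part lies: nothing stops a caller from holding on to a segment after `Reset()`, which is the escape problem the rest of this thread discusses.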

@ygc369 The general idea might sound enticing, but I have a hard time understanding how exactly it would work and how useful it would be in practice. For example, consider this code:

Person[] GetAllPersonsByName(string name)
{
    return this.persons.Where(p => p.Name == name).ToArray();
}

ygc369 commented 7 years ago

@svick The compiler/runtime can check whether the returned object references any of the temporary objects before return.

HaloFour commented 7 years ago

In my opinion this belongs on the list of potential optimizations of the JIT, not of the language. C# has no concept of a free memory list.

ygc369 commented 7 years ago

@HaloFour But C# should provide a syntax to express a temp allocation block.

iam3yal commented 7 years ago

@ygc369 Why should it provide a syntax for it?

ygc369 commented 7 years ago

@eyalsk Because people must tell the compiler/runtime which allocations are temp. Do you mean that the compiler should find temp allocations by itself, without people's help? Can it be that smart? I doubt it.

CyrusNajmabadi commented 7 years ago

@ygc369 How would this work?

CyrusNajmabadi commented 7 years ago

Specifically: what would the compiler actually do differently with this sort of code?

HaloFour commented 7 years ago

@ygc369

But C# should provide a syntax to express a temp allocation block.

I disagree. Such a thing, if it existed, would be an implementation detail of the CLR. As of now it doesn't exist in the CLR at all, let alone officially supported with public metadata in a manner that can be influenced by any of the CLR languages, let alone C#.

Because people must tell the compiler/runtime which allocations are temp.

I'd suspect that the JIT would be more than capable of detecting when a local never escapes and could be safely collected upon exiting a method (blocks don't exist in IL and local slots are always scoped to the method). However, exactly what use is a variable that doesn't escape? What useful code could you possibly execute on s in your code that doesn't involve the reference being passed somewhere outside of that method?

ygc369 commented 7 years ago

@HaloFour Many algorithms need temp space to store intermediate results, but once they get final results, the temp space would be useless and should be freed at once. This is the usage of the proposal. I think your idea about the JIT detecting temp allocations should be called "enhanced escape analysis".

HaloFour commented 7 years ago

@ygc369

Many algorithms need temp space to store intermediate results, but once they get final results, the temp space would be useless and should be freed at once. This is the usage of the proposal.

Garbage collection handles that scenario just fine. If you're so obsessed about forcing the issue you can always set your references to null mid-method and call GC.Collect(0) explicitly.

iam3yal commented 7 years ago

@ygc369 If you can't mutate something and you can't get anything in or out of the block, why do you need to allocate memory to begin with? Can you name a few of these algorithms? What's the problem with the following approach:

void Main()
{
    using (Scope s = new Scope())
    {
        // Can't really do anything with s...
    }
}

class Scope : IDisposable
{
    public void Dispose()
    {
        // Do something...
    }
}

svick commented 7 years ago

@ygc369

The compiler/runtime can check whether the returned object references any of the temporary objects before return.

That sounds a lot like generational garbage collection: You have a small set of objects (Gen 0) and a subset of those objects that shouldn't be collected (roots). You walk references from the objects that shouldn't be collected to check which objects they reference (mark stage) and then deallocate the rest (sweep stage).

Are you basically asking for explicitly delimited "gen -1"?
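The mark and sweep steps described above can be sketched on a toy object graph. This is only an illustration: `Obj` and `ToyCollector` are hypothetical names, and the real .NET GC works on the actual heap and roots, not on a list handed to it.

```csharp
using System;
using System.Collections.Generic;

// Toy heap object: a name plus outgoing references.
public sealed class Obj
{
    public string Name { get; }
    public List<Obj> References { get; } = new List<Obj>();
    public Obj(string name) => Name = name;
}

public static class ToyCollector
{
    // Returns the objects in `young` that are not reachable from `roots`.
    public static List<Obj> FindGarbage(IEnumerable<Obj> roots, IEnumerable<Obj> young)
    {
        // Mark stage: walk references starting from the roots.
        var marked = new HashSet<Obj>();
        var pending = new Stack<Obj>(roots);
        while (pending.Count > 0)
        {
            var current = pending.Pop();
            if (!marked.Add(current)) continue; // already visited
            foreach (var next in current.References) pending.Push(next);
        }

        // Sweep stage: everything in the young set that was not marked is garbage.
        var garbage = new List<Obj>();
        foreach (var candidate in young)
            if (!marked.Contains(candidate)) garbage.Add(candidate);
        return garbage;
    }
}
```

In the "gen -1" reading, `young` would be the objects allocated inside the temp block and `roots` would be whatever is returned or stored outside it.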

Joe4evr commented 7 years ago

Also, remember that in designing the CLR, the GC flow is intentionally left as an implementation detail. If you don't like that the GC is non-deterministic like that, then go to C/C++/any other unmanaged language where you can control literally every allocation and de-allocation yourself.

But please just stop proposing all these silly micro-optimizations just because the GC doesn't behave in the way that you imagine it should.

At the very least follow what you've been told time and again by now: Provide real-world performance data that shows beyond a shadow of a doubt that your proposed changes are worth the amount of time, effort, and money for Microsoft to change the existing system.

SunnyWar commented 7 years ago

@Joe4evr I take issue with your tone. It's not constructive. Please stop. Personally, I think the proposal is worth some thought. Just because the GC works in a certain way today doesn't preclude it from working differently (better) in the future. I've had apps that have no choice but to call a function that allocates a block of memory and then throws it away a few instructions later, and this in a long-running tight loop. In this scenario the GC goes crazy with Gen 0 collections for millions of objects that logically have no reason to live so long. Explicitly setting them to null and forcing a collection just makes it worse. I've also heard it said, "if you need it faster, write it in C/C++". This is great practical advice in the short term, but my assumption is that these discussions are about finding ways to make .Net better. In which case such statements are not helpful.

CyrusNajmabadi commented 7 years ago

Again: what is the proposal here? What are you asking the C# compiler to actually do?

CyrusNajmabadi commented 7 years ago

Many algorithms need temp space to store intermediate results, but once they get final results, the temp space would be useless and should be freed at once.

You can usually accomplish this with pooled objects. That way you can have your temp values that you work with, but you don't churn the GC heavily as you can take those intermediate objects and put them in your pool instead of needing to have them be collected.
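As a rough illustration of the pooling approach, the sketch below uses `System.Buffers.ArrayPool<T>` (available in .NET Core and via the System.Buffers package elsewhere) to borrow temporary space instead of allocating it. The `PooledScratch` class and its method are made up for the example.

```csharp
using System;
using System.Buffers;

public static class PooledScratch
{
    // Sums the squares of the inputs, using a rented array as temporary space.
    public static long SumOfSquares(int[] values)
    {
        // Rent may return an array longer than requested, and its contents are not cleared.
        long[] scratch = ArrayPool<long>.Shared.Rent(values.Length);
        try
        {
            for (int i = 0; i < values.Length; i++)
                scratch[i] = (long)values[i] * values[i];

            long total = 0;
            for (int i = 0; i < values.Length; i++)
                total += scratch[i];
            return total;
        }
        finally
        {
            // Hand the array back so a later caller can reuse it instead of allocating.
            ArrayPool<long>.Shared.Return(scratch);
        }
    }
}
```

The same non-escape discipline applies here by convention: the rented array must not be used after it is returned to the pool.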

CyrusNajmabadi commented 7 years ago

Just because the GC works in a certain way today doesn't preclude it from working differently (better) in the future.

GC proposals are probably not great to be here. This is the repo for the C# language. Details of the GC aren't really under our purview.

CyrusNajmabadi commented 7 years ago

This is great practical advice in the short term, but my assumption is that these discussions are about finding ways to make .Net better.

In your example, i would advise changing the function that you are calling to behave differently. For example, instead of allocating data that then needs to be freed, it could be passed the location in which to place or fill its results. Then you could allocate once, and the function would not have to worry about that detail.

If you can't change that function then it's highly unlikely that any proposal here would actually help with this.
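A minimal sketch of the "pass in the location to fill" suggestion, using a made-up `Digits` helper: `FillDigits` writes into a buffer supplied by the caller, so a tight loop can allocate one buffer and reuse it on every iteration.

```csharp
using System;

public static class Digits
{
    // Writes the decimal digits of `value` (least significant first) into
    // `destination` and returns how many were written. An int has at most
    // 10 digits, so `destination` needs at least 10 slots.
    public static int FillDigits(int value, int[] destination)
    {
        int count = 0;
        uint remaining = (uint)Math.Abs((long)value);
        do
        {
            destination[count++] = (int)(remaining % 10);
            remaining /= 10;
        } while (remaining != 0);
        return count;
    }

    // Caller side: one buffer is allocated up front and reused for every value.
    public static int SumDigits(int from, int to)
    {
        var buffer = new int[10];
        int total = 0;
        for (long v = from; v <= to; v++)
        {
            int n = FillDigits((int)v, buffer);
            for (int i = 0; i < n; i++) total += buffer[i];
        }
        return total;
    }
}
```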

SunnyWar commented 7 years ago

@CyrusNajmabadi

i would advise changing the function that you are calling to behave differently

Ya. I'll get right on that...getting the .Net Framework and dozens of third party libraries to change their functions to behave differently. Ha!

SunnyWar commented 7 years ago

@ygc369 Note this issue was closed, I think, not because your idea has no merit but because it's not a language issue. Be sure to post your ideas to the coreclr repo for recommendations on better garbage collection.

CyrusNajmabadi commented 7 years ago

Ya. I'll get right on that...getting ... functions to behave differently. Ha!

I don't understand what the proposal is. What do you think the .net framework could do differently here? Presumably if you wanted to use these constrained regions, you'd have to code in a very particular way (in order to participate properly with whatever restrictions it puts on your code). But... in the example given... you're calling out to 3rd party code. Code that expects to be run in a GC environment and would very likely not behave properly in such a new restrictive system. So again, can someone please tell me what the request is, how it would work, and how it would solve the problems laid out here?

getting the .Net Framework functions to behave differently

Actually, this is definitely something you can do. Take a look at Stephen's post here https://blogs.msdn.microsoft.com/dotnet/2017/06/07/performance-improvements-in-net-core/ See how many performance improvements have been provided by the community. If you are having problems with .Net perf, then definitely put in the effort to address the issues. The PRs will get accepted, and it will be far more likely to happen, and far more likely to help than trying to get the language and runtime to change significantly here.

--

Now, if you do want the language/runtime to change, it absolutely must be with a clearer description of what the proposal would be. I still can't tell what is being asked for here, so there's no way to make any forward progress on something like this.

ygc369 commented 7 years ago

@SunnyWar Thank you. But I don't dare to open a new issue now. Could you post similar ideas to the coreclr repo if you need it? I was hurt by Joe4evr's words. If I open a new issue in the coreclr repo, he might ridicule me again. I don't like that.

CyrusNajmabadi commented 7 years ago

@ygc369 @SunnyWar Could you provide information about your actual proposal? I've looked at the links to other proposals that you've made and they all seem to be lacking any sort of data or detail. To make a change or introduce a new .net memory model or programming model you absolutely must provide more information, or else none of these issues are going to go anywhere.

ygc369 commented 7 years ago

@CyrusNajmabadi The idea is to use some syntax to tell the compiler/runtime which allocations are temp and can be deallocated once they go out of scope. Thus this memory can be reused before the next GC.

CyrusNajmabadi commented 7 years ago

How would this work in your example of calling into some other function? How do you ensure that what you are doing would not be unsafe?

ygc369 commented 7 years ago

@CyrusNajmabadi The purpose of this idea is to reduce the GC pressure and the garbage living time. If the compiler/runtime is so smart that it can find temp allocations without people's help, then no new syntax is needed and that's OK too.

CyrusNajmabadi commented 7 years ago

Can you explain your proposal more (and answer my question)? What is the way that you would accomplish this?

ygc369 commented 7 years ago

@CyrusNajmabadi

how would this work in your example of calling into some other function? How do you ensure that what you are doing would not be unsafe?

The JIT/CLR should ensure that. The rule is that non-temp objects can't reference any temp objects. Throw an exception when the rule is broken.

CyrusNajmabadi commented 7 years ago

Ok. Let me try a different tactic. Imagine if i said "i want the runtime to do some sort of analysis and not allocate when it isn't necessary". Sure... that sounds great. But how does it do that? It's not enough to ask for these types of things, you have to actually explain how it would work. Be specific. What do the compiler and language do differently? How does the runtime actually implement this? How do we ensure that this doesn't break stuff?

CyrusNajmabadi commented 7 years ago

The rule is that non-temp objects can't reference any temp objects.

So you can't pass any of this data to anything else? For example, any method out there might end up taking any of its parameters and passing them to some non-temp object that keeps a reference.

ygc369 commented 7 years ago

@CyrusNajmabadi The non-temp objects can't reference any temp objects, but temp objects can reference each other and non-temp objects.

So you can't pass any of this data to anything else? For example, any method out there might end up taking any parameters and passing it to some referencing non-temp object.

I can pass value types at least.

HaloFour commented 7 years ago

These "temp" objects are just instances of normal reference types. Literally any method you could call, either instance or static, in literally any library anywhere, could take that reference and assign it somewhere. Neither the compiler nor the runtime can prevent another root from being established on these references, and that would prevent the "temp" object from being collectable.
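A small sketch of the capture problem described above. `Logger` and `Caller` are hypothetical names; the point is that nothing at the call site reveals that `Log` keeps its argument alive.

```csharp
using System;
using System.Collections.Generic;

public static class Logger
{
    // A long-lived root: anything stored here stays reachable.
    private static readonly List<object> History = new List<object>();

    public static int Count => History.Count;

    // Looks harmless from the call site, but keeps a reference to its argument.
    public static void Log(object message) => History.Add(message);
}

public static class Caller
{
    public static void DoWork()
    {
        var temp = new string(' ', 1000); // intended as a "temp" object
        Logger.Log(temp);                 // the reference now also lives in Logger.History
    }
}
```

After `DoWork` returns, the string is still reachable through the static list, so freeing it at the end of the block would leave a dangling reference.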

alrz commented 7 years ago

I think this is related to https://github.com/dotnet/roslyn/issues/161. It does actually propose to "destruct" objects once they're out of scope. and that in itself requires some kind of ownership as safety rules -- partly https://github.com/dotnet/csharplang/issues/421 though that doesn't propose to free memory once out of scope, just that we make sure that every object has a single owner to enforce immutability across aliases.

Joe4evr commented 7 years ago

@SunnyWar @ygc369

I take issue with your tone. It's not constructive.

I was hurt by Joe4evr's words.

I do apologize for that, and I should've chosen my words a bit more carefully. Cyrus' posts have said it a lot better: If you only say "I want X!" without any clear explanation of exactly how that should be accomplished, it just feels like you're being this guy: Don't be this guy.

SunnyWar commented 7 years ago

@ygc369 Please take a look at these discussions on CoreCLR.

https://github.com/dotnet/coreclr/issues/1784 and https://github.com/dotnet/coreclr/issues/430

tannergooding commented 7 years ago

@SunnyWar, @ygc369

If I understand the proposal you are making, you are essentially asking for a form of stack allocated objects that are scoped to a specific block.

There are already several proposals on the topic of stack allocated objects (including one filed by @ygc369: https://github.com/dotnet/csharplang/issues/240) and I think (if I understood the proposal) that this is merely an extension on top of that.

@CyrusNajmabadi could confirm, but I believe this is one of the issues where C# cannot do anything actionable until the CLR is modified to provide basic support for this functionality (e.g. more PRs like https://github.com/dotnet/coreclr/pull/6653 need to be done).

ygc369 commented 7 years ago

@tannergooding Your understanding is generally right. But this proposal is a bit different from stack allocated objects, because the stack can't store large objects. Temp objects in this proposal could be stored on the heap if they are big, but they could still be reclaimed before the next GC, just like stack allocated objects.
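For raw buffers (not arbitrary objects), the small-versus-large split can already be approximated in C# 7.2 and later. The sketch below is an assumption-laden illustration: `ScratchBuffer` and the 256-byte `StackLimit` threshold are made up; small scratch space comes from `stackalloc`, larger space is rented from `ArrayPool<byte>`.

```csharp
using System;
using System.Buffers;

public static class ScratchBuffer
{
    private const int StackLimit = 256; // hypothetical threshold, in bytes

    // Checks whether the input, with zero bytes removed, reads the same in both directions.
    public static bool IsPalindromeIgnoringZeros(ReadOnlySpan<byte> input)
    {
        byte[] rented = null;
        // Small inputs use stack memory; large ones borrow a pooled heap array.
        Span<byte> scratch = input.Length <= StackLimit
            ? stackalloc byte[StackLimit]
            : (rented = ArrayPool<byte>.Shared.Rent(input.Length));
        try
        {
            int n = 0;
            foreach (byte b in input)
                if (b != 0) scratch[n++] = b;

            for (int i = 0, j = n - 1; i < j; i++, j--)
                if (scratch[i] != scratch[j]) return false;
            return true;
        }
        finally
        {
            // Only the pooled array needs explicit release; stack memory goes away on return.
            if (rented != null) ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

`Span<T>` is a ref struct, so the compiler already rejects attempts to store it in a field of a class, which is a limited, compile-time form of the non-escape rule asked for in this proposal.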

CyrusNajmabadi commented 7 years ago

@ygc369 How would this work? How does your proposal make sure that it's safe to collect these objects? For example, if you wanted to "temp allocate" an object, then how would that work? First, how do you ensure that the constructor of the object doesn't leak 'this' to anything? Second, how do you ensure that any of the method you might pass this object to, or any of the methods you might call on this object, don't capture the object?

Can you give an example of how this would work and how you could ensure it would be safe?
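To illustrate the first concern, here is a sketch of a constructor that leaks `this` without the caller seeing it. `AppEvents` and `Counter` are hypothetical types; subscribing an instance method to a static event stores a delegate whose target is the new object.

```csharp
using System;

public static class AppEvents
{
    // Static event: its invocation list is reachable for the life of the process.
    public static event Action Tick;

    public static void RaiseTick() => Tick?.Invoke();
}

public class Counter
{
    public int Ticks { get; private set; }

    public Counter()
    {
        // `this` escapes before the constructor even returns:
        // the delegate added here holds a reference to the new instance.
        AppEvents.Tick += OnTick;
    }

    private void OnTick() => Ticks++;
}
```

A `new Counter()` inside a temp block would look like a purely local allocation, yet the object stays rooted by `AppEvents.Tick` after the block ends.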

CyrusNajmabadi commented 7 years ago

For example, in your code, you have this:

//do something with s, but s can't escape out of this block

What can you actually do with 's'? Any method call on 's' could cause it to escape. Can you provide an actual scenario where this would be useful and actually explain the proposal in more detail? So far i have not really seen any answers (sufficient or otherwise) for the questions i've asked you.

SunnyWar commented 7 years ago

@tannergooding I'm not suggesting anything. I'm only trying to get the jerks on this thread to open their minds a bit and consider the possibility of improving the status quo. That's why I pointed people at various proposals on the coreclr, so that they can familiarize themselves with other, similar, discussions.

CyrusNajmabadi commented 7 years ago

@CyrusNajmabadi could confirm, but I believe this is one of the issues where C# cannot do anything actionable until the CLR is modified to provide basic support for this functionality

We'd need an actual proposal specifying how this would work. It would need to be clear about what changes to the language would be wanted, and how they would map to any underlying CLR functionality for the compiler implementation. I've now looked through each and every issue linked from this issue and i can't find anything actually concrete or actionable. It appears to be a lot of very high level requests (like "we should support reference counting") without actually providing any sort of additional detail past that.

These types of issues will not go anywhere in their current form. It would be akin to me saying "i want C# to be a cloud computing language" and then providing no detail on what that actually means or what it would actually entail to get there.

@ygc369 If you are interested in this space (which certainly seems like the case given several messages from you asking if there has been any progress), then i would recommend supplying a little more information and design to help move things forward. Right now, opening an issue, and then hoping others will pick it up and figure it out isn't really going to be effective.

ygc369 commented 7 years ago

@CyrusNajmabadi Now I'll try to answer your core question: "how would this work?" I think it needs a write barrier to check whether temp objects escape. In my example, when the code enters the TempAllocationBlock, a write barrier should start working, checking every reference assignment. I know that write barriers impact performance, so whether to do it is a tradeoff.
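The check such a write barrier would perform can be simulated in user code. This is only a toy model under stated assumptions: `Node` is a hypothetical type, the temp flag is set by hand, and every reference store has to go through `SetReference`. A real barrier would live in the runtime and would also have to cover locals, statics, arrays and other threads.

```csharp
using System;

// Toy object model: every node records whether it was allocated as "temp".
public sealed class Node
{
    public bool IsTemp { get; }
    public Node Reference { get; private set; }

    public Node(bool isTemp) => IsTemp = isTemp;

    // Stand-in for the write barrier: each reference store is checked
    // against the rule "non-temp objects can't reference temp objects".
    public void SetReference(Node target)
    {
        if (!IsTemp && target != null && target.IsTemp)
            throw new InvalidOperationException("Temp object would escape into a non-temp object.");
        Reference = target;
    }
}
```

Temp-to-temp and temp-to-non-temp stores pass the check, while a non-temp-to-temp store throws, matching the rule stated earlier in the thread.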

ygc369 commented 7 years ago

@CyrusNajmabadi Some people are more radical than me. They don't want to use a syntax to tell the compiler/runtime which allocations are temp; they want the compiler/runtime to find temp allocations by itself, automatically. It's a bit like escape analysis. I support this idea too. But if you ask me how to do this, I really don't know. Java has an escape analysis feature, so why not look at Java's implementation?

CyrusNajmabadi commented 7 years ago

Java has an escape analysis feature, so why not look at Java's implementation?

Can you point to the java language feature that relates to escape analysis?

ygc369 commented 7 years ago

@CyrusNajmabadi

Can you point to the java language feature that relates to escape analysis?

I can only find Chinese documents about it. As you know, the GFW (Great Firewall of China) makes it difficult to search for English documents in China. I can't even use Google.

HaloFour commented 7 years ago

@ygc369

@CyrusNajmabadi's claim is that escape analysis is not a part of the Java language in that there is no syntax to opt-into using it or to influence it directly. It's an optional optimization feature of the runtime.

iam3yal commented 7 years ago

@CyrusNajmabadi Wiki says this:

The popularity of the Java programming language has made escape analysis a target of interest. Java's combination of heap-only object allocation, built-in threading, and the Sun HotSpot dynamic compiler creates a candidate platform for escape analysis related optimizations (see Escape analysis in Java). Escape analysis is implemented in Java Standard Edition 6. Some JVMs support a stronger variant of escape analysis called partial escape analysis that makes scalar replacement of an allocated object possible even if the object escapes in some paths of a function.

class Main {
  public static void main(String[] args) {
    example();
  }
  public static void example() {
    Foo foo = new Foo(); //alloc
    Bar bar = new Bar(); //alloc
    bar.setFoo(foo);
  }
}

class Foo {}

class Bar {
  private Foo foo;
  public void setFoo(Foo foo) {
    this.foo = foo;
  }
}

In this example, two objects are created (commented with alloc), and one of them is given as an argument to a method of another. The method setFoo() stores a reference to a received Foo object. If the Bar object was on the heap then the reference to Foo would escape. But in this case a compiler can determine, with escape analysis, that the Bar object itself does not escape the invocation of example(). Which implies that a reference to Foo cannot escape either. So the compiler can safely allocate both objects on the stack.

But really, the only thing that baffles me here is why we need a new syntax for it. I still don't understand why a new form of block needs to allocate memory if nothing gets in or out of it. Maybe someone smarter than me can explain it to me. Am I missing anything?