dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License
1.41k stars 182 forks source link

Enable DisposeScope for PackedSequence, resolves #1385 #1386

Closed mvphelps closed 1 month ago

mvphelps commented 1 month ago

Can now create a PackedSequence within a NewDisposeScope, and the .MoveToOther, .MoveToOuter, etc. Also can use disposeScope.Attach(PackedSequence).

Looking for feedback if we want a common interface to share between Tensor and PackedSequence. This would eliminate a bunch of type checks. I followed the existing pattern for this initial commit. This is my first PR here, LMK if I'm not conforming to repo norms. Thank you.

mvphelps commented 1 month ago

@dotnet-policy-service agree

mvphelps commented 1 month ago

TestLoadJIT_3 looks like it has a flaky history. Want me to try some troubleshooting? Do it here or in another PR? Thanks.

yueyinqiu commented 1 month ago

TestLoadJIT_3 looks like it has a flaky history. Want me to try some troubleshooting? Do it here or in another PR? Thanks.

don't worry. it also fails in other PRs https://github.com/dotnet/TorchSharp/pull/1383#issuecomment-2417021097


By the way would it be a better idea to modify PackedSequence itself instead of modifying the dispose scope system?

Those tensors are created in PackedSequence's constructor and the constructor is called under the scope. I think that's why they will be disposed. Maybe easily detaching them from the scope could solve the problem?

(Just a suggestion. Maybe I have missed something.)

mvphelps commented 1 month ago

@yueyinqiu Thank you for such a quick response. I certainly don't see this PR as final. I will ignore the JIT test going forward thank you.

I did consider that option - only modifying the PackedSeqeunce object, and I conceded it would work. My concern with this is that a PackedSequence is made of tensors. They can be detached, but then the user still needs to manage disposing the PackedSequence in order to dispose those tensors. This is the same as saying the user must rely on only techniques 1 and 2 as discussed in the Memory Management document. This would also not follow existing patterns already in the code base. For example, DataLoader uses scope to manage any temporaries created in GetTensor an collate, see https://github.com/dotnet/TorchSharp/blob/1dd3ae507b0881bfb41c00c185438c7884f6a59f/src/TorchSharp/DataLoader.cs#L434. This is a good pattern as it aleves the user of managing any temporaries.

The opposite question - why should PackedSequence not be on par with the memory management treatment of Tensor? I think it has all of the same use cases as far as lifetime goes.

My particular use case, my sequence data won't all fit in RAM all at once. So I'll use a custom data loader (based on the built-in one) that deals with PackedSequence instead of Tensor. I'd like to keep with that implementation as it does manage the temporaries via scope. This does necessitate the ability to move the PackedSequence to the outer dispose scope after it is created.

Are there specific concerns with modifying the DisposeScope object? I certainly can agree there would be a slight performance penalty due to the type checking. It does also represent a rather ugly bit of code duplication. That's why I mentioned an interface - that way these objects can be treated identically under the scope system, and there would be no branches or performance penalty. And now that I look I see that I missed a check on IsInvalid in DisposeScope (line 246) - that's what I get for creating the duplication!

I think the best solution is to implement a common interface on both Tensor and PackedSequence, and have DisposeScope only accept that. It can then treat these two types consistently, with no performance penalty, no code duplication, all while not foisting on to the user the responsibility of passing around disposables to manage. The interface would be pretty simple: IUnmanagedTorchReference{ OwningDisposeScope {get; set;} IsInvalid {get;}} (better name suggestions welcome!)

What problems/concerns do you see with this approach? I am open to options here. Note I implemented as I did for this PR as a middle road solution to start the conversation and get my idea communicated, without going so far as introducing a new type. Just trying to tread lightly as I meet the community here. Thanks again!

yueyinqiu commented 1 month ago

Wow! Really appreciate your detailed explanation. I got your point.

A common interface would be a great idea, which could even allow users to manage their objects with our dispose scopes. (Currently it is possible to attach a IDisposable, but there isn't a automatically changed OwningDisposeScope. And we could add extension methods on it. Really nice.)

By the way, IsInvalid seems to be used for statistics purpose (for unit test) alone. Is it really important?

@NiklasGustafsson

mvphelps commented 1 month ago

Great thank you. Added a new commit with what I'm thinking. I did do this one as what I consider release quality. Feel free to shred it @yueyinqiu, @NiklasGustafsson !

The new interface is

IDisposeScopeClient: IDisposable
    DisposeScope? OwningDisposeScope
    bool IsInvalid

As per your question, is IsInvalid important? Yes:

  1. It is already public on Tensor, and therefore part of the API so therefore can't be removed.
  2. Although it is only used to calculate statistics, those statistics are also public. Since they are exposed for the user, they need to be correct and accurate - therefore any object participating the scope system needs to accurately report it's validity.
  3. As a consumer of this library I do find this property useful when troubleshooting my memory management setup, so I do see the utility for the user.

In DisposeScope, everything that used IDisposable now uses IDisposeScopeClient. I tried leaving some of them (like the DisposablesView), but it was resulting in casting that looked out of place. Additionally I've removed the branches I initially added for PackedSequence, and retained the original code that worked with Tensor exactly as it was - though without the pattern match. It looks much cleaner this way imho.

I've removed my tests from TestNNUtils and instead introduced two new test files, one specifically for PackedSequence with DisposeScope, and another just for custom user objects using DisposeScope. All tests are passing.

About a minor limitation with a consumer implementing IDisposeScopeClient. They will not be able to register their object with the scope system directly as Tensor and PackedSequence do, since those are calling the internal DisposeScopeManager.ThreadSingleton. I thought exposing that was going too far. However, the user can still use scope.Attach, and that is reflected in the unit test for this. I also included instruction about this in the doc comments on IDisposeScopeClient. This keeps the API surface area smaller while still supplying the user what they need.

DataLoader also received the IDisposeScope to IDisposeScopeClient conversion.

I am aware the contribution guide says I also need to update release notes, but I haven't yet until I get more feedback on whether you find this contibution worthy of including. Also the release notes don't reference a future version, so I'm not sure how to add that. Do I just assume one version ahead and add a header?

NiklasGustafsson commented 1 month ago

@mvphelps, this is not a trivial update, so I'll have to take a look at it and ponder it, especially the comments about avoiding type checks in the discussion above.

NiklasGustafsson commented 1 month ago

Also the release notes don't reference a future version, so I'm not sure how to add that. Do I just assume one version ahead and add a header?

Yes, and update the version number in build/BranchInfo.props. Somebody already did the former in another PR, so if you could do that in this PR, that would be awesome.

NiklasGustafsson commented 1 month ago

Hmm... not (yet) sold on the separate interface.

Maybe it's worth breaking this up into two PRs -- one to add PackedSequence to DisposeScope-managed types, and one to then introduce IDisposeScopeClient (which is an odd name, by the way).

mvphelps commented 1 month ago

@NiklasGustafsson agree with being careful here. I have questions:

  1. By "one to add PackedSequence to DisposeScope-managed types", do you mean something like my prior commit, that had the typed branches? This is certainly a functional and minimal solution and will get me what I need. If yes, shall I push that here? Then I can hold the interface changes in another branch until the first part gets merged then open the second PR. Unless you are OK with a PR dependent on another PR.
  2. What specific concerns do you have RE the separate interface? I concede that it may represent a larger surface area of API that you don't want to add. What else?
  3. Open to better names for the interface. I chose that name, as it reflects that the implementer is a client of the DisposeScope system. DisposeScope is effectively the "server" side of that relationship. I think the name is accurately descriptive, but it is not aesthetic.
  4. I'm not clear on your release notes response. I think what you said is that someone else has already bumped the version number, and I need to duplicate that here, is that correct? In that case it looks like I need to reference pull "Fix #1356"? I would update my BranchInfo.props to be identical to that, then add a header to the release note for this version number (since that PR doesn't change the release notes). Is that correct? Let's of course find consensus on what I'll be implementing first.
NiklasGustafsson commented 1 month ago

By "one to add PackedSequence to DisposeScope-managed types", do you mean something like my prior commit

Yes, exactly.

What specific concerns do you have RE the separate interface?

I want us to be careful about the API surface area, introducing new interfaces, etc. Let's consider that separately, since it's not just about getting support for PackedSequence, which I think is an obvious good.

I think the name is accurately descriptive, but it is not aesthetic.

Client/Server has too many connotations associated with distributed systems, so I think that word choice will be confusing or just seem odd (as it did to me).

I'm not clear on your release notes response.

You asked about the right way to do it -- bump the patch version and add new record(s). This time, someone already beat you to it, but didn't edit BranchInfo.props. So, do that in your PR, please.

mvphelps commented 1 month ago

@NiklasGustafsson Reverted the interface, fixed a bug I had, and made IsInvalid on PackedSequence internal. This did necessitate validating it via a reflection call in my test (see TestDisposeScopesPackedSequence), but I thought it worth it to avoid exposing this in the API for now. I also thought the reflection call better than using the InternalsVisibleTo attribute. Also updated the release notes.

Additionally made a test to verify the statistics are updating correctly and have identified a subtle possible bug. In DisposeScope.Attach(IEnumerable), lines 241 and 246 are conditionals for decrementing DetachedFromScopeCount. However, since they are checking if the object is not owned by any scope, it is subtracting, so I get a negative number. I think these branches can actually just be removed as AddToOther downstream correctly increments this count. Additionally AFAIK this should be an additive only statistic and subtracting doesn't make sense for it. I did try commenting these lines and running the tests, everything passes except for the test I added to verify the statistic counting. This effects Tensor also. Thoughts? If you agree this is a bug would you like this addressed in a separate PR? I can also ignore this for now.

mvphelps commented 1 month ago

Just added one more took a walk and realized my implementation was dumb - as a DisposeScope was managing 5 objects for each PackeSequence. This update just detaches all the internal tensors and lets the PackedSequence lifetime manage them since it is already IDisposable.

NiklasGustafsson commented 1 month ago

Never Ignore... :-)

Easiest to track in separate bug, fix it later, since it has no functional impact and affects other APIs, too.

Shold 'IsInvalid' be 'IsValid' instead? It seems unusual to have express it in terms of invalidity instead of validity. On Tensor, it's 'IsValid,' I believe.

mvphelps commented 1 month ago

Ok I'll make an issue for that.

I don't like IsInvalid either. I was maintaining consistency with Tensor, which uses IsInvalid: https://github.com/dotnet/TorchSharp/blob/538f2e3a817ac02b5bcc35f1bdc8acbc2af08ba8/src/TorchSharp/Tensor/Tensor.cs#L99

@NiklasGustafsson Shall I update PackedSequence to do IsValid, or leave it to maintain consistency? We also can't change Tensor since that is long standing in the API. Unless we want to [Obsolete] it.

NiklasGustafsson commented 1 month ago

Well, I stand corrected. Leave it 'IsInvalid' for consistency.

NiklasGustafsson commented 1 month ago

Ready to merge?

mvphelps commented 1 month ago

I do not have any remaining concerns. If you think this is ready, let's merge it! :1st_place_medal:

mvphelps commented 1 month ago

@NiklasGustafsson color me amazed! :astonished: I never expected to see this merged in just a day! I appreciate your quick responses and insights. I am very impatient (working on it) and you didn't make me feel like I was waiting. I look forward to contributing more in the future. Starting with the statistic issue next week. THANK YOU!!!!

NiklasGustafsson commented 1 month ago

Let's see how the CI/CD pipeline goes. This morning, there was some problem with signing packages.

mvphelps commented 1 month ago

@NiklasGustafsson You are right it failed on signing again - 500 internal server error. Is there any way I can contribute to help move this along? I'm assuming not as it's likely something only maintainers have access for, but my help is available if needed.

NiklasGustafsson commented 1 month ago

I've asked for internal help, and still waiting. The error isn't easy to diagnose, exactly. There must be some log, somewhere, that can be consulted.

NiklasGustafsson commented 1 month ago

Apparently, the cert used to sign has expired. Getting a new one.

NiklasGustafsson commented 1 month ago

Success!

mvphelps commented 1 month ago

Nice! When will it show up on Nuget? I don't see it available - I've even enabled prerelease packages. Does it just take a while?

Also it looks like maybe the build skipped the publishing part? See https://dotnet.visualstudio.com/TorchSharp/_build/results?buildId=110780&view=results, the push*packages targets appear all skipped.

mvphelps commented 1 month ago

The build status on the ReadMe page still shows v0.103.0. I don't think the build went through. It did some signing but it hasn't published any packages.

NiklasGustafsson commented 1 month ago

That's because I manually publish the packages when there's a critical mass of PRs that have gone into it.