Improve Python.NET interop performance

lostmsu commented 5 years ago

What does this implement/fix? Explain your changes.

This improves the performance of Python -> C# calls by up to 50%.

Offset calculation for instance unwrapping

Commits 3344110 and 0d158f5 can actually be squashed into one change. They addresses the following scenario:

A C# object a is created and filled with some data.
a is passed via Python.NET to Python. To do that Python.NET creates a wrapper object w, and stores reference to a in one of the its fields.
Python code later passes w back to C#, e.g. calls SomeCSharpMethod(w).
Python.NET has to unwrap w, so it reads the reference to a from it.

Prior to this change in 4. Python.NET had to determine what kind of an object a is. If it is an exception, a different offset needed to be used. That check was very expensive (up to 4 calls into Python interpreter).

The change replaces that check with computing offset unconditionally from the object size it reads from the wrapper.

Replacing `Marshal.Read/Marshal.Write` methods in hot paths 12d1378

Marshal class methods provide safe way to read and write primitive values from unmanaged memory. Safety involves two checks:

An exception handler for NullReferenceException in case pointer is null: the contract, provided by Marshal class requires that to be converted to AccessViolationException.
Checking if the primitive is properly aligned in memory, and using a different procedure to access it otherwise.

Both are expensive, and not needed in Python.NET

Python.NET never handles exceptions from Marshal.
All allocations in Python are aligned, as it uses standard C allocator, that ensures alignment for large structures (e.g. PythonObject instances).

So a couple of methods are added to Util class to perform faster unmanaged memory access, and Marshal calls replaced with them in hot paths (mostly in CLRObject constructor).

Does this close any currently open issues?

N/A

Any other comments?

N/A

Checklist

Check all those that are applicable and complete.

[ ] Make sure to include one or more tests for your change (N/A: no new features)
[ ] If an enhancement PR, please create docs and at best an example (N/A: no new features)
[x] Add yourself to AUTHORS
[x] Updated the CHANGELOG

lostmsu commented 5 years ago

It would be nice to add more comments for the different changes so its easier to understand and assert that the new implementation is equivelent to the previous one.

I don't think it makes sense to do in the code, as the old code will be gone after making this change. Clarified a bit in the review itself though.

As for any performance change, testing and measuring is the key stone, it would be good to add some testing information and results to the PR showcasing the improvements. Also noting how were these changes tested windows/linux, with what etc.

I only tested on a Windows box with sample data from https://github.com/QuantConnect/Lean/tree/master/Data . As suggested by Jared, I run random data generator to fill the date range 2010-2018.

I put the following into config.json, and run Launcher.

// algorithm class selector
"algorithm-type-name": "C:\\Users\\SCRAMBLED\\Projects\\QuantConnect\\Lean\\Algorithm.Python\\Benchmarks\\IndicatorRibbonBenchmark",

// Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
"algorithm-language": "Python",

//Physical DLL location
"algorithm-location": "../../../Algorithm.Python/Benchmarks/IndicatorRibbonBenchmark.py",

Without the changes I got ~13k dps, and ~17k dps with the changes.

I have also tested (initially) by adding a 128 iteration loop around the body AlgorithmPythonWrapper.OnData to reduce the effect of data loading on overall run time, and got a decrease from ~35s to ~25s run time as reported by the Launcher (about 5s in both these numbers are spent in data reading).

Wondering if these changes have been presented at https://github.com/pythonnet/pythonnet ?

I will upstream them as soon as I have time for that.

P.S. Completed answering to this pass of review, ready for a new iteration.

lostmsu commented 5 years ago

@Martin-Molinero dropping Marshal changes improves performance? That looks suspicious.

Martin-Molinero commented 5 years ago

@Martin-Molinero dropping Marshal changes improves performance? That looks suspicious.

I'd rather say, based on these tests results, that its exposing the fact that the Marshal changes didn't improve performance significantly. There is clearly a dispersion in the result values.

lostmsu commented 5 years ago

@Martin-Molinero I removed the Marshal-related change, and squashed the rest to a single commit.

lostmsu commented 5 years ago

@Martin-Molinero I recommend merging it using Squash and Merge, otherwise a merge commit will be created. Or manually cherry-pick c70724a

mchandschuh commented 5 years ago

food for thought -- It's great seeing the macro-level performance tests, but this repository may also benefit from micro-level speed tests. This will help us track impacts of performance changes on individual operations. For example, this PR seems to directly impact performance of invoking a member (method/property/etc) on a native C# object that has been returned from the python runtime. We could have a performance test that specifically measures this case. Also, we would be able to make better informed decisions regarding improvements on the micro-scale, such as the Marshal changes (since removed)

lostmsu commented 5 years ago

@mchandschuh , I created a project in the main Python.NET repo to track benchmarking: https://github.com/pythonnet/pythonnet/projects/5 Since there's no existing infra for it, I think it is out of scope for this change.

Done with this iteration. Note: I haven't made any code changes. If you strongly believe some are required, please respond to one of the conversations.

lostmsu commented 5 years ago

Hey guys! Any changes required for this to get in?

jaredbroad commented 5 years ago

Various team members getting back from vacation and travels - will be able to test and get in soon.

Martin-Molinero commented 5 years ago

Merging so we start cloud integration and performance testing. Note: adding a new merge commit to follow existing merging policy.

QuantConnect / pythonnet