alirezanet / Husky.Net

Git hooks made easy with Husky.Net internal task runner! 🐢 It brings the dev-dependency concept to the .NET world!
https://alirezanet.github.io/Husky.Net/
MIT License
632 stars 29 forks

Improve CSX execution time #28

Closed — alirezanet closed this 1 year ago

alirezanet commented 2 years ago

Details

Currently, CSX execution is not very efficient: it takes about 2 seconds on the first run, which is not ideal. I opened this issue to evaluate possible options to solve this problem. The expected behavior is to minimize the Roslyn compiler warm-up process.

The minimum execution time for the below command is currently about 1-2 seconds.

```
dotnet husky exec <fileName.csx>
```
acesyde commented 2 years ago

Hi @alirezanet

IMO you can try to compile scripts when you detect a change and only execute them when it's required, you can minimize the execution process time by doing this

I don't know if it's possible, it's just an idea 😄

alirezanet commented 2 years ago

> Hi @alirezanet
>
> IMO you can try to compile scripts when you detect a change and only execute them when it's required, you can minimize the execution process time by doing this
>
> I don't know if it's possible, it's just an idea 😄

Hi, thank you for the suggestion, I agree. Compiling the scripts is probably the best solution that comes to mind, but I haven't found any documentation of the Roslyn APIs that could be used for it yet, although I haven't spent enough time searching.

Xemrox commented 2 years ago

Heya, joining the discussion. A more concrete solution would involve a hash that relates to the source, something like `$script.csx -> $script.$hash.dll`. Btw, writing the assembly to a file can be done with `compilation.Emit(...)`.

acesyde commented 2 years ago

I checked it quickly:

```csharp
var cachePath = Path.Combine(workingDirectory, ".cache");
var outputDllPath = Path.Combine(cachePath, "output.dll");
var outputPdbPath = Path.Combine(cachePath, "output.pdb");

if (!Directory.Exists(cachePath))
{
    Directory.CreateDirectory(cachePath);
}

if (File.Exists(outputDllPath) && File.Exists(outputPdbPath))
{
    // Cache hit: load the previously emitted assembly and invoke the
    // "Submission#0.<Factory>" entry point that Roslyn generates for C# scripts.
    var assembly = Assembly.Load(File.ReadAllBytes(outputDllPath), File.ReadAllBytes(outputPdbPath));
    var type = assembly.GetType("Submission#0");
    var factory = type.GetMethod("<Factory>");
    var submissionArray = new object[2];
    var task = (Task<object>)factory.Invoke(null, new object[] { submissionArray });
    await task;
    "Command executed.".Log(ConsoleColor.Green);
    return;
}
else
{
    // Cache miss: emit the compiled script to disk for the next run.
    using var outputDllStream = new FileStream(outputDllPath, FileMode.OpenOrCreate);
    using var outputPdbStream = new FileStream(outputPdbPath, FileMode.OpenOrCreate);
    var emitResult = compilation.Emit(outputDllStream, outputPdbStream);
}
```

It seems to be better, but I need to spend more time to benchmark it πŸ˜„

*[screenshot of timing results]*

Todo:

Thanks for the idea @Xemrox

alirezanet commented 2 years ago

Interesting... I think we don't need a global cache folder; we can use the `.husky/_/bin` folder to store the compiled cache files. We can also save the files using their hash (CRC32, MD5, etc.) as file names to keep track of changes, but this solution needs a cleanup command.
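A minimal sketch of that idea, assuming a `.husky/_/bin` cache location and a hypothetical helper name; MD5 is used here purely as a fast checksum, not for security:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch: name the cached assembly after the script's content
// hash, so any change to the script automatically misses the cache.
string GetCachedAssemblyPath(string scriptPath, string cacheDir)
{
    byte[] source = File.ReadAllBytes(scriptPath);
    string hash = Convert.ToHexString(MD5.HashData(source)); // checksum only, not security
    string name = Path.GetFileNameWithoutExtension(scriptPath);
    return Path.Combine(cacheDir, $"{name}.{hash}.dll");
}

// Demo: compile only on a cache miss, otherwise load the cached assembly.
File.WriteAllText("demo.csx", "Console.WriteLine(\"hi\");");
var dllPath = GetCachedAssemblyPath("demo.csx", ".husky/_/bin");
Console.WriteLine(dllPath);
// if (!File.Exists(dllPath)) { /* compilation.Emit(...) into dllPath */ }
```

Note that under this naming scheme stale `.dll` files with old hashes are never overwritten, which is exactly why a cleanup command would be needed.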

Xemrox commented 2 years ago

@acesyde why do you copy the `FileStream` to a `MemoryStream`? You could use `Assembly.LoadFile(absolutePath)` instead. @alirezanet SHA-512 is the standard for hashes, so go for that instead of MD5. My proposal is a `$scriptName.hash` file containing the hash of the source that led to the latest assembly. This way, if the hash differs, you can safely assume the assembly is out of sync and regenerate it.

acesyde commented 2 years ago

@Xemrox When I said fast, it was fast; now it's time to clean up and improve 😃

alirezanet commented 2 years ago

@Xemrox what do you mean by SHA-512 being the standard?
In our case, using SHA algorithms is unnecessary because they are slower and more CPU-intensive than something like CRC32 or MD5. The thing is, we don't need a secure hash algorithm here; anything with a checksum works fine. But once we implement this feature, we can do some benchmarking.
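For reference, a quick sketch of fingerprinting the same script bytes with the algorithms under discussion (all from `System.Security.Cryptography`; CRC32 is not in the BCL and would need an external package such as Crc32.NET):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Fingerprint the same source bytes with the candidate algorithms
// (the static one-shot HashData methods require .NET 5 or later).
byte[] source = Encoding.UTF8.GetBytes("Console.WriteLine(\"hello\");");

Console.WriteLine($"MD5:    {Convert.ToHexString(MD5.HashData(source))}");    // 128-bit digest
Console.WriteLine($"SHA1:   {Convert.ToHexString(SHA1.HashData(source))}");   // 160-bit digest
Console.WriteLine($"SHA512: {Convert.ToHexString(SHA512.HashData(source))}"); // 512-bit digest
```

The longer SHA-512 digest is also what drives its larger allocation in the benchmark below.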

*[screenshot comparing hash algorithm speeds]*

SHA1 looks good tho

acesyde commented 2 years ago

@alirezanet with BenchmarkDotNet:

*[screenshot of BenchmarkDotNet results]*

alirezanet commented 2 years ago

Nice. I'm not sure it performs better, but can you also include Crc32.NET? Here we only care about performance, and it could be faster.

acesyde commented 2 years ago

Not a huge difference. IMO, using the official cryptographic library instead of an external lib may be better for the future.

| Method     | Mean     | Error     | StdDev    | Ratio | Gen 0  | Allocated |
|------------|----------|-----------|-----------|-------|--------|-----------|
| TestCRC32  | 5.605 us | 0.2339 us | 0.6822 us | 1.00  | 0.0610 | 386 B     |
| TestMd5    | 5.992 us | 0.1182 us | 0.1407 us | 1.00  | 0.0610 | 426 B     |
| TestSHA1   | 6.089 us | 0.1216 us | 0.1623 us | 1.00  | 0.0687 | 450 B     |
| TestSHA256 | 6.340 us | 0.1250 us | 0.2848 us | 1.00  | 0.0763 | 490 B     |
| TestSHA512 | 8.064 us | 0.1564 us | 0.2193 us | 1.00  | 0.2289 | 1,453 B   |

```
// * Warnings *
MultimodalDistribution
  HashBench.TestCRC32: .NET 6.0 -> It seems that the distribution is bimodal (mValue = 3.86)
```

Xemrox commented 2 years ago

We are responsible for avoiding collision attacks. It's not about the performance of MD5 vs SHA-512. If you hash the code and compile the assembly, we must be sure to execute it only if the source matches our fingerprint of that assembly. Thinking further, we only store a source fingerprint and no assembly fingerprint that relates to it...


alirezanet commented 2 years ago

I get your point, but even hashes like SHA-256 and SHA-512 are not collision-free. Also, in our case the source code (CSX) is available to attackers, and we don't share or commit compiled versions anywhere, so IMO thinking about security here is a bit of over-engineering. Let's say you want to somehow attack this system: how is that possible when you don't have any access to other contributors' compiled files? I'm not saying using SHA-512 is a bad choice; on the contrary, I personally have a strong interest in safer algorithms, but I don't see any real-world vulnerabilities yet. Still, out of respect for your opinion, I'll probably use SHA-512 since its overhead is barely noticeable.

acesyde commented 2 years ago

@alirezanet maybe in the future we can add the capability to consume shared libraries from another repo (like pre-commit), so SHA-512 can be a solution there.

alirezanet commented 2 years ago

You both voted for SHA-512, hands up 🙌 😉. So SHA-512 it is. 😅

acesyde commented 2 years ago

But in the case of pre-commit, they use an external database (SQLite) to store all this information.

alirezanet commented 2 years ago

What do you think about the default behavior? Should the exec command automatically compile the scripts for future executions, or should we add an option like --compile to change its behavior? In the end, we probably need one of these options: --raw or --compile.

acesyde commented 2 years ago

Like you say, a --raw or --no-cache option seems better; compilation should be the default behavior.

Xemrox commented 2 years ago

I'd go with --no-cache: exhibit good default behaviour, and if anybody needs to recompile every time, they have a switch for it.

acesyde commented 2 years ago

So I made a benchmark of the compiled version vs the current solution, and the compiled version is a little bit faster 🤣

| Method            | Mean         | Error      | StdDev     | Ratio | Allocated   |
|-------------------|--------------|------------|------------|-------|-------------|
| UseCSharpScript   | 29,607.01 us | 548.193 us | 457.766 us | 1.000 | 6,404,152 B |
| UseCompiledScript | 78.44 us     | 1.516 us   | 2.269 us   | 0.003 | 768 B       |

The csx file only contains a `Console.WriteLine`.

alirezanet commented 1 year ago

Added in v0.5.0.