Closed alirezanet closed 1 year ago
Hi @alirezanet
IMO you can try to compile the scripts when you detect a change and only execute them when required; doing this would minimize the execution time.
I don't know if it's possible, it's just an idea.
Hi, thank you for the suggestion, I agree. Compiling the scripts is probably the best solution that comes to mind, but I haven't found any documentation of the Roslyn APIs that could be used for this yet, although I haven't spent much time searching.
Heya. Joining the discussion.
A more concrete solution would involve a hash that relates to the source.
Something like $script.csx -> $script.$hash.dll?
Btw, writing the assembly to a file can be done with `compilation.Emit(...)`.
I checked it quickly:

```csharp
var cachePath = System.IO.Path.Combine(workingDirectory, ".cache");
var outputDllPath = System.IO.Path.Combine(cachePath, "output.dll");
var outputPdbPath = System.IO.Path.Combine(cachePath, "output.pdb");

if (!Directory.Exists(cachePath))
{
    Directory.CreateDirectory(cachePath);
}

if (File.Exists(outputDllPath) && File.Exists(outputPdbPath))
{
    // Cache hit: load the previously emitted assembly and symbols.
    using var ilStream = new MemoryStream();
    using var pdbStream = new MemoryStream();
    using var outputDllStream = new FileStream(outputDllPath, FileMode.Open);
    outputDllStream.CopyTo(ilStream);
    using var outputPdbStream = new FileStream(outputPdbPath, FileMode.Open);
    outputPdbStream.CopyTo(pdbStream);

    // ToArray() rather than GetBuffer(): GetBuffer() can return unused trailing bytes.
    var assembly = Assembly.Load(ilStream.ToArray(), pdbStream.ToArray());

    // Invoke the script's generated entry point ("Submission#0.<Factory>").
    var type = assembly.GetType("Submission#0");
    var factory = type.GetMethod("<Factory>");
    var submissionArray = new object[2];
    var task = (Task<object>)factory.Invoke(null, new object[] { submissionArray });
    await task;
    "Command executed.".Log(ConsoleColor.Green);
    return;
}
else
{
    // Cache miss: compile and persist the assembly for the next run.
    using var outputDllStream = new FileStream(outputDllPath, FileMode.OpenOrCreate);
    using var outputPdbStream = new FileStream(outputPdbPath, FileMode.OpenOrCreate);
    var emitResult = compilation.Emit(outputDllStream, outputPdbStream);
}
```
It seems to be better, but I need to spend more time benchmarking it.
Todo:
Thanks for the idea, @Xemrox
Interesting... I think we don't need a global cache folder; we can use the `.husky/_/bin` folder to store the compiled cache files. We can also save the files using their hash (CRC32, MD5, etc.) as file names to keep track of changes, but this solution needs a cleanup command.
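That scheme could be sketched roughly like this (a sketch only: the `.husky/_/bin` location and the `GetCachedDllPath` helper are assumptions, and SHA-256 stands in for whichever checksum ends up being chosen):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: derive the cached assembly's file name from a
// hash of the script's current contents, so any edit yields a new name.
static string GetCachedDllPath(string scriptPath, string cacheDir)
{
    using var sha = SHA256.Create();
    using var stream = File.OpenRead(scriptPath);
    var hash = Convert.ToHexString(sha.ComputeHash(stream));
    var name = Path.GetFileNameWithoutExtension(scriptPath);
    return Path.Combine(cacheDir, $"{name}.{hash}.dll");
}

var script = Path.GetTempFileName();
File.WriteAllText(script, "Console.WriteLine(\"hi\");");
var dllPath = GetCachedDllPath(script, Path.Combine(".husky", "_", "bin"));

// If this file exists, the cached assembly can be loaded as-is;
// otherwise the script is recompiled and emitted to dllPath.
Console.WriteLine(File.Exists(dllPath) ? "cache hit" : "cache miss");
```

Stale entries accumulate because an old hash never matches again, which is exactly why a cleanup command would be needed.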
@acesyde why do you copy the `FileStream` to a `MemoryStream`? You could make use of `Assembly.LoadFile(absolutePath)` instead.
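For illustration, `Assembly.LoadFile` takes an absolute path and skips the intermediate copy entirely (the assembly loaded here is just a stand-in that is guaranteed to exist on any .NET install; in the real code it would be the cached script DLL):

```csharp
using System;
using System.IO;
using System.Reflection;

// Load an assembly directly from disk instead of copying its bytes
// through a MemoryStream first. System.Linq.dll is used purely as a
// stand-in for the cached script assembly.
var dllPath = typeof(System.Linq.Enumerable).Assembly.Location;
var assembly = Assembly.LoadFile(Path.GetFullPath(dllPath));
Console.WriteLine(assembly.FullName);
```

One trade-off to be aware of: `Assembly.Load(byte[])` leaves the file unlocked afterwards, while `LoadFile` may keep a handle to it on some platforms.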
@alirezanet SHA-512 is the standard for hashes, so go for that instead of MD5. My proposal is a `$scriptName.hash` file containing the hash of the source that led to the latest assembly. This way, if the hash differs, you can safely assume the assembly is out of sync and regenerate it.
@Xemrox I said it would be fast, and it was. Now it's time to clean up and improve.
@Xemrox what do you mean by SHA-512 being the standard?
In our case, using SHA algorithms is unnecessary because they are slower and more CPU-intensive than something like CRC32 or MD5. The thing is, we don't need a cryptographically secure hash algorithm here; anything with a checksum works fine.
But once we've implemented this feature, we can do some benchmarking.
SHA1 looks good, though.
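A very crude single-shot comparison can illustrate what such a measurement looks like (this is not a substitute for BenchmarkDotNet, and the 4 KB input is an assumption about typical script size):

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Hash the same 4 KB buffer with each algorithm once and time it.
// Single-shot timings are noisy; BenchmarkDotNet gives real numbers.
var data = new byte[4096];
new Random(42).NextBytes(data);

foreach (var (name, algo) in new (string, HashAlgorithm)[]
{
    ("MD5", MD5.Create()),
    ("SHA1", SHA1.Create()),
    ("SHA256", SHA256.Create()),
    ("SHA512", SHA512.Create()),
})
{
    var sw = Stopwatch.StartNew();
    var digest = algo.ComputeHash(data);
    sw.Stop();
    Console.WriteLine($"{name}: {digest.Length * 8}-bit digest, {sw.Elapsed.TotalMilliseconds:F3} ms");
    algo.Dispose();
}
```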
@alirezanet with BenchmarkDotNet
Nice. I'm not sure it performs better, but can you also include Crc32.NET? Here we just care about performance, and it could be faster.
Not a huge difference. IMO using the official cryptography library instead of an external lib may be better for the future.
| Method | Mean | Error | StdDev | Ratio | Gen 0 | Allocated |
|---|---|---|---|---|---|---|
| TestCRC32 | 5.605 us | 0.2339 us | 0.6822 us | 1.00 | 0.0610 | 386 B |
| TestMd5 | 5.992 us | 0.1182 us | 0.1407 us | 1.00 | 0.0610 | 426 B |
| TestSHA1 | 6.089 us | 0.1216 us | 0.1623 us | 1.00 | 0.0687 | 450 B |
| TestSHA256 | 6.340 us | 0.1250 us | 0.2848 us | 1.00 | 0.0763 | 490 B |
| TestSHA512 | 8.064 us | 0.1564 us | 0.2193 us | 1.00 | 0.2289 | 1,453 B |
// * Warnings *
MultimodalDistribution
HashBench.TestCRC32: .NET 6.0 -> It seems that the distribution is bimodal (mValue = 3.86)
We are responsible for avoiding collision attacks. It's not about the performance of MD5 vs SHA-512. If we hash the code and compile the assembly, we must be sure to execute it only if the source matches our fingerprint of that assembly. Thinking further, we only store a source fingerprint and no assembly fingerprint that relates to it...
I get your point, but even hashes like SHA-256 and SHA-512 are not collision-free. Also, in our case the source code (CSX) is available to attackers, and we don't share or commit compiled versions anywhere, so IMO thinking about security here is a bit of over-engineering. Say you somehow wanted to attack this system: how would that be possible without any access to other contributors' compiled files? I'm not saying SHA-512 is a bad choice; on the contrary, I personally have a strong interest in safer algorithms, but I don't see any real-world vulnerabilities yet. Still, out of respect for your opinion, I'll probably use SHA-512, since its overhead is barely noticeable.
@alirezanet maybe in the future we can add the capability to consume shared libraries from another repo (like pre-commit), so SHA-512 can be a solution there.
You both voted for SHA-512, hands up. So SHA-512 it is.
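A fingerprint check along the lines of @Xemrox's `$scriptName.hash` proposal could look like this (a sketch under assumed names; the `Fingerprint` helper and the `.hash` sidecar path are illustrative, not the actual implementation):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: SHA-512 hex digest of a script file, stored next
// to the cached assembly and compared on every run.
static string Fingerprint(string path)
{
    using var sha = SHA512.Create();
    using var stream = File.OpenRead(path);
    return Convert.ToHexString(sha.ComputeHash(stream));
}

var script = Path.GetTempFileName();
File.WriteAllText(script, "Console.WriteLine(\"hello\");");
File.WriteAllText(script + ".hash", Fingerprint(script));

// Next run: recompile only when the stored hash no longer matches.
var upToDate = File.ReadAllText(script + ".hash") == Fingerprint(script);
Console.WriteLine(upToDate ? "cache valid" : "recompile"); // prints "cache valid"
```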
But in the case of pre-commit, they use an external database (SQLite) to store all this information.
What do you think about the default behavior? Should the `exec` command automatically compile the scripts for future executions, or should we add an option like `--compile` to change its behavior? In the end, we probably need one of these options: `--raw` or `--compile`.
Like you say, a `--raw` or `--no-cache` option seems better; compilation should be the default behavior.
I'll go with `--no-cache`. Exhibit good default behaviour, and if anybody needs to recompile every time, they have a switch for it.
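So the flag handling would amount to something like this (a hypothetical sketch; the project's actual option parsing may well differ):

```csharp
using System;
using System.Linq;

// Hypothetical default: caching is on unless the user opts out.
static bool UseCache(string[] args) => !args.Contains("--no-cache");

Console.WriteLine(UseCache(new[] { "exec", "script.csx" }));               // True
Console.WriteLine(UseCache(new[] { "exec", "script.csx", "--no-cache" })); // False
```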
So I made a benchmark comparing the compiled version against the current solution, and the compiled version is a little bit faster.
| Method | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|
| UseCSharpScript | 29,607.01 us | 548.193 us | 457.766 us | 1.000 | 6,404,152 B |
| UseCompiledScript | 78.44 us | 1.516 us | 2.269 us | 0.003 | 768 B |
The csx file only contains a `Console.WriteLine`.

Added in `v0.5.0`.
Details
Currently, CSX execution is not very efficient: it takes about 2 seconds on the first run, which is not ideal. I opened this issue to evaluate possible options to solve this problem. The expected behavior is to minimize the Roslyn compiler warm-up time.
The minimum execution time for the below command is currently about 1-2 seconds.