Open henryjj99 opened 5 years ago
That seems doable - I'll see if I can prototype it soon, and will let you know. Thanks for the request!
Really appreciate it! Thanks
Hi Henry -
I've attached a prototype as part of a new Impala GHA, along with an example/test file to go with it. The component is called 'ParRemDupLns'. It seems to be performing relatively well: a 'random' test does 300,000 lines in under a second on my machine, though results may vary depending on the system. In any case it should be better tuned than the sequential version.
A couple of things to note: it reorders the lines coming through (similarly to other RemoveDuplicateLine implementations). A smaller (i.e. more precise) tolerance will result in a faster computation. The 'granularity' parameter only affects how the work is batched into parallel portions, so it should ONLY affect runtime, not the computed result. The optimal value depends on the system, so feel free to adjust it to whatever runs fastest on your machine (as a rule of thumb, I've found values of 500-1000 to be decent).
Additionally, it behaves (slightly) differently than the original version in cases where there are many (non-duplicate) lines of very similar length, in that it chooses to cull differently - the result should still be usable.
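To illustrate how a 'granularity' parameter can change the batching without changing the result, here is a minimal Python sketch. It is not the Impala implementation: the grid-quantization approach and the names `line_key`, `dedupe_chunk` and `remove_duplicate_lines` are assumptions made up for this example. Unlike the component, it keeps the first-seen order, and Python threads will not give a real speedup here; the point is only the chunk-then-merge structure.

```python
from concurrent.futures import ThreadPoolExecutor


def line_key(line, tolerance):
    """Snap both endpoints to a grid of size `tolerance` (assumed approach).

    The two snapped endpoints are ordered so that a line and its reverse
    produce the same key.
    """
    a, b = line
    qa = tuple(round(c / tolerance) for c in a)
    qb = tuple(round(c / tolerance) for c in b)
    return (qa, qb) if qa <= qb else (qb, qa)


def dedupe_chunk(chunk, tolerance):
    """Keep the first line seen for each key within one chunk."""
    seen = {}
    for line in chunk:
        seen.setdefault(line_key(line, tolerance), line)
    return seen


def remove_duplicate_lines(lines, tolerance=1e-3, granularity=500):
    """Deduplicate lines in chunks of `granularity`, then merge the chunks."""
    chunks = [lines[i:i + granularity] for i in range(0, len(lines), granularity)]
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in chunk order, so the merge is deterministic.
        partials = list(pool.map(lambda c: dedupe_chunk(c, tolerance), chunks))
    merged = {}
    for partial in partials:
        for key, line in partial.items():
            merged.setdefault(key, line)  # first occurrence overall wins
    return list(merged.values())


lines = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),     # same line, reversed
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((0.0, 0.0, 0.0001), (1.0, 0.0, 0.0)),  # within tolerance of the first
]
unique = remove_duplicate_lines(lines, tolerance=1e-3, granularity=2)
print(len(unique))
```

Because each chunk is reduced independently and the partial results are merged in chunk order, any granularity gives the same output in this sketch. One caveat of grid snapping is that two endpoints closer than the tolerance can still land in neighbouring cells and be treated as different.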
Just replace your current Impala GHA (or download from Food4Rhino - you'll still need the other .dll dependencies) with this one to get it rolling. If you could let me know how it works for you, I'd really appreciate any feedback!
Hi there:
Thank you for your work! Fantastic! It works, and it runs really fast in my case. A screenshot is attached below. I am using a new Surface Pro with an i5-7300U (2 cores, 4 threads).
Best
Glad to hear it! I'll refine and test it a bit more and likely add it to the next version of Impala. Thanks for the suggestion!
In my recent project there are around 300,000 lines to run through 'RemoveDuplicateLine', but the single-threaded function in Kangaroo2 is super slow. I hope this can be done multithreaded some day!