Open Nytelife26 opened 6 months ago
I hereby make use of my copyright and revoke the permission to use any of my code (commits & branches). Good thing is that nothing has been merged yet. I'm licensing it under a private license that specifically forbids usage in proselint.
i can respect your wishes there quite easily. i would, however, like to note that this (along with your other recent behaviour) is quite unprofessional. additionally, under D.5 and D.6 of github's terms of service, as i am sure you're aware, that would not work if i were genuinely a bad actor.
I am the unprofessional one, of course. I just wanted to coexist and contribute. To stay with your top-notch analogy: The keys were handed to us both. You decided to change the locks. You made an overstepping and backstabbing move for power. I'll not be thrown under the bus. So i'll go and take my copyright with me. Read again through your paragraphs. I did not contribute - nothing was merged - and i revoked my consent for any future merges of my work.
Thanks for respecting my wishes. bye
work will now resume following completion of my university exams for the year. current status is as follows:
porting efforts have been halted, and work is resuming in python for now. when i have completed the plans i have for proselint's current structure, they will continue.
however, many of these ideas came from deciding how best to implement proselint in other languages. i devised the following model for checks, which can be collected into a registry and dispatched sequentially or concurrently.
- a path (such as misc.illogic.conclusion), with property computations for aforementioned path splitting and the name. optionally, this may be computed to ensure consistency and remove the need to manually normalise check names
- ignore_case property, because this switch exists for all check types
- offset tuple, because offsets exist for all check types

metadata implementation of limitations, such as limit_results and ppm_threshold, remains to be determined. this may happen with flags.
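To make the model above concrete, here is a minimal Python sketch of a check with a path, an ignore_case switch, and an offset tuple, collected into a registry that can dispatch sequentially or concurrently. It is not the code from this pull request: the names Check, Registry, existence, and apply_check, the meaning of name (assumed to be the last path segment), the way the offset shifts match spans, and the regex pattern are all assumptions made for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    """Hypothetical check model; only path, ignore_case and offset come from the thread."""

    path: str  # e.g. "misc.illogic.conclusion"
    run: Callable[[str], list[tuple[int, int]]]  # assumed: returns (start, end) spans
    ignore_case: bool = True
    offset: tuple[int, int] = (0, 0)

    @property
    def path_segments(self) -> list[str]:
        # Computed from the path so the segments never drift out of sync with it.
        return self.path.split(".")

    @property
    def name(self) -> str:
        # Assumption: the name is the final path segment.
        return self.path_segments[-1]


def existence(path, pattern, ignore_case=True, offset=(0, 0)) -> Check:
    """Build a simple regex-based check (a stand-in for one of the check types)."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)

    def run(text: str) -> list[tuple[int, int]]:
        return [match.span() for match in regex.finditer(text)]

    return Check(path, run, ignore_case, offset)


def apply_check(check: Check, text: str) -> list[tuple[int, int]]:
    # Assumption: the offset tuple shifts the start and end of every span.
    return [(s + check.offset[0], e + check.offset[1]) for s, e in check.run(text)]


class Registry:
    """Collects checks by path and dispatches them over a text."""

    def __init__(self) -> None:
        self._checks: dict[str, Check] = {}

    def register(self, check: Check) -> None:
        self._checks[check.path] = check

    def dispatch(self, text: str, concurrent: bool = False) -> dict[str, list[tuple[int, int]]]:
        checks = list(self._checks.values())
        if concurrent:
            # pool.map preserves input order, so results line up with checks.
            with ThreadPoolExecutor() as pool:
                results = list(pool.map(lambda c: apply_check(c, text), checks))
        else:
            results = [apply_check(c, text) for c in checks]
        return {c.path: r for c, r in zip(checks, results)}


if __name__ == "__main__":
    registry = Registry()
    # The pattern is illustrative only, not proselint's actual rule for this path.
    registry.register(existence("misc.illogic.conclusion", r"\bvery\b"))
    print(registry.dispatch("Very good, very bad", concurrent=True))
```

Because every check shares the same small set of fields, the registry does not need to know which check type it is running, which is what lets the sequential and concurrent paths share one code path in this sketch.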
the current registry implementation introduces a performance regression of roughly 100ms, which is far from ideal. i will aim to clean this up promptly. however, the desired granularity has been achieved: the addition of partial keys makes it possible to specify key components, like simply airlinese, while the registry system makes it possible to skip checks on a per-function basis.
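One way partial keys could work is sketched below in Python. This is a guess at the behaviour, not the registry code from this pull request: key_matches, enabled_checks, the "longest matching key wins" rule, the enabled-by-default fallback, and the example paths are all assumptions.

```python
def key_matches(key: str, path: str) -> bool:
    """True if the components of `key` appear contiguously, in order, in `path`."""
    key_parts = key.split(".")
    path_parts = path.split(".")
    width = len(key_parts)
    return any(
        path_parts[i:i + width] == key_parts
        for i in range(len(path_parts) - width + 1)
    )


def enabled_checks(paths: list[str], config: dict[str, bool]) -> list[str]:
    """Filter check paths by a config of full or partial keys.

    Assumptions: the most specific (longest) matching key wins, and a
    check with no matching key stays enabled.
    """
    selected = []
    for path in paths:
        matches = [key for key in config if key_matches(key, path)]
        if not matches:
            selected.append(path)
            continue
        most_specific = max(matches, key=lambda k: len(k.split(".")))
        if config[most_specific]:
            selected.append(path)
    return selected


if __name__ == "__main__":
    # Illustrative paths and config, not proselint's real check list.
    paths = ["airlinese.misc", "misc.illogic.conclusion", "misc.illogic.other"]
    config = {"airlinese": False, "misc.illogic": True, "misc.illogic.other": False}
    print(enabled_checks(paths, config))
```

Note that a single-component key such as misc would match both airlinese.misc and misc.illogic.conclusion under this rule, so a real implementation may want to anchor partial keys more strictly.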
status report: i have made it up to proselint.checks.misc so far with the new dispatch system. this will resolve many issues incurred by the provisional implementation, and with some planned additions, like the flag system and a context accessor for custom check functions that do not conform to one of the provided check types, it will represent the final version of proselint's internal check system for the foreseeable future. i aim to have this finished, pending testing, by the end of the week.
final edit:
links.broken is still the slowest check to load, at ~20ms.
here are current benchmarks for proselint, from start to finish, with the demo file:
Benchmark 1 (uncached): python3 -m proselint --demo
Time (mean ± σ): 253.4 ms ± 11.2 ms [User: 667.4 ms, System: 139.2 ms]
Range (min … max): 238.1 ms … 271.2 ms 10 runs
Benchmark 2 (cached): python3 -m proselint --demo
Time (mean ± σ): 142.6 ms ± 4.6 ms [User: 126.5 ms, System: 15.1 ms]
Range (min … max): 137.6 ms … 159.5 ms 21 runs
This looks fantastic @Nytelife26! I see the last commit here is from July, is there much more that needs to be done?
> is there much more that needs to be done?
I am in the process of porting proselint in its entirety, which became a necessity after an unfortunate conflict arose with the original author of the refactor. I see this as the best possible path forward for proselint - a fresh start, which will come with performance benefits, long-overdue housekeeping, and a good chance to implement some features people have been asking for for a long time.
This effort has not been easy, and while I'm back at university, things have been a hard balance. However, I have many of the internals done already (configuration, core parts of the command line, specification structures), and I feel good progress is being made.
I will be making this effort more public once I have a solid foundation in place. For now, the latest commits to this pull request mark the last Python version of the project, unless some major breakthrough in communication happens.
Let me know if you have any further questions. As always, I am incredibly grateful and happy to see that people are still interested in proselint. Things were rough, with stagnant development after communications ceased some years ago, but I am excited to finally have the chance to revive this project.
Small victories are showing - the command line, configuration parser, and check primitives are operational. As an additional bonus, all of the check specifications will be evaluated and stored at compile-time, entirely eliminating runtime discovery costs from the Python version.
Shown here is the first test with an actual regex specification from the original code.
Some things are not yet possible for reasons beyond my control, like consistent case-insensitive matching without using hacky mode modifiers (blocked by fancy-regex#132). I also have yet to implement parallelization, but that will not be a priority until much of the other major work is complete; although, it should be as trivial as adding Rayon and adjusting the dispatch iterators.
I committed a preview version today so any onlookers can see how things are coming along. Be advised that at present, things are messy, quite inefficient, and there are still traces of Python-esque design patterns lying around.
However, results speak for themselves. With 51 of ~180 checks implemented, an uncached serial run in release mode is, by my measurements, at least 10 times faster than an uncached parallel run of the previous implementation. Assuming run time scales linearly with the number of checks, and adding some pessimistic padding, I would expect the finished port to be no worse than 2 times faster.
nytelife26@[lilium-2] » proselint git:(dev) ± hyperfine
Benchmark 1: ./proselint-rs/target/debug/proselint check --demo
Time (mean ± σ): 529.2 ms ± 6.0 ms [User: 520.3 ms, System: 7.0 ms]
Range (min … max): 520.4 ms … 542.0 ms 10 runs
Benchmark 2: ./proselint-rs/target/release/proselint check --demo
Time (mean ± σ): 77.0 ms ± 4.8 ms [User: 73.9 ms, System: 1.9 ms]
Range (min … max): 70.3 ms … 85.2 ms 10 runs
Benchmark 3: pdm run proselint check --demo
Time (mean ± σ): 939.7 ms ± 15.3 ms [User: 1174.2 ms, System: 232.7 ms]
Range (min … max): 920.6 ms … 972.2 ms 10 runs
Warning: Ignoring non-zero exit code.
Summary
./proselint-rs/target/release/proselint check --demo ran
6.88 ± 0.43 times faster than ./proselint-rs/target/debug/proselint check --demo
12.21 ± 0.78 times faster than pdm run proselint check --demo
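As a rough sanity check on the linear-scaling estimate, the figures above can be extrapolated in a few lines of Python. This is only back-of-the-envelope arithmetic: it takes 180 as the total check count (the thread says "~180") and pessimistically assumes the entire release run time, startup included, grows in proportion to the number of checks.

```python
# Mean wall-clock times copied from the hyperfine output above, in milliseconds.
rust_release_ms = 77.0  # release build with 51 checks implemented
python_ms = 939.7       # previous Python implementation (pdm run)

implemented, total = 51, 180  # "~180" in the thread, so the total is approximate

# Pessimistic assumption: all of the run time scales with the check count.
projected_ms = rust_release_ms * total / implemented
projected_speedup = python_ms / projected_ms

print(f"projected release run: {projected_ms:.1f} ms")
print(f"projected speedup over Python: {projected_speedup:.2f}x")
```

Under these assumptions the projected run is about 272 ms, or roughly 3.5 times faster than the Python run, which leaves room for the pessimistic padding behind the "no worse than 2 times" figure.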
Updated results with a parallel iterator via rayon:
Benchmark 1: ./proselint-rs/target/debug/proselint check --demo
Time (mean ± σ): 128.8 ms ± 19.7 ms [User: 725.1 ms, System: 68.3 ms]
Range (min … max): 103.3 ms … 162.5 ms 10 runs
Benchmark 2: ./proselint-rs/target/release/proselint check --demo
Time (mean ± σ): 38.5 ms ± 3.2 ms [User: 76.1 ms, System: 33.7 ms]
Range (min … max): 33.2 ms … 43.4 ms 10 runs
Benchmark 3: pdm run proselint check --demo
Time (mean ± σ): 838.4 ms ± 17.4 ms [User: 1006.9 ms, System: 185.3 ms]
Range (min … max): 813.6 ms … 865.9 ms 10 runs
Warning: Ignoring non-zero exit code.
Summary
./proselint-rs/target/release/proselint check --demo ran
3.34 ± 0.58 times faster than ./proselint-rs/target/debug/proselint check --demo
21.76 ± 1.89 times faster than pdm run proselint check --demo
All check specifications are registered at compile-time. Things that remain to be done include message templating, output formats, deciding whether I'd like CheckType to remain as an enum or become a trait to emulate the flexibility of Python's unions, implementation of a new ExistenceFancy check type, and general housekeeping.
this is a follow on from #1361. credit to @orgua for the initial work here.
following a request, preserved below, that no work from the initial refactoring effort be used, the oxidation of proselint begins. it can be observed here that launching proselint, parsing CLI options, finding config paths for both JSON and TOML, and deserializing them now takes less time than it previously took just to load the CLI options. it is worth noting that the previous measurements were taken even after a highly optimised refactor that involved replacing click with a simplified parse function.
it may take some time to get this all up and running. i would like to thank any onlookers for bearing with me.
keeping this here so i don't forget: