drahnr / cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes with hunspell and an nlprule-based checker for grammar
Apache License 2.0
314 stars 32 forks

Non-determinism with hunspell backend #319

Open drahnr opened 6 months ago

drahnr commented 6 months ago

Describe the bug

Hunspell appears to be non-deterministic: in some runs certain words are flagged as spelling mistakes, rightfully so, yet in other runs they are not.

To Reproduce

Steps to reproduce the behaviour:

  1. Run cargo-spellcheck on a large Rust project multiple times
  2. Receive different detection sets from run to run
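
The reproduction steps can be automated with a small harness. The following is only a sketch under assumptions: the run count of 5 is arbitrary, the helper names `run_once` and `unstable_lines` are made up for illustration, and `-vv` is dropped from the invocation on the assumption that verbose logging would add run-specific noise. It treats each run's output as a set of lines, so differences in output ordering alone are not reported.

```rust
use std::collections::BTreeSet;
use std::process::Command;

/// Run the command once and return its non-empty output lines as an ordered set.
fn run_once(program: &str, args: &[&str]) -> std::io::Result<BTreeSet<String>> {
    let out = Command::new(program).args(args).output()?;
    // Merge stdout and stderr, since findings may be reported on either.
    let mut text = String::from_utf8_lossy(&out.stdout).into_owned();
    text.push('\n');
    text.push_str(&String::from_utf8_lossy(&out.stderr));
    Ok(text
        .lines()
        .map(str::trim_end)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect())
}

/// Lines that appear in at least one run but not in every run.
fn unstable_lines(runs: &[BTreeSet<String>]) -> BTreeSet<String> {
    let (first, rest) = match runs.split_first() {
        Some(parts) => parts,
        None => return BTreeSet::new(),
    };
    let mut union = first.clone();
    let mut common = first.clone();
    for run in rest {
        union.extend(run.iter().cloned());
        common = common.intersection(run).cloned().collect();
    }
    union.difference(&common).cloned().collect()
}

fn main() {
    const RUNS: usize = 5; // arbitrary choice, not from the report
    let args = ["spellcheck", "check", "--code", "77"];
    let mut runs = Vec::new();
    for i in 0..RUNS {
        match run_once("cargo", &args) {
            Ok(lines) => runs.push(lines),
            Err(err) => {
                eprintln!("run {i} could not be started: {err}");
                return;
            }
        }
    }
    let unstable = unstable_lines(&runs);
    if unstable.is_empty() {
        println!("all {RUNS} runs produced the same set of lines");
    } else {
        println!("{} line(s) differ between runs:", unstable.len());
        for line in &unstable {
            println!("  {line}");
        }
    }
}
```

Run it from the root of the project being checked. Because the issue is sporadic, a clean result from a handful of runs does not rule the bug out.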

Expected behavior

The set of reported issues should be identical between runs.

Screenshots

cargo spellcheck -vv check --code 77
[hunspell]
lang = "en_US"
search_dirs = [ "." ]
extra_dictionaries = [ "lingua.dic" ]
skip_os_lookups = true
use_builtin = true

[hunspell.quirks]
# He tagged it as 'TheGreatestOfAllTimes'
transform_regex = [
    # `Type`'s
    "^'([^\\s])'$",
    # 5x
    # 10.7%
    "^[0-9_]+(?:\\.[0-9]*)?(x|%)$",
    # Transforms'
    "^(.*)s'$",
    # backslashes
    "^\\+$",
    "^[0-9]*+\\s?k|MB|Mb|ms|Mbit|nd|th|rd|s$",
    # single char `=` `>` `%` ..
    "^=|>|<|%$",
    # 22_100
    "^(?:[0-9]+_)+[0-9]+$",
    # We use { a or b } for { p or q }.
    "^\\{(.*)\\}$",
]
allow_concatenation = true
allow_dashes = true

lingua.dic:
400
2D
A&V
accessor/MS
AccountId
activations
acyclic
adversary/SM
allocator/SM
annualised
anonymize/D
Apache-2.0/M
API
APIs
arg/MS
assignee/SM
async
asynchrony
autogenerated
backable
backend/MS
benchmark/DSMG
benchmarking
BFT/M
bitfield/MS
bitwise
blake2/MS
blockchain/MS
borked
broadcast/UDSMG
BTC/S
canonicalization
canonicalize/D
CentOS
CLI/MS
codebase/SM
codec/SM
collateralize/UXSD
collateralizes
collateralized
collateralization
commit/D
comparator
computable
conclude/UD
config/MS
could've
crowdfund
crowdloan/MSG
crypto/MS
cryptographically
CSM
Cucumber/MS
customizable/B
DDoS
Debian/M
decodable/MS
decrement
deduplicated
deduplication
deinitializing
dequeue/SD
dequeuing
deregister/SG
deregister
deregistration/s
deregistrations
deregistering
deserialize/G
DHT
disincentivize/D
dispatchable/SM
DLEQ
DM
DMP/SM
DMQ
DoS
DOT
DOTs
ECDSA
ed25519
encodable
enqueue/D
enqueue/DMSG
entrypoint/MS
enum
ERC-20
ETH/S
ethereum/MS
externality/MS
extrinsic
extrinsics
fedora/M
finalize/B
FIXME
FRAME/MS
FSMs
functor
fungibility
gameable
getter/MS
GiB/S
GKE
GNUNet
GPL/M
GPLv3/M
Grafana/MS
Gurke/MS
gurke/MS
Handler/MS
HMP/SM
HRMP
HSM
https
hostname
iff
implementer/MS
includable
include/BG
increment/DSMG
inherent
inherents
initialize/CRG
initializer
instantiate/B
instantiation/SM
intrinsic
intrinsics
invariant/MS
invariants
inverter/MS
invertible
io
IP/S
isn
isolatable
isolate/BG
iterable
jaeger/MS
js
judgement/S
kademlia
keccak256/M
keypair/MS
keystore/MS
Kovan
KSM/S
Kubernetes/MS
kusama/S
KYC/M
lib
libp2p/M
lifecycle/MS
liveness
lookahead/MS
lookup/MS
LRU
mainnet/MS
malus/MS
MB/M
Mbit
merkle/MS
Merklized
metadata/M
middleware/MS
Millau
misbehavior/SM
misbehaviors
misvalidate/D
MIT/M
MMR
modularity
monomorphization
mpsc
MPSC
MQC/SM
msg
multisig/S
multivalidator/SM
mutex
natively
NFA
NFT/SM
no_std
nonces
NPoS
NTB
observability
OCW/MS
offboard/DMSG
onboard/DMSG
oneshot/MS
onwards
OOM/S
OPENISH
others'
ourself
overseer/MS
ownerless
p2p
parablock/MS
parachain/MS
ParaId
parameterization
parameterize/D
parathread/MS
parametrize/BS
participations
passthrough
PDK
peerset/MS
permission/D
pessimization
phragmen
picosecond/SM
PoA/MS
polkadot/MS
Polkadot/MS
PoS/MS
PoV/MS
PoW/MS
PR
precheck
prechecking
preconfigured
preimage/MS
preopen
prepend/G
prevalidating
prevalidation
preverify/G
programmatically
prometheus/MS
protobuf
provisioner/MS
proxy/DMSG
proxy/G
proxying
PRs
PVF/S
README/MS
redhat/M
register/CD
relayer
repo/MS
requesters
reservable
responder/SM
retriability
reverify
ROC
roundtrip/MS
routable
rpc
RPC/MS
runtime/MS
rustc/MS
SAFT
scalability
scalable
Schnorr
schnorrkel
SDF
SDK
sending/S
sharding
shareable
Simnet/MS
spawn/SR
spawner
sr25519
SS58
SSL
stake/MSUR
startup/MS
stateful
str
struct/MS
subcommand/SM
submitter/SM
submitters
submitter's
substream
subsystem/MS
subsystems'
supermajority
SURI
sybil
systemwide
taskmanager/MS
TCP
teleport/D
teleport/RG
teleportation/SM
teleporter/SM
teleporters
template/GSM
testnet/MS
tera/M
teleports
timeframe
timestamp/MS
TODO
topologies
TorchScript
tradeoff
transitionary
trie/MS
trustless/Y
TTL
tuple/SM
typesystem
ubuntu/M
UDP
UI
unapplied
unassign
unconcluded
unfinalize/B
unfinalized
union/MSG
unordered
unreceived
unreserve
unreserving
unroutable
unservable/B
untrusted
untyped
unvested
URI
utilize
v0
v1
v2
validator/SM
ve
vec
verifier
verify/R
versa
Versi
version/DMSG
versioned/U
VMP/SM
VPS
VRF/SM
w3f/MS
wakeup
wakeups
warming/S
wasm/M
wasmtime
WebSocket/S
Westend/M
wildcard/MS
WND/S
Wococo
WS
XCM/S
XCMP/M
yeet
yml
yaml
zombienet
zsh
solomon
reed

Additional context

Sporadic!

drahnr commented 2 weeks ago

Hack: Use https://crates.io/crates/zspell for detecting mistakes (it still lacks suggestions), and only then invoke hunspell to extract suggestions for the words that were flagged.
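
A rough sketch of that two-stage idea follows. It deliberately does not use the real zspell or hunspell APIs: `check_two_stage`, `Finding`, and the two closures are hypothetical stand-ins, with `is_correct` playing the role of the deterministic detector and `suggest` the role of the suggestion backend. The tokenization by whitespace is also a simplification, not how cargo-spellcheck splits text.

```rust
/// A flagged word together with the suggestions offered for it.
#[derive(Debug, PartialEq)]
struct Finding {
    word: String,
    suggestions: Vec<String>,
}

/// Stage 1 decides which words are mistakes; stage 2 is consulted only
/// for those words, so it cannot change *which* words get flagged.
fn check_two_stage<D, S>(text: &str, is_correct: D, suggest: S) -> Vec<Finding>
where
    D: Fn(&str) -> bool,
    S: Fn(&str) -> Vec<String>,
{
    text.split_whitespace()
        // Strip surrounding punctuation such as a trailing period.
        .map(|word| word.trim_matches(|c: char| !c.is_alphanumeric()))
        .filter(|word| !word.is_empty() && !is_correct(word))
        .map(|word| Finding {
            word: word.to_string(),
            suggestions: suggest(word),
        })
        .collect()
}

fn main() {
    // Hypothetical stand-in for the detection backend: a fixed word list.
    let known = ["the", "validator", "signs", "block"];
    let is_correct = |word: &str| known.contains(&word.to_lowercase().as_str());

    // Hypothetical stand-in for the suggestion backend.
    let suggest = |word: &str| {
        if word == "blcok" {
            vec!["block".to_string()]
        } else {
            Vec::new()
        }
    };

    for finding in check_two_stage("The validator signs the blcok.", is_correct, suggest) {
        println!("{} -> {:?}", finding.word, finding.suggestions);
    }
    // prints: blcok -> ["block"]
}
```

With this split, any remaining non-determinism in the suggestion backend could affect only the suggestion lists, not the set of reported mistakes.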