I've been developing some data corruption algorithms, inspired by the documentation at https://dmm.anu.edu.au/geco/flex-data-gen-manual.pdf (I have not looked at the source code, since it has an unusual license). Would your excellent project be interested in pull requests that add Python implementations of these to your `recordlinkage.datasets` submodule?
I'm imagining methods such as `corrupt.ocr_noise(s: str) -> str`. If this sounds of interest, I can put together a PR, or we can use this ticket to discuss the design further. And if this is beyond the scope of what you want for your module, I understand!
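To make the proposal more concrete, here is a minimal sketch of what such a function might look like. Everything beyond the `ocr_noise(s: str) -> str` shape is an assumption for illustration: the `OCR_CONFUSIONS` table, the `p` and `rng` parameters, and the left-to-right substitution strategy are all hypothetical, and none of it is taken from the GeCo source.

```python
import random
from typing import Optional

# Hypothetical confusion table (illustrative only): substrings that OCR
# might plausibly misread, mapped to look-alike replacements.
OCR_CONFUSIONS = {
    "rn": ["m"],
    "m": ["rn"],
    "l": ["1", "I"],
    "O": ["0"],
    "0": ["O"],
    "S": ["5"],
    "5": ["S"],
    "B": ["8"],
}

# Try longer keys first so "rn" is matched before any single character.
_KEYS = sorted(OCR_CONFUSIONS, key=len, reverse=True)


def ocr_noise(s: str, p: float = 0.1, rng: Optional[random.Random] = None) -> str:
    """Return a copy of `s` with OCR-style substitutions.

    Each confusable substring is replaced with probability `p`.
    Passing a seeded `rng` makes the corruption reproducible.
    """
    if rng is None:
        rng = random.Random()
    out = []
    i = 0
    while i < len(s):
        for key in _KEYS:
            if s.startswith(key, i):
                if rng.random() < p:
                    out.append(rng.choice(OCR_CONFUSIONS[key]))
                else:
                    out.append(key)
                i += len(key)
                break
        else:
            # No confusable substring starts here; copy the character as-is.
            out.append(s[i])
            i += 1
    return "".join(out)


# Usage: with p=1.0 every confusable substring is replaced.
print(ocr_noise("corn", p=1.0, rng=random.Random(0)))  # com
```

Taking an explicit `random.Random` instance rather than relying on the global random state is one possible design choice; it would let users generate reproducible corrupted datasets for benchmarking.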