Closed asmeurer closed 10 years ago
OK, this is ready to go from my end.
I generated some text from Anthem (which appears to be the only Ayn Rand book on Project Gutenberg) for various values of n:
n=1
'They point' 'stopped our feet+} every beam had' 'grant us with my hands, even that all our eyes were afraid' 'truth which neither slaves nor buried under their eyes upon our'
n=2
'It is a plain, and' 'body stopped, as if our spine had been touched' 'if ever we surrender it, we shall die with you' 'silent against the' 'we fought with our hands trembled under the ground'
n=3
'in+} our heart there is the first peace we have known in twenty years' '"' '{+Here,+} on this mountain, I and my sons and my' 'Tomorrow, you will take us back into your' 'we lived not, when we toiled for our'
n=4
'We go to our' 'We pulled the heavy' "There is nothing to take a man's freedom away from him," 'So we' 'south, as far as our eyes could see' 'Sweeper'
Ignore the abrupt endings of the strings. That's due to the way you have it terminating with the sample, which I did not change, and also the hard text wrapping of the source text, which I did not bother to fix. The {+ stuff is coming from the Project Gutenberg formatting.
From what I can tell:
n=1: Usually not even grammatically correct. Tends to be a word salad.
n=2: Grammatically correct, but silly enough that it's probably not from the source material.
n=3 and n=4: Seem to be mostly sentences from the source material.
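To make the effect of n concrete, here is a minimal sketch of an order-n chain. It is not the code from this patch: `build_chain` and `generate` are hypothetical names, and the tiny sample text is made up for illustration. Each state is a tuple of n consecutive words, so a larger n leaves fewer possible followers per state, which is why the output starts to copy the source verbatim.

```python
import random
from collections import defaultdict


def build_chain(words, n):
    """Map each run of n consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        chain[state].append(words[i + n])
    return chain


def generate(chain, n, length, rng):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        output.append(rng.choice(followers))
        state = tuple(output[-n:])  # slide the window forward by one word
    return " ".join(output)


# Made-up sample text, purely for illustration.
text = "we go to our home and we go to our field and we rest".split()
for n in (1, 2, 3):
    chain = build_chain(text, n)
    print(n, generate(chain, n, 10, random.Random(0)))
```

By construction, every run of n+1 words in the output also occurs somewhere in the input, so with n=3 or n=4 on a short book most samples are stretches of the original sentences.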
Feel free to do your own experiments on your own favorite source text.
Though I understand why it needs to be there, I don't like the idea of littering the global namespace with db_factory, one_dict and one (as people using this module will see this lying around too), and I'm somewhat confused why you decided to implement that behavior in the Exception type.
However, your patch set is definitely an improvement over existing functionality.
Thanks for the contribution :)
I did the thing in the exception because I didn't know it was already done. You can remove it.
Can you push a release to pypi?
You can rename those functions with underscores in the front to make them private if you want.
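For context, a sketch of why helpers like these usually have to be module-level functions. The assumption here is that the database is a nested defaultdict saved with pickle; the function bodies below are guesses for illustration, not the code from the patch. pickle stores functions by name, so a named module-level factory round-trips while a lambda does not, and a leading underscore changes nothing about that.

```python
import pickle
from collections import defaultdict


# Module-level functions are pickled by reference (module + name).
# A lambda has no importable name, so it cannot be pickled.
def _one():
    return 1.0


def _one_dict():
    return defaultdict(_one)


def _db_factory():
    return defaultdict(_one_dict)


db = _db_factory()
db["we"]["go"] += 1.0  # missing entries start at 1.0, so this stores 2.0

# The round trip works because every default factory is a named function.
restored = pickle.loads(pickle.dumps(db))
print(restored["we"]["go"])

try:
    pickle.dumps(defaultdict(lambda: 1.0))
except (pickle.PicklingError, AttributeError) as exc:
    print("lambda factory is not picklable:", type(exc).__name__)
```

With the underscore prefix the names are also skipped by `from module import *`, which addresses most of the namespace concern.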
By the way, in case it wasn't obvious, this totally breaks existing databases. I don't know if you want to remedy that somehow.
In that case, I'll remove the Exception code and rename the functions.
I haven't pushed the release to pypi yet because there are still some other things I want to add/fix before pushing a db-breaking update, namely the restriction to integer math (which would again change the db format) and the ability to add sentences to an existing database. I'm still thinking about how to handle backwards compatibility with older databases, and until I've made a decision there, I'd rather not break people's code ;)
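The two planned changes fit together naturally. The sketch below is one possible shape, not the project's design: `add_sentence`, `next_word`, the empty-string padding and the `defaultdict(Counter)` layout are all assumptions made for illustration. Storing integer counts instead of float probabilities means adding a sentence later is just more counting, with no renormalisation step.

```python
import random
from collections import Counter, defaultdict


def add_sentence(db, sentence, n=2):
    """Fold one sentence into `db`, which maps n-word states to follower counts."""
    # "" pads the start of the sentence and marks its end (an assumed convention).
    words = [""] * n + sentence.split() + [""]
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        db[state][words[i + n]] += 1


def next_word(db, state, rng):
    """Pick a follower of a known state, weighted by its integer count."""
    followers = db[state]
    return rng.choices(list(followers), weights=list(followers.values()))[0]


db = defaultdict(Counter)
add_sentence(db, "we go to our home")
add_sentence(db, "we go to our field")  # extending an existing database
print(db[("to", "our")])
print(next_word(db, ("to", "our"), random.Random(0)))
```

Probabilities are only derived at sampling time, from the counts, so the stored data stays exact.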
It should be straightforward to convert an old-style database to a new-style one.
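As a rough illustration of such a conversion, assuming (and this is only an assumption, the thread does not spell out either format) that the old database maps a single word to its followers and the new one keys on tuples of words, the re-keying could look like this. `convert_old_db` is a hypothetical helper name.

```python
def convert_old_db(old_db):
    """Re-key a word-keyed database so that each state is a 1-tuple.

    Both layouts are assumptions made for illustration:
    old_db maps word -> {next_word: weight},
    the result maps (word,) -> {next_word: weight}.
    """
    return {(word,): dict(followers) for word, followers in old_db.items()}


# Made-up old-style data, purely for illustration.
old_db = {"": {"we": 1.0}, "we": {"go": 0.5, "rest": 0.5}}
new_db = convert_old_db(old_db)
print(new_db[("we",)])
```

Note that a database built from single-word states can only become an n=1 database this way; the longer context needed for n>1 is not recoverable without the original text.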
This fixes https://github.com/TehMillhouse/PyMarkovChain/issues/9.
It isn't finished yet, so don't merge. I've only implemented the generation of the data. I haven't written the text generation parts yet.