erkin / ponysay

Pony rewrite of cowsay.
GNU General Public License v3.0
1.22k stars 81 forks source link

-q is slow #127

Open maandree opened 11 years ago

maandree commented 11 years ago

I have noticed that the ponyquotes is slow, this combined with the documented fact that GNU Bash is slow (I guess Bash very common) makes it horrible for using when the shell starts. And probably infeasible for legacy computers.

I do not think there is anything wrong with the implementation, but 3.0 will enhance it based on pony file meta data. Hope fully it will fixed

If 3.0 does not fix this, the two things that can be done. Either profile it (if there is nothing obviously bad in the implementation, which I do not know how to do in python, but probably it is not too hard) the other possibility is one thing I may do anyway just for fun and to learn Lua, and that is port ponysay.py (only that file) to Lua and call it lunasay. [Lua should faster and use less memory but a bit more difficult.]

JotaRandom commented 11 years ago

lunasay...you make may day XD

but slow, what is spected for 304 quotes...

in all cases...put the quotes on the pony-metadata can make more slow that a separate archive in a separate directory

maandree commented 11 years ago

Lunasay, the name is just so perfect (and obvious enough). Lu[n]a = Moon, Luna is a Pony.

I did think about some idea's earlier, but this is none I think really is good:

304, and that just the best quotes, imagine if we included every good quote.

etu commented 11 years ago

Hmm, why write a helperprogram in lua? It feels both dumb and stupid to mix languages and increase amount of deps again.

So I have to drop my suggestion here, a sqlite database containing all quotes.

maandree commented 11 years ago

No, not a helperprogram. I want to learn lua, so my idea is just to try to do a lua port of ponysay. But only the ponysay.py. A plus of this is if I am able to do that (if it is not just a too unfitting language) there will be a lightweight version (that naturally will not always be up to speed.)

SQLite may be nice. However, multiple databases will be needed. A concern I have with this is, whether the majority of users, fill find it simple to modify (not just add) is personal quote database, without us having to make a toolkit for it. Altough, does that really, matter; will somepony want to do that without knowing how, and without just submitting it to this repo instead.

maandree commented 11 years ago

»Build a fast indexed file whether everything» versus SQLite.

My guess is that the first option is simpler, same performace, but that SQLite will be faster at modifing.

etu commented 11 years ago

Well, nobody want the database as binary file in the git tree. (I hope).

So we could have the quotes as files in the git-tree, and build a database on "compile"?

maandree commented 11 years ago

Naturally. And may celestia banish anypony what would want buildable binaries in the git tree for any other reason than helping the user get started or reduce make dependencies.

JotaRandom commented 11 years ago

Ok in all cases the PDF is not already a binary archive on the git-tree?? I know are different cases, but a SQLite database IS a new makedep for the Data Base

Off: and Luna is spanish for Moon (as same Sombra is for Shadow)

maandree commented 11 years ago

Ok in all cases the PDF is not already a binary archive on the git-tree?? Not sure what you are asking about, but the only thing binary in the git-tree is the PDF just to help people get started from the source if they do not know what to do, as well as the ttyponies. I personally have not tested util-say on fully free software.

Yes, it is a new make dependency, but is is also runtime dependency.

Luna is Latin (and may other languages). Lua is Portuguese [and sound really nice for a programming language]. Sombra [new for me] is spanished derived from umbra which is Latin.

Adding binaries is not nice, but binary archives [which SQL database would be] is just bad in any case it is used.

Ahh... poor Pinkie Pie, I knocked the desktop to hard and she fell of the monitor [when due I am still using CRT:s], hit the UPS, and fell into a lot of dust.

etu commented 11 years ago

Well, correct me if I am wrong, but python got a sqlite module. In Gentoo, it's a useflag for python itself... And we could use python to build the database from the source files.

Also, python modules aren't that big and sqlite is common.

JotaRandom commented 11 years ago

Yep but in my Fast search is for python2 only and another but for the entire mysql (I going to investigate it a little more)

maandree commented 11 years ago

It is an optional dependency (but it does not say what it is used for in arch) by python 3. Optional dependencies should be (well, are usally) considered dependencies for program language interpreters.

maandree commented 11 years ago

I am implementing this with dbm which is faster, more trivial, enough function, and has no dependencies. But it has optional dependencies such as gdbm (everypony have gdbm) for maximum performance.

jaseg commented 11 years ago

Instead of using a database you could perhaps consider cleaning up your code instead. In my fork where I dropped some unnecessary stuff and rewrote everything else, -q is about as fast as withouth -q.

maandree commented 11 years ago

Using dbm makes the code both cleaner and faster, because you just open() use the database as i dictionary and then close(). I will take a look at your fork later, I am not working too much as ponysay right know; I will look at it when I am begin working on ponysay more consistently again.

maandree commented 11 years ago

Havn't you removed a bunch of functionally in your fork and only use the essentials?

jaseg commented 11 years ago

Yep, I think that's a feature^^

maandree commented 11 years ago

Naturally (generally I agree with that), but sometimes you want everything.

JotaRandom commented 11 years ago

as far I can read in the dbm webpage, that tool is deprecated, and the author (in github https://github.com/hilbix/ ) promised update the code to githuib but intead not do that and not update dbm

so I thing that dbm is or forget by uptream, or feature complete and bug free or deprecated by upstream

maandree commented 11 years ago

dbm as in the unix database library: https://en.wikipedia.org/wiki/Dbm , http://docs.python.org/3.3/library/dbm.html

jaseg commented 11 years ago

"Today the World, Tomorrow the Solar System" ;)

Naturally (generally I agree with that), but sometimes you want everything.

I think this is true concerning quality, not quantity. The Unix philosophy tells us that it is better to have a tool for every task that does this task very well instead of one tool that does everything but does not do all things that well.

jaseg commented 11 years ago

Also, since it is evidently possible to do -q in a performant manner without using a full-fledged database in the backend I think using a full-fledged database is covering up the problem instead of solving it.

[edit] For reference: the pretty un-optimized code in my fork runs on my machine (AMD Fusion E450 with 2*1.8GHz) with -q in ~350ms which I think is acceptable for shell startup and similar tasks. [/edit]

JotaRandom commented 11 years ago

@maandree so, sorry..i because, already exist a program called dbm and i misunderestand you, sorry

maandree commented 11 years ago

@jristz Assumed that, that package should really get a better name, you cannot call your program X if it interacts with the library X.

JotaRandom commented 11 years ago

@maandree right agree, but now is deprecated so no-one care now

maandree commented 11 years ago

@jaseg Well on a desktop you should not notice anything, but run it one a heavily lockdown old netbook and having it start up when you start a new bash session with a lot of stuff. Also Python has only a 300th of the perfomance of C.

dbm is not a full-fledged database, it is the absolute minimum you could imagine. It is just precaching the quotes database, which allows you to just use it as a simple dictionary. [Edit: okay, not the absolute minimum, the minimum is just a list file, this also has an index to speed up the process.]

Additionally it solves the problem where the random selection is bias to pinkie because she hade to be split into three files be quotes because the the long resulting filename; I cannot se any better solution for this.

Finally using a dictionary file allows use to have two dictionary files and 3 random modes. Creating a dictionary that maps pony file → (master file, number of quotes) and a dictionary that maps master file → (pony file, number of quotes)*; lets use either randomly select a pony and prioritise the images appears with the same likelyhood, that the quotes appears with the same likelyhood, or that the the ponies (not the images) appears with the same likelyhood.

jaseg commented 11 years ago

@maandree I am using a netbook.

Concerning quotes: Can you really not see a better solution for storing a few hundred lines of text than an embedded database?! óÒ Have a look at https://github.com/jaseg/ponysay/blob/master/ponysay.py#L33

What "random modes" do you think of and why would they be necessary?

maandree commented 11 years ago

The quotes are not being stored the the database.

»What "random modes" do you think of and why would they be necessary?» Not necessary, but it only takes 6 extra times. (They are not implement nor in an issue report.) If not implement the code is of same size.

Edit: Well except there is a build process, and it helps us group togather pony files.

jaseg commented 11 years ago

What, then, is the purpose of the database?

What "random modes" would you like to implement? I can only see the obvious one where you take a random quote of the selected pony.

maandree commented 11 years ago

The purpose is the have a lookup table from pony files to quote files. Currently this is done by having pinkiepie+pinkiebounce+.{0,1,3,...}, and all files names the pony name is search for in these files names. You have solved this by creating symlinks for each name and having all quotes in the same files.

In the solution, the files currently in /ponyquotes/ are installed, except /ponyquotes/ponies which no longer exists, instead it is built from the metadata in the pony files. The difference in the amount of work making script for creating the database and a script for making symlinks instead is minimal; and I really think not needing to maintain the alias is what we should go for, which both would offer. It wouldn not matter that much which solution to use, except coding with the database makes it easy to implement the following three algorithms for selecting quote:

+Q pony :: pinkie pie is as probable as sweetie belle. +Q file :: each pinkie pie image is as probable, but pinkie pie is more probable than sweetie belle because there are more pinkie pie images then there are sweetie belle images. +Q quote :: all quotes are as probable.

Creating a database requires to collection the same information as making symlinks. Creating the database given the required information takes 4 lines, making the symlinks takes 2 lines.

jaseg commented 11 years ago

Creating a database for storing 100s of lines of text is one additional layer on top of the file system to do something the file system is perfectly capable of doing without any help. Also, I do not think the python dbm module works with python package resources.

The three behaviors you mentioned are trivial in any case, though I do not see any need for ① and ③.

JotaRandom commented 11 years ago

In any case, @maandree you can do a test whit one quote to look if is fast as you say or not? the only way to test the fast-ish in this case is a real test, maybe the win can by infinitecimal comparyng whit others ideas, like the one used now in jaseg-ponysay or vice versa