c-w / gutenberg

A simple interface to the Project Gutenberg corpus.
Apache License 2.0

Native support for complex queries #14

Status: Closed (c-w closed 9 years ago)

c-w commented 9 years ago

Originally reported by @MasterOdin in #11

I would argue that it is important for get_etexts to support not just one feature/value pair but potentially several.

A use case that illustrates this: how would I get only the German books by the author "Various"?

With the API as currently proposed, the solution would be:

texts = get_etexts('author', 'various')
final_list = []
for text in texts:
    # Skip any text that is not in German.
    if get_metadata('language', text) != 'german':
        continue
    final_list.append(text)

which is somewhat awkward, as I'd like the API to handle this internally (especially if I want to get even more specific with the criteria and don't want to keep building up that if statement).

So I'd suggest changing get_etexts to accept either two strings for a single feature or, probably easier, a dictionary, which would allow any number of criteria:

texts = get_etexts({'author': 'various', 'language': 'german'})

c-w commented 9 years ago

Following e6add03, get_etexts is very fast. Native support for complex queries is therefore no longer very important, since the same effect can be achieved with multiple calls to get_etexts combined with set operations. For example, the query above can be reformulated as get_etexts('author', 'various') & get_etexts('language', 'german').
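For anyone who still wants the dictionary-style call from the original request, the set-intersection approach can be wrapped in a small helper. The following is only a sketch: get_etexts_multi, toy_lookup, and TOY_INDEX are hypothetical names that are not part of the library, and the toy index stands in for the real metadata so the example runs on its own. In practice you would presumably pass get_etexts itself as the lookup argument.

```python
# Hypothetical in-memory index standing in for the real metadata store.
TOY_INDEX = {
    ('author', 'various'): {1, 2, 3},
    ('language', 'german'): {2, 3, 4},
}


def toy_lookup(feature, value):
    """Stand-in for get_etexts: return the text ids matching one pair."""
    return TOY_INDEX.get((feature, value), set())


def get_etexts_multi(criteria, lookup):
    """Return the ids that match every feature/value pair in criteria.

    Hypothetical helper: runs one lookup per pair and intersects the
    resulting sets, mirroring the `&` expression from the comment above.
    """
    result_sets = [frozenset(lookup(feature, value))
                   for feature, value in criteria.items()]
    if not result_sets:
        # No criteria given: return nothing rather than guess "everything".
        return frozenset()
    return frozenset.intersection(*result_sets)


matches = get_etexts_multi({'author': 'various', 'language': 'german'},
                           toy_lookup)
print(sorted(matches))  # [2, 3]
```

Passing the lookup function in as an argument keeps the helper independent of how single-feature queries are implemented, so any number of criteria reduces to repeated single-feature lookups.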