Review simon branch (OpenLibrary and Composite data sources)

GoogleCodeExporter commented 8 years ago

Purpose of code changes on this branch:

When reviewing my code changes, please focus on:

After the review, I'll merge this branch into:
/trunk

Original issue reported on code.google.com by charlie....@gmail.com on 1 Aug 2010 at 3:42

GoogleCodeExporter commented 8 years ago

At first blush the OpenLibraryDataSource looks very nice (thanks!). The 
CompoundDataSource looks good too (though it might be nice to be able to set 
the order as a preference if using the compound)? 

I can't tell if these are used by setting a pref and switchable in the app 
though? I'll look at it more and see if I can figure it out (just running the 
app, I can't switch -- I'll try to wire these in).

Original comment by charlie....@gmail.com on 1 Aug 2010 at 10:03

GoogleCodeExporter commented 8 years ago

You can select the data source in the Prefs :-) All is the compound class.

Setting the order to search is a good idea, at the moment it loops through the 
array of providers in the xml file.

Original comment by iamsanch...@gmail.com on 1 Aug 2010 at 10:25

GoogleCodeExporter commented 8 years ago

I take it back, I see where it is used now. I was looking at the right code and 
running the wrong emul. My bad. I'll merge the provider stuff into the trunk 
and test there.

Original comment by charlie....@gmail.com on 1 Aug 2010 at 10:25

GoogleCodeExporter commented 8 years ago

I guess we posted that last comment there at the same time, yeah, I found it, 
was just looking at it wrong. 

Right now I have copied into trunk (don't want to merge because the tags stuff 
is in your branch too, and it's not ready for prime time yet) and I am 
refactoring a bit so that CompoundDataSource doesn't create the data sources 
every time getBook or getBooks is called (that's expensive). Other than that it 
looks great, and it should be in trunk later today. 

Thanks for the contribution (I'll add your name to contributors in about, if 
that's Ok with you?).

Original comment by charlie....@gmail.com on 1 Aug 2010 at 11:00

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Sounds great.

Thanks

Original comment by iamsanch...@gmail.com on 1 Aug 2010 at 11:08

GoogleCodeExporter commented 8 years ago

[deleted comment]

GoogleCodeExporter commented 8 years ago

Simon, your stuff, with my refactoring to create data providers only once at 
startup, or when user changes preference, is in trunk. 

Would you mind checking out trunk and doing a quick review, and then once I 
test it for a bit, and you sign off, I'll release it.  

thanks

Original comment by charlie....@gmail.com on 2 Aug 2010 at 12:32

Changed state: Fixed
Added labels: 1.0.17

GoogleCodeExporter commented 8 years ago

Original comment by charlie....@gmail.com on 2 Aug 2010 at 12:33

Changed state: Started

GoogleCodeExporter commented 8 years ago

Actually, after testing it a bit myself I notice a few things we probably 
should look at before releasing: 

1. The Preferences.java listener doesn't always fire, if it doesn't the 
provider isn't updated. Seems to work most of the time, but once in a while 
when I just sit there and change from provider to provider it stops firing. 
Odd. 

2. The OpenLibrary provider search seems to take a very long time. It takes, 
for example, on a Nexus One with 2.2, around 10 seconds to return results for 
the input string "test". The GoogleBooks provider search takes 1 second or so. 
Because of this, defaulting to "all" is very slow. 

3. The OpenLibrary provider doesn't return more than about the first page of 
results. Search for anything, then scroll down, it says no more results when 
there should be more data to retrieve. I don't know if this is an issue in the 
provider, or in the BookSearch code, but the GoogleBooks provider returns 
screen after screen after screen of results.

Original comment by charlie....@gmail.com on 2 Aug 2010 at 12:49

GoogleCodeExporter commented 8 years ago

As to number 3 -- It doesn't look like the OpenLibraryDataSource uses the 
startIndex, other than if it's zero. I'm not familiar with the OpenLibrary API 
as much, but must be some way to pass it the startIndex?

Original comment by charlie....@gmail.com on 2 Aug 2010 at 12:53

GoogleCodeExporter commented 8 years ago

As to issue 2 -- It looks like it makes one HTTP request to get the OpenLibrary 
IDs, and then another separate request for every one of the IDs (21 HTTP 
requests in all to get one page of search results, ouch). 

Again, I don't know the OpenLibrary API, and have to bail for tonight, but 
multiple HTTP requests for a page of search results is going to be slow as hell 
(and is, even just the overhead in HTTP itself, much less on a mobile device). 
We'll have to find some other way to do that.

Original comment by charlie....@gmail.com on 2 Aug 2010 at 1:07

GoogleCodeExporter commented 8 years ago

Looks like the OpenLibrary older JSON API is the one the provider is using, and 
that's deprecated. The newer REST API (which also uses JSON it looks like) can 
return "all properties" in one query, I think. 

http://openlibrary.org/dev/docs/restful_api#query

Original comment by charlie....@gmail.com on 2 Aug 2010 at 1:10

GoogleCodeExporter commented 8 years ago

Its actually worse than that as it has to make a request for author too.

Its using the books api http://openlibrary.org/dev/docs/api/books for single 
search when scanned, but the rest API doesn't support string queries and so was 
forced to use the JSON API. I have left a comment on a bug 
(https://bugs.launchpad.net/openlibrary/+bug/236947)

Unfortunately the JSON query doesn't support limits or start index and just 
returns the  first 20 matches :(

Original comment by iamsanch...@gmail.com on 2 Aug 2010 at 11:08

GoogleCodeExporter commented 8 years ago

Ok, I think we should probably keep the OpenLibrary provider right now, it will 
be in the next release, but just not make it the default (if people want to use 
it, they can, but it has a few known issues, slow, and only 20 results). We 
should continue to try to figure out how to use their APIs though to improve it 
(I couldn't figure it out from their docs last night, but has to be some way to 
do a general "search", not look for a particular data type, and to page the 
data.) 

Also, the Combo provider makes less sense to me as I think about it. Really it 
should only kick in to make a request to multiple providers if one returns no 
results for the term or is OUT of results. It shoulddn't use both at the same 
time, because the search page only gets 20 results and then pages to the next, 
etc. We maybe could make it smarter so user can set order preference and have 
fall back, but the way it is now it won't be all that useful. We should leave 
it in trunk, but add separate issues to iron it out.

Original comment by charlie....@gmail.com on 2 Aug 2010 at 1:03

GoogleCodeExporter commented 8 years ago

The idea behind the combo provider, is that not all data sources will return 
the same books, for a text search it checks all.

The idea of querying if one runs out is good, but if you could have the 
situation where for example google return 200 books none being the one you 
want, and until you've been through all 200, no other providers are checked.

Its worth having a longer think about :)

When scanning a bar code it does only search in a fallback way, if a book is 
found in the first provider, no more are called.

Original comment by iamsanch...@gmail.com on 2 Aug 2010 at 1:43

GoogleCodeExporter commented 8 years ago

Yeah that makes sense, I understand, but in that case it will need to bit a bit 
smarter, right? Just checking all of one, then checking all of another, and 
combining the unique results won't work, because of the paging in the search 
results. If both sources have 20 unique results, which do you use? 

We could just get 5 at a time from each source or something, but that's a bit 
random. We could sort them by relevance maybe (Levenshtein), but that would be 
complicated. 

When just dealing with one thing, like scanning, multiple sources makes sense, 
but it's a lot more complicated when you need multiple paged results (like 
search). We also maybe could separate the two functions (the OpenLibrary search 
being very useful when scanning, but at present less useful when searching). 

I agree it needs more thought.

Original comment by charlie....@gmail.com on 2 Aug 2010 at 2:41

GoogleCodeExporter commented 8 years ago

I am going to clean what we have up a bit, and then release with a few other 
minor bug fixes soon. I won't make the "multi" source the default though, 
because of the OpenLibrary limitations (it's slow as hell, and it can only get 
non-paged first set of results). 

Maybe later we can improve the OpenLibrary usage, and at least we'll have the 
architecture to handle the multiple sources in place because of this effort.

Original comment by charlie....@gmail.com on 29 Aug 2010 at 7:01

GoogleCodeExporter commented 8 years ago

I've just committed a new version of the OpenLibrary class into my branch. It 
uses a new method released by Open Library 
(https://bugs.launchpad.net/openlibrary/+bug/236947), its quicker and supports 
paging.

The updated class is based on the trunk version, so doesn't work with my 
branch, but works great with trunk.

Original comment by iamsanch...@gmail.com on 30 Aug 2010 at 12:08

GoogleCodeExporter commented 8 years ago

I've just noticed that if you select the CompoundDataSource and both sources 
return results the startIndex is the number of found books, not the next in 
sequence.

For example if the both have a limit of 10 and both return 10 books, the 
startIndex becomes 20.

Original comment by iamsanch...@gmail.com on 30 Aug 2010 at 12:19

GoogleCodeExporter commented 8 years ago

Thanks Simon, I'll check this out and try to get it released soon (just been 
ultra busy as of late).

Original comment by charlie....@gmail.com on 8 Sep 2010 at 2:44

GoogleCodeExporter commented 8 years ago

Original comment by charlie....@gmail.com on 19 Oct 2010 at 7:48

Changed state: Fixed

coolPrat / and-bookworm

Review simon branch (OpenLibrary and Composite data sources) #93