Please investigate search user interfaces, with a particular focus on how diverse types of items are returned and displayed, on the following sites (plus any others you think are good examples). Let's document what is unique or seems particularly good or useful, with commentary, by grabbing screenshots and URLs of examples. Let's put this in a Google Doc, rather than an issue, for now.
Smithsonian's Archives of American Art
Digital Florentine Codex at the Getty
Transkribus Sites
Google Book Search
Google Arts & Culture
Google itself
The BSB (Bayerische Staatsbibliothek, the Bavarian State Library; in German)
Europeana
There's some background/implementation thinking about search that might be useful to have in mind (or not!). These are my notes on how we'll probably implement search for FromThePage. Looking for parallels in the above sites would be useful.
Types of objects returned by search: organization, collection, work, snippet/page, tags
Scope each search: Find a Project, Org Page, Collection Page, Work Page
For each scoped search, “boost” based on what people want to find. Boosting is how you prioritize the obvious thing they are looking for. We could facet on type of returned object – but keep it very simple (copy Google, not a library) – and put it at the top as taggy buttons, not a facet.
You can boost different things for any given scope – e.g. “org name” x 10; “collection name” x 5.
Find a Project: boost organization names, then collections
Org Page: boost collection names
Collection: hmm?? Pages or works?
Work: page text/snippets may be the only type
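The per-scope boosts above could be written down as a small table and turned into a query. This is a minimal sketch in Python (just for illustration; the real thing would live in our Ruby code), assuming an Elasticsearch-style `multi_match` query where `field^N` is the per-field boost. The field names, the weights, and the `build_query` helper are all hypothetical, and the Collection row is a placeholder since that scope is still undecided.

```python
# Hypothetical boost table: scope -> {field: weight}. Weights are illustrative only.
SCOPE_BOOSTS = {
    "find_a_project": {"org_name": 10, "collection_name": 5, "work_name": 1},
    "org_page": {"collection_name": 10, "work_name": 1},
    "collection_page": {"work_name": 5, "page_text": 5},  # undecided: pages or works?
    "work_page": {"page_text": 1},
}


def build_query(scope, text):
    """Return an Elasticsearch-style multi_match body for one scoped search."""
    boosts = SCOPE_BOOSTS[scope]
    # "field^N" is query-time boost syntax: that field's score is multiplied by N.
    fields = [f"{field}^{weight}" for field, weight in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


query = build_query("find_a_project", "parking")
print(query)
```

Keeping the weights in one table per scope means tuning a scope is a data change, not a code change.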
Search dbs basically “flatten” the info you want to search on. So we’d define the info for an org (name, description, url), and that is what gets flattened into the index (I think).
Data per object type:
Org: name, description, url
Collection: name, description, tag (metadata??)
Work: name, description, metadata
Page: name, transcription/other text
Snippet: phase 2? tied to pages if we have bounding boxes
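To make “flattening” concrete, here is a rough sketch of turning each object into one flat search document. It is written in Python purely as an illustration; the `FIELDS_BY_TYPE` table, the `flatten` helper, the id format, and the sample records are assumptions, not FromThePage code. Empty fields are simply left out of the document, and a `type` field is added so results could later be filtered by type for the taggy buttons.

```python
# Hypothetical field lists per object type, following the notes above.
FIELDS_BY_TYPE = {
    "org": ["name", "description", "url"],
    "collection": ["name", "description", "tags"],
    "work": ["name", "description", "metadata"],
    "page": ["name", "text"],
}


def flatten(obj_type, obj_id, record):
    """Build one flat search document; empty or missing fields are left out."""
    doc = {"id": f"{obj_type}-{obj_id}", "type": obj_type}
    for field in FIELDS_BY_TYPE[obj_type]:
        value = record.get(field)
        if value:  # skip None, "" and empty lists
            doc[field] = value
    return doc


# Made-up sample records.
org_doc = flatten("org", 1, {"name": "Example Archive", "description": "", "url": "https://example.org"})
page_doc = flatten("page", 42, {"name": "Page 3", "text": "Dear sir, ..."})
print(org_doc)
print(page_doc)
```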
Unlike in a database, empty fields don’t take up space in these search dbs, which makes them cheaper.
The biggest challenge we’ll run into may be keeping it in sync.
TODO: research Ruby libraries for sending object data to Elasticsearch, Solr, etc., and keeping it in sync.
Do an analysis of searches over the last month. From where? What do we think they were trying to find?
Come up with some test searches – e.g. the Getty always had to return the parking info from the website on a “Parking” search, even though they had images of parking lots and historical institutional docs about the parking lot.
Pay attention to scope.
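One way to pin down test searches is to record, for each query and scope, which result must come first, and check that automatically. The sketch below uses a toy in-memory scorer as a stand-in for the real search engine; the `search` function, the documents, and the boosts are all made up, loosely modelled on the Getty “Parking” example.

```python
# Toy stand-in for a search engine: score = sum of boost * term count per field.
def search(docs, text, boosts):
    term = text.lower()
    scored = []
    for doc in docs:
        score = sum(
            weight * str(doc.get(field, "")).lower().count(term)
            for field, weight in boosts.items()
        )
        if score > 0:
            scored.append((score, doc["id"]))
    # Highest score first; ties broken by id so the order is deterministic.
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical documents modelled on the "Parking" example.
DOCS = [
    {"id": "site-parking-info", "name": "Parking", "text": "Parking rates and hours"},
    {"id": "photo-parking-lot", "name": "Aerial view", "text": "Photograph of the parking lot"},
    {"id": "memo-1974", "name": "Facilities memo", "text": "Notes on parking lot resurfacing and parking permits"},
]

# Each test search: the query, the boosts for its scope, and the expected top hit.
TEST_SEARCHES = [("Parking", {"name": 10, "text": 1}, "site-parking-info")]

results = {q: search(DOCS, q, boosts) for q, boosts, _ in TEST_SEARCHES}
for q, _, expected_top in TEST_SEARCHES:
    assert results[q][0] == expected_top, f"{q!r} should return {expected_top} first"
```

Because the boosts are part of each test case, the same query can be checked under different scopes.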
TODO: this would be a great intern task