Japan-Digital-Archives / Japan-Digital-Archive

Digital Archive of Japan's 2011 Disasters

Search Japanese documents #156

Open kmlawson opened 12 years ago

kmlawson commented 12 years ago

Searching for Japanese text such as スタート, which is in the subtitle of this document (search for: 宮城県南三陸町/歌津), seems to show that the search query is empty.

However, I just noticed that the search query also appears empty when searching an English document...

Hmm, is it possible to deactivate search in Documents until we work out the DocumentCloud kinks?

jshapins commented 12 years ago

Strange. Searching for 宮城県南三陸町 does seem to return the one result.

kmlawson commented 12 years ago

Ah, I'm sorry, I wasn't very clear: when you open the document 宮城県南三陸町 there is a search box at the top to search inside this document. However, I realize now this is a bad document to use as an example: even though the PDF has a text layer that theoretically should be searchable, its encoding is very screwy and I can't search it even after downloading it.

Instead see the example of document: "Miyagi University Report on Damages to Shizugawa area of Minami Sanriku"

If you open this document and search for "東北", it returns 0 hits.

Now download the original document here: http://s3.documentcloud.org/documents/281897/sample.pdf

Open it in Apple Preview or something similar and do the same 東北 search. DocumentCloud does not yet handle Japanese text search.

How about we set the "visible" of the search box to false until we can work out Japanese search?

XPath: //*[@id="fancybox-document-cloud"]/div/div[1]/div[1]/div[2]/div[2]/div[1]/form/div/input

input class="DV-searchInput"

jshapins commented 12 years ago

Gotcha. Yup. Exactly. DocumentCloud does not support Japanese text. Will look into disabling the internal document search box for Japanese texts.
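
A rough sketch of how "Japanese texts" might be detected when deciding whether to show the search box. This is only an illustration, not what the archive or DocumentCloud actually does: the function names `contains_japanese` and `should_show_search_box` are hypothetical, and the check is a simple Unicode-range test (kana plus CJK ideographs, which are shared with Chinese, so it can over-match). A text sample is checked as well as the title, since a document like the Miyagi University report has an English title but Japanese content.

```python
import re

# Hiragana, Katakana, CJK Unified Ideographs, and half-width Katakana.
# Assumption: these ranges are "good enough" to flag Japanese documents;
# the ideograph range also matches Chinese text.
_JAPANESE_CHARS = re.compile(
    r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF\uFF66-\uFF9F]"
)


def contains_japanese(text: str) -> bool:
    """Return True if text contains at least one kana or CJK ideograph."""
    return _JAPANESE_CHARS.search(text) is not None


def should_show_search_box(title: str, sample_text: str = "") -> bool:
    """Hypothetical rule: hide the in-document search box whenever the
    title or a sample of the document text contains Japanese."""
    return not (contains_japanese(title) or contains_japanese(sample_text))
```

The result could then drive whatever toggles the visibility of the `DV-searchInput` element in the viewer.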

kmlawson commented 12 years ago

Thanks!