ds300 / jetzt

Speed reader extension for chrome

flash information acquisition time. #24

Open danjul opened 10 years ago

danjul commented 10 years ago

Hey there, I've noticed a possible flaw with the system which is easily remedied. As a psychology student I've spent some time looking into how the brain acquires information at speed; it's been a personal interest of mine for some time.

As an analogy, consider how a flash card test with four symbols (words you know) can be thrown by the introduction of a new symbol (a word you don't know). For example, a square, circle, triangle and star might be used over and over in a memory test (just as with reading, you're ordering information and giving it meaning) up until the 30 minute mark.

In this test you are expected to regurgitate sequences up to 6 symbols long. Most people will be very good at this by that point, until you slip in a new symbol which is hard to quantify (a new word which you've not seen before). Usually people are thrown by the new 'information' and completely forget the old information around it.

I believe the biggest flaw of the Spritz system is just this: it allows no time for contextual assimilation of new information.

As an example consider the attached text:

"Patrick's growing collection of enemies eventually became just too numerous and too powerful. He was in prison in Edinburgh when his son Robert rose in revolt in 1614 and seized Kirkwall. It took a royal army under the Earl of Caithness and a siege (during which Kirkwall Castle was destroyed and St Magnus Cathedral, Kirkwall, was threatened) to displace him, and both Patrick and his son Robert were later executed. It is an oft-quoted comment on Patrick's ignorance that his execution had to be delayed to give him time to learn the Lord's Prayer."

Reading this for the first time as someone who knows little of Scottish history, I might be thrown by the introduction of the place "Kirkwall". If this was ploughing by me at 600wpm I would not be able to assimilate this new information. I'd have even more trouble taking on board the Earls and the Cathedral mentioned afterwards, as I'd never heard of them before this reading either.

Two things would greatly help to assimilate this new information: i. context, ii. time.

I would propose that after hitting the space bar, the current point which has been "Spritz'd", if you will, is displayed on screen below the main window in context. Attached is the proposed alteration.

image

Great work on this adaptation guys, it will certainly be of great use to many people with dyslexia and anyone looking to improve their reading speed!

Dan

h0ru5 commented 10 years ago

There is one integral problem to solve: right now, the text is extracted and processed. For this and other nice features (e.g. "resume from here"), you would need to map-back the current instruction (view-model of the speedreader-display) to the position:

  1. in the text (is given) and
  2. in the original document (not connected AFAICT)

When you know on what position in the DOM the current word in the speedreader view is, the above feature would be trivial.

Any thoughts?

danjul commented 10 years ago

Unfortunately I'm not much of a programmer, my first thought when I read about Spritz was "I'm going to knock that together in python in 10 minutes!", then I found OpenSpritz and then this project, which I liked the most and saw the most practicable future for.

Unfortunately my programming is limited to Python and some VB.

I hope although I have nothing to offer so far as coding goes with this project, my insights into psychology might be able to help you guys create a better interface. Like I said before, the subject of rapid-learning and data acquisition is something that interests me greatly.

Regards Dan

ds300 commented 10 years ago

You bring up some excellent points. I've noticed the same thing about new words. The problem is compounded when they are similar to well-known words. For example, in the Sherlock Holmes chapter, there is the word 'slavey'. I read this as 'slavery', then had to stop the reader to see what the word actually was because 'slavery' didn't make sense in context.

Being able to pause and highlight the current word and/or paragraph would be an extremely valuable feature, no doubt. @h0ru5 was right about the steps needed to solve this problem, and I've been considering the best way to go about it (see #13 for another use case).

This gave me an idea for how to add extra pauses for little-known words, and even more extra pauses for little-known words that look like well-known words. It might not be feasible with the amount of memory chrome extensions are allowed to consume, but I'll look into it for sure.

Food for thought. Thanks Dan!

danjul commented 10 years ago

I agree with you fully about words which look like others, that's definitely a problem. Reading at speeds at the outside limit of your ability you will be identifying words mostly by their size and what letters you pick up on it containing, which could certainly lead you astray.

You raise an interesting concern about homophones (words which look similar) however I believe that most people will know which word they expect to read in context if you follow me, your example above was one that tripped me up as well when I was going through the text.

I would be happy to compile a list of homophones should you decide you wish to go down that route though.

Partial list of homophones http://stu.westga.edu/~lgoodno1/eport/lessonplans/List%20of%20English%20Homophones.htm

ds300 commented 10 years ago

Homophones are words which sound similar. They can look quite different, even at flash speeds, e.g. there, their, they're. And certainly the surrounding context will usually prevent miscomprehension, as you say.

I think trouble mainly arises when the words are unseen or seldom seen by the reader.

danjul commented 10 years ago

You're right, sorry I had been doing some reading up last night on homophonic heterographs and heterographic homophones (specific problems with surface dyslexia) and had a bit of a brain-fart regarding definition. I'm not sure there's a word for 'A word that has most of the same letters but has a different meaning to another' - here's your chance to get something into the Oxford Dictionary! ;)

A list could be compiled of common words fairly easily, I'm sure someone took a dump of Wikipedia a few years back and listed the prevalence of words in a massive file. If the top 10,000 words are identified as "usual" and anything not in the list as "not so usual" then that could require a longer delay before the next word is displayed I guess. Might make for an interesting on/off feature.
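A rough sketch of that on/off idea, not code from jetzt: `COMMON_WORDS` is a tiny stand-in for a real top-10,000 frequency list, and `delayFor`, `normalize` and the 1.5x factor are all made-up names and values for illustration. The normalisation here only handles unaccented English letters.

```javascript
// Stand-in for a real frequency list (assumption: the real one would hold ~10,000 entries).
const COMMON_WORDS = new Set(["the", "of", "and", "was", "his", "in", "he", "to"]);

// Lower-case the word and drop anything that is not a-z or an apostrophe.
function normalize(word) {
  return word.toLowerCase().replace(/[^a-z']/g, "");
}

// Display time in milliseconds for one word at the given words-per-minute rate.
// Words missing from the list are shown for unusualFactor times longer.
function delayFor(word, wpm, unusualFactor = 1.5) {
  const baseMs = 60000 / wpm;
  const key = normalize(word);
  if (key === "" || COMMON_WORDS.has(key)) return baseMs;
  return baseMs * unusualFactor;
}
```

At 600wpm this would give "the" 100 ms and "Kirkwall" 150 ms.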

Also capitalised words, e.g. names, places could be fairly easily identified by their use of a capital letter mid-sentence but I reckon a review method to see the word in context might still prove the best option.

peteruithoven commented 10 years ago

My thoughts on highlighting the current word: even though it is much harder, I think showing the reading position in the original context (the website) is preferred, especially if you think about tables, code examples etc. If we keep the "select words to spritz" feature (besides "select elements to spritz"), there is no way we can store information about DOM elements while parsing. This probably means we would need to do a text search on pause. One tricky thing is that a single word can be found multiple times, but searching for multiple words can be tricky when words are split over multiple elements (a list (ul) of single words, for example).

A first suggestion: search for the current token (word or word part). If it is found only once, display it; otherwise extend the search with one extra token on both sides; if that is found once, display it; otherwise extend again, and so on. Extending the search with an extra earlier token would mean: from the target word, check if there is another word earlier in the same element; if so, match against it, else check the previous element (recursively).
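The expanding search could be sketched roughly as below. This is only an illustration under a big simplifying assumption: the page has already been flattened into one array of tokens, which sidesteps the "words split over multiple elements" problem entirely. `locateToken` and `matchesAt` are hypothetical names, not jetzt functions.

```javascript
// Check the two tokens at distance `radius` on either side of the candidate position.
// Sides where the reader text has no neighbour are skipped.
function matchesAt(pageTokens, readTokens, pagePos, index, radius) {
  for (const offset of [-radius, radius]) {
    const r = index + offset;
    if (r < 0 || r >= readTokens.length) continue;
    if (pageTokens[pagePos + offset] !== readTokens[r]) return false;
  }
  return true;
}

// Return the position in pageTokens of readTokens[index], or -1 if the
// position is missing or still ambiguous once all context is used up.
function locateToken(pageTokens, readTokens, index) {
  let candidates = [];
  for (let p = 0; p < pageTokens.length; p++) {
    if (pageTokens[p] === readTokens[index]) candidates.push(p);
  }
  const maxRadius = Math.max(index, readTokens.length - 1 - index);
  // Widen the window one token per side until only one candidate survives.
  for (let radius = 1; candidates.length > 1 && radius <= maxRadius; radius++) {
    candidates = candidates.filter((p) =>
      matchesAt(pageTokens, readTokens, p, index, radius)
    );
  }
  return candidates.length === 1 ? candidates[0] : -1;
}
```

Note the failure case: if only a short selection was sent to the reader and it occurs several times on the page, the search stays ambiguous and returns -1.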

On pausing longer on certain words: I would try to prevent working with status lists of words, because that easily limits you to English text or gives you another configuration to worry about.

(Maybe these two enhancements should be split?)

h0ru5 commented 10 years ago

Yes, there might be two trains of thought here:

Regarding the point "giving context on pause": the low hanging fruit could be to display a <div> with the text of the current sentence or paragraph and the current token highlighted in it.
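A minimal sketch of that low-hanging fruit, assuming the tokens of the current sentence are already available as an array. It only builds the markup as a string; `renderContext`, `escapeHtml` and the class names `jetzt-context` and `jetzt-current` are invented for this example.

```javascript
// Escape characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the context markup for one sentence, wrapping the current token in a span.
function renderContext(tokens, currentIndex) {
  const words = tokens.map((token, i) => {
    const safe = escapeHtml(token);
    return i === currentIndex
      ? `<span class="jetzt-current">${safe}</span>`
      : safe;
  });
  return `<div class="jetzt-context">${words.join(" ")}</div>`;
}
```

The returned string could then be placed below the reader on pause, with the highlight styled via the span's class.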

As for highlighting context on the page: the DOM parser could add a reference to the corresponding DOM node to each instruction, but the problems mentioned above remain: finding the word inside the DOM node, and the text-only (e.g. selected text) scenario.

Regarding "longer pause on uncommon words": finding out the language of a page is feasible (chrome offers an offline api for that), but even then keeping a score list on how uncommon a word is could be quite a task...

ecsplendid commented 10 years ago

@peteruithoven I should have read your comment before I talked about finding the context in #38. I think something along the lines you suggested will work.

I think we should create an array with all text nodes using document.createTreeWalker(root,NodeFilter.SHOW_TEXT,null,false)

then walk that array and use .nodeValue.split(" ") to get the list of words. Replace each text node with a DIV, then for each word in the array add a span in the div with the correct .nodeValue, and make a linked list structure by giving each span a property pointing to the previous word that appeared in our algorithm. Have a global hashtable to find all nodes for a given word (stripping off anything that isn't /\w+/), which will allow us to find the candidate words, then use the linked list structure to find the correct one.
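To make the data structure concrete, here is a DOM-free sketch of the hashtable plus linked list. Plain objects stand in for the spans, and it assumes the lookup sequence ends with the target word, preceded by the words that come before it. `buildWordIndex` and `findWord` are hypothetical names, and `\W` strips non-ASCII letters, so this is English-only as written.

```javascript
// Strip anything that is not a word character and ignore case.
function toKey(word) {
  return word.toLowerCase().replace(/\W+/g, "");
}

// Build a Map from word key to every node with that key.
// Each node links back to the word before it via `prev`.
function buildWordIndex(text) {
  const index = new Map();
  let prev = null;
  for (const raw of text.split(/\s+/)) {
    const key = toKey(raw);
    if (key === "") continue;
    const node = { key, raw, prev };
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(node);
    prev = node;
  }
  return index;
}

// Look up the last word of `sequence`, using the earlier words to
// disambiguate. Returns the single matching node, or null.
function findWord(index, sequence) {
  const keys = sequence.map(toKey);
  const candidates = index.get(keys[keys.length - 1]) || [];
  const matches = candidates.filter((node) => {
    let current = node;
    for (let i = keys.length - 1; i >= 0; i--) {
      if (current === null || current.key !== keys[i]) return false;
      current = current.prev;
    }
    return true;
  });
  return matches.length === 1 ? matches[0] : null;
}
```

In a real page each node would also hold a reference to its span, so the match could be highlighted directly.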

I will make a test case when I have time

ecsplendid commented 10 years ago

God I hate DHTML

http://jsfiddle.net/ZYV3A/1/

I made a test case. I turn all the words into spans, but then when I try to set a class to "highlight" it doesn't seem to be working

ecsplendid commented 10 years ago

I forgot to mention about the comments from @danjul -- I was thinking that we assume there is a dependency between the length of the word and the comprehension time as a starting point (EDIT- @ds300 already does this, I hadn't realised at the time). We can also be more clever, so a complicated word can also increase the delay on a few of the next words decreasing as a square of distance/progression from the complicated word. And this delay could be additive in some sense. So in effect we are keeping track of how much complexity the reader has had to deal with recently and compensate accordingly.
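One possible reading of this, sketched with made-up numbers: each complex word adds extra delay to itself and to the next few words, falling off with the inverse square of the distance, and contributions from nearby complex words add up. `extraDelays`, `msPerUnit` and `reach` are hypothetical names, and how a word's complexity score is computed is left open.

```javascript
// complexities: one non-negative score per word (0 means "nothing special").
// Returns the extra delay per word, in the same units as msPerUnit.
function extraDelays(complexities, { msPerUnit = 100, reach = 3 } = {}) {
  const extra = new Array(complexities.length).fill(0);
  complexities.forEach((complexity, i) => {
    if (complexity <= 0) return;
    // Spread the cost over this word and up to `reach` following words,
    // dividing by the square of (distance + 1).
    for (let d = 0; d <= reach && i + d < extra.length; d++) {
      extra[i + d] += (complexity * msPerUnit) / (d + 1) ** 2;
    }
  });
  return extra;
}
```

With the defaults, a single word of complexity 1 adds 100 ms to itself and 25 ms to the following word.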

If the word is in the "most common 5000 words", i.e. http://www.englishclub.com/vocabulary/common-words-5000.htm, we can assume it's low complexity.

We can use other features (names, places, emphasis, headings)

peteruithoven commented 10 years ago

@ecsplendid nice experiment!

Should we move the delay on certain words or word length to a separate issue?

ecsplendid commented 10 years ago

Up to you :>

ecsplendid commented 10 years ago

We are cooking on gas now!!!

http://jsfiddle.net/ZYV3A/3/

ecsplendid commented 10 years ago

OK I have a fully working prototype of the text-find functionality.

http://jsfiddle.net/ZYV3A/5/

It will turn all words into SPANs and keep a linked list structure of how they are connected. All you need to do to highlight a word is have a few prior words in the sequence (any amount) to disambiguate the word you are looking for.

For example I called it like this:

HighlightWord( ["policies", "and" ,"redundancies" ,"they" ,"believed" ,"were"] );

The highlighted word is also scrolled into view.

ecsplendid commented 10 years ago

I have partially implemented this now -- see #38

ecsplendid commented 10 years ago

On https://github.com/ecsplendid/jetzt/tree/HighlightCurrentWord I import the top 1000 british words in ascending order of usage and reduce the wait time by up to 20% as a linear function of its usage. It also has to be less than 6 chars. And I remove punctuation from getpivot
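For readers who don't want to dig through the branch, the rule described above might look roughly like this. It is a reconstruction from the description, not the code on the branch: it assumes the list runs from least used to most used, so the last entry gets the full 20% reduction. `commonWordFactor` is a hypothetical name.

```javascript
// Returns a multiplier for the wait time: 1 means unchanged,
// 0.8 means the full 20% reduction.
function commonWordFactor(word, wordsByAscendingUsage, maxReduction = 0.2) {
  const key = word.toLowerCase();
  if (key.length >= 6) return 1; // only short words are sped up
  const position = wordsByAscendingUsage.indexOf(key);
  if (position === -1) return 1; // not in the list
  // Linear in the word's position: the most used word gets maxReduction.
  const share = (position + 1) / wordsByAscendingUsage.length;
  return 1 - maxReduction * share;
}
```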

peteruithoven commented 10 years ago

I'm responding here on the showing context related aspects that are also discussed in: https://github.com/ds300/jetzt/pull/38

@ecsplendid, I think that if we only highlight the current word on pause, we can just put a span around the word that is currently selected; we don't need to add spans around every word. I think this is what h0ru5 suggests here: https://github.com/ds300/jetzt/pull/38#issuecomment-37415761. This is also what @ds300 means with "avoidable".

I understand that we all would like to show the current position in the website (even though a second window is easier)?

ecsplendid commented 10 years ago

Hi @peteruithoven :)

Well I might be going out on a limb here, but I really like the concept of seeing a real-time highlight update in the underlying web page. I think the context experience is one that's better in real time. The counterarguments are that it's computationally slow and visually messy, i.e. it damages the style of the underlying web site or is just visually distracting.

I think the real time idea adds value. Opinions?

My current vision for the context is something like this (similar to the image posted by @danjul)

image

On my first hash of this (see https://www.youtube.com/watch?v=jebPFbVdmTw) I turned all words into SPANS. This upset the style of the underlying web page.

So the new idea is that we float a div over the paragraph currently being read. I think also that the jetzt reader should be "visually coupled" to this paragraph in some way, but in such a way that the jetzt control maintains a fixed viewport position. When we move to the next paragraph we can do a fancy non-linear translate/scale transform on the paragraph overlay. We inject all the words into this div and do the real-time highlighting. This confers several benefits: 1) it's computationally simple, 2) it doesn't manipulate the underlying web page at all, 3) it gives beautiful context.

Discuss :)