TeamGleason / SwipeSpeak

MIT License

Consider changing prediction dictionary #6

Closed TeamGleasonDev closed 6 years ago

TeamGleasonDev commented 6 years ago

Unable to write "this is a test" using the default 4-key layout.

A single swipe on the 4-key keyboard was unable to get a word prediction for 'is'.

We might also investigate the prediction engine / word frequency dictionary, given that 'is' should be a very common word.
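To illustrate why a missing dictionary entry makes a word impossible to type, here is a minimal sketch of key-sequence lookup. The letter grouping in `keyGroups`, the function names, and the sample frequencies are all hypothetical; the app's actual layout and WordPredictionEngine may work differently.

```swift
// Hypothetical 4-key letter grouping; the app's real layout may differ.
let keyGroups: [String] = ["abcdef", "ghijkl", "mnopqrs", "tuvwxyz"]

// Map each letter to the index of the key that contains it.
let keyForLetter: [Character: Int] = {
    var map: [Character: Int] = [:]
    for (index, group) in keyGroups.enumerated() {
        for letter in group {
            map[letter] = index
        }
    }
    return map
}()

/// Converts a word to its key sequence, or nil if a character is not on any key.
func keySequence(for word: String) -> [Int]? {
    var sequence: [Int] = []
    for letter in word.lowercased() {
        guard let key = keyForLetter[letter] else { return nil }
        sequence.append(key)
    }
    return sequence
}

/// Builds an index from key sequence to candidate words, most frequent first.
func buildIndex(from frequencies: [String: Int]) -> [[Int]: [String]] {
    var index: [[Int]: [String]] = [:]
    let sorted = frequencies.sorted { $0.value > $1.value }
    for (word, _) in sorted {
        guard let sequence = keySequence(for: word) else { continue }
        index[sequence, default: []].append(word)
    }
    return index
}

// Made-up frequencies; note that "is" is absent, as in the reported word list.
let frequencies = ["this": 900, "a": 800, "test": 100, "hi": 50]
let index = buildIndex(from: frequencies)

// Swiping the keys for "is" finds no candidates because the word was never indexed.
let swipe = keySequence(for: "is")
let candidates = swipe.flatMap { index[$0] } ?? []
print(candidates)
```

Under this scheme the prediction engine can only return words that exist in the dictionary, so no swipe can ever produce 'is' until the list contains it.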

TeamGleasonDev commented 6 years ago

Was also unable to write 'is' or 'hi Chris' with the six-key layout.

The six-key double-swipe layout works for all these scenarios.

TeamGleasonDev commented 6 years ago

Here is the word list:

https://github.com/TeamGleason/SwipeSpeak/blob/master/SwipeSpeak/SwipeSpeak/WordList.csv
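One way to audit a list like this for common words is sketched below. It assumes each line has the form "word,frequency", which may not match the real WordList.csv; the sample rows and function names are invented for illustration.

```swift
import Foundation

/// Parses lines of the assumed form "word,frequency" into a set of lowercase words.
func loadWords(fromCSV text: String) -> Set<String> {
    var words = Set<String>()
    for line in text.split(whereSeparator: { $0.isNewline }) {
        // The word is assumed to be the first comma-separated field.
        guard let first = line.split(separator: ",").first else { continue }
        let word = first.trimmingCharacters(in: .whitespaces).lowercased()
        if !word.isEmpty { words.insert(word) }
    }
    return words
}

/// Returns the expected words that are absent from the list, in input order.
func missingWords(_ expected: [String], in words: Set<String>) -> [String] {
    return expected.filter { !words.contains($0.lowercased()) }
}

// Made-up sample data, not the contents of the real file.
let sampleCSV = """
the,61847
this,4623
a,21626
test,150
hi,90
"""

let words = loadWords(fromCSV: sampleCSV)
let missing = missingWords(["this", "is", "a", "test", "hi"], in: words)
print(missing)
```

Running the same check against a short list of everyday phrases would flag gaps before they reach users.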

"is" is completely missing ?!?

"hi" is #910 in the list.

TeamGleasonDev commented 6 years ago

Perhaps replace with:

http://norvig.com/ngrams/count_1w.txt

From:

http://norvig.com/ngrams/

But very tech heavy.

TeamGleasonDev commented 6 years ago

More ideas:

http://ucrel.lancs.ac.uk/bncfreq/

http://martinweisser.org/corpora_site/word_lists.html

danieltskv commented 6 years ago

I will review the proposed word lists and select a new one.

zxybdfz commented 6 years ago

The original free word list is from http://www.kilgarriff.co.uk/bnc-readme.html -> http://www.kilgarriff.co.uk/BNClists/lemma.num, suggested by Mark Davies (who has created many corpora). His other suggestion is http://ucrel.lancs.ac.uk/bncfreq/lists/2_2_spokenvwritten.txt. Please let me know if you have any concerns about the potential word lists. Once we choose a new word list, I can also help clean it up.

danieltskv commented 6 years ago

@JayBeavers @zxybdfz Idea: We can include a default file with the app's bundle, but use Firebase's Cloud Storage to also store the file and change it if we like. We can then ping the server and if we detect changes we download the new file and use it. This way we will always have a word list that is up to date and remotely changeable. Thoughts? https://firebase.google.com/docs/storage/
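The selection logic behind this idea could look roughly like the sketch below. It deliberately leaves out the Firebase calls themselves; `WordListInfo`, its integer `version`, and the file paths are hypothetical and stand in for whatever metadata the server would actually publish.

```swift
import Foundation

/// Metadata for a word list; `version` is a hypothetical integer published with each file.
struct WordListInfo: Equatable {
    let version: Int
    let fileURL: URL
}

/// Picks the list to load: the downloaded copy only if it is newer than the bundled one.
func activeWordList(bundled: WordListInfo, downloaded: WordListInfo?) -> WordListInfo {
    guard let downloaded = downloaded, downloaded.version > bundled.version else {
        return bundled
    }
    return downloaded
}

/// Decides whether a download is needed once the server reports its latest version.
func needsDownload(current: WordListInfo, remoteVersion: Int) -> Bool {
    return remoteVersion > current.version
}

// Illustrative paths only.
let bundled = WordListInfo(version: 1, fileURL: URL(fileURLWithPath: "/bundle/words.csv"))
let cached = WordListInfo(version: 2, fileURL: URL(fileURLWithPath: "/caches/words.csv"))

let firstLaunch = activeWordList(bundled: bundled, downloaded: nil)
let afterUpdate = activeWordList(bundled: bundled, downloaded: cached)
print(firstLaunch.version, afterUpdate.version)
print(needsDownload(current: afterUpdate, remoteVersion: 3))
```

Falling back to the bundled file keeps the keyboard usable offline and on first launch, before any download has completed.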

danieltskv commented 6 years ago

@zxybdfz There is an issue when using word lists with words that contain special characters such as ', ~, *.

I've incorporated the http://ucrel.lancs.ac.uk/bncfreq/lists/2_2_spokenvwritten.txt word list into the project, and we get a crash in the WordPredictionEngine class when we reach a word with a special character.

I can try to fix it by myself, but as you worked on it and know this class better, I thought about asking you if you have a good solution.
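One possible workaround, sketched here as an assumption rather than the fix that was eventually applied, is to filter the list before it reaches the prediction engine so that only words made of the letters a-z remain. The function names and sample entries are hypothetical.

```swift
// Letters the key layouts are assumed to cover.
let allowedLetters = Set("abcdefghijklmnopqrstuvwxyz")

/// Returns true if the word is non-empty and contains only a-z after lowercasing.
func isSupportedWord(_ word: String) -> Bool {
    let lowered = word.lowercased()
    return !lowered.isEmpty && lowered.allSatisfy { allowedLetters.contains($0) }
}

/// Drops entries whose words contain characters the key mapping cannot represent.
func cleaned(_ entries: [(word: String, frequency: Int)]) -> [(word: String, frequency: Int)] {
    return entries.filter { isSupportedWord($0.word) }
}

// Made-up entries that include the special characters mentioned above.
let rawEntries: [(word: String, frequency: Int)] = [
    (word: "don't", frequency: 500),
    (word: "it*", frequency: 40),
    (word: "hello", frequency: 30),
    (word: "~", frequency: 5),
    (word: "Is", frequency: 900),
]

let keptWords = cleaned(rawEntries).map { $0.word }
print(keptWords)
```

Filtering is the bluntest option: it also discards contractions such as "don't". Stripping the offending characters, or teaching the engine to skip them, would keep those words at the cost of more changes to the engine.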

You can download the current code in the develop branch and try it. Just switch word_frequency_english_kilgarriff to word_frequency_english_ucrel on line 109 of MainTVC.

https://github.com/TeamGleason/SwipeSpeak/tree/0578d88dff68e30970954449395f140d1a7deb7c

Thanks!

TeamGleasonDev commented 6 years ago

Sure, that's fine. Do we need it for initial release?

After the initial release, we should get into custom dictionaries too, and collect the words spoken. This should be opt-in, defaulting to yes. We should also personalize this. If you want, I can stand up an instance of the GarageHop login service I use, which is based on a simple email validation system. I'll need a Docker container to do this, so I will look into Google's Docker hosting.

On Wed, Dec 20, 2017, 2:14 AM Daniel Tsirulnikov wrote:

@JayBeavers @zxybdfz Idea: We can include a default file with the app's bundle, but use Firebase's Cloud Storage to also store the file and change it if we like. We can then ping the server and if we detect changes we download the new file and use it. This way we will always have a word list that is up to date and remotely changeable. Thoughts? https://firebase.google.com/docs/storage/


danieltskv commented 6 years ago

@TeamGleasonDev @JayBeavers I see, yes, we don't need it for the initial release. We'll work on it later. Also, we could use Firebase's online database for login and to store user data. It is fast and easy (and it has an email validation system). But if you prefer our own service, we could do that too.

danieltskv commented 6 years ago

I've switched the word list to this one: http://ucrel.lancs.ac.uk/bncfreq/lists/2_2_spokenvwritten.txt (with additional cleaning). From initial testing it seems to work fine. Let me know of any issues.