Learning-Python-Team / word-game-tool

Basic tool to help players of word games, based around Scrabble, find the score of their words and assist them selecting the optimal word
MIT License

Regex not covering all cases #7

Closed JiggsUK closed 5 years ago

JiggsUK commented 5 years ago

When you run the program, try entering these words:

àbbey
hello_world
hig rise

What happens? What could we do to fix this?

Just a quick note: webdotorg and saiyencoder are 6 hours behind the rest of you so try and make sure you allow them to have some input if you can

asa-holland commented 5 years ago

Okay! I created a new branch for us to work in.

I don't know regex super well, but the current one searches the user input for [\d\W] and if it finds any matches, it spits out the error message.

\d matches anything that's a digit.
\W matches anything that is NOT a letter, a digit, or an underscore.

Thoughts on how to ignore (not match) the space and underscore characters?

I didn't realize \W would not match à! Learned something new today.
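To see what the current pattern does with the three inputs from the issue, here is a minimal sketch. It assumes Python 3, where \w and \W are Unicode-aware for str patterns by default. The helper name has_invalid_character is made up for illustration and is not taken from the project.

```python
import re

# The pattern described above: any digit, or any non-word character.
# In Python 3, "word characters" for str patterns are Unicode letters,
# digits, and the underscore, so \W matches none of those.
invalid_pattern = re.compile(r"[\d\W]")


def has_invalid_character(user_input):
    """Hypothetical helper: True if the pattern finds a match anywhere."""
    return invalid_pattern.search(user_input) is not None


# "\u00e0" is the precomposed letter à.
for word in ["\u00e0bbey", "hello_world", "hig rise", "abc123"]:
    print(repr(word), has_invalid_character(word))
```

Only 'hig rise' (because of the space) and 'abc123' (because of the digits) are flagged. The à counts as a letter and the underscore counts as a word character, so both slip past this pattern.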

webdotorg commented 5 years ago

Here's a fun cheat sheet: https://www.rexegg.com/regex-quickstart.html

I assume we want to reject everything except the standard English letters, i.e., A through Z.

webdotorg commented 5 years ago

would something like this work?

word = 'hello_world'

if not word.isalpha(): print("Enter a word. Letters only.")

JiggsUK commented 5 years ago

Good shout, have you looked at which letters are included in an .isalpha search? Would the 3 words on the issue pass?

asa-holland commented 5 years ago

And now I know what 'good shout' means!

We could use the ^ character at the start of a character class in the regex pattern, e.g. [^A-Za-z], so that it matches anything that is not a standard English letter.
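A rough sketch of that idea, assuming the goal is to accept only unaccented English letters. The names non_english_letter and is_english_letters_only are hypothetical, not from the project.

```python
import re

# [^A-Za-z] is a negated character class: it matches any single character
# that is NOT an unaccented English letter.
non_english_letter = re.compile(r"[^A-Za-z]")


def is_english_letters_only(user_input):
    """Hypothetical helper: True for a non-empty string of A-Z / a-z only."""
    # bool(user_input) rejects the empty string, which would otherwise pass
    # because there is nothing in it for the pattern to match.
    return bool(user_input) and non_english_letter.search(user_input) is None


for word in ["\u00e0bbey", "hello_world", "hig rise", "abbey"]:
    print(repr(word), is_english_letters_only(word))
```

With this check all three inputs from the issue are rejected, while a plain word such as 'abbey' is accepted.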

Two side questions:

Should we allow a user to use a space or underscore character to represent a blank tile? Some word games have those.

Should we be checking that what the user entered is indeed a word? For example, if the user enters definately instead of 'definitely' (or intentionally types gibberish) should we still display the score, or an error message saying that their input is not a word?

webdotorg commented 5 years ago

Okay... Something like this passes everything except special characters.

Should pass with:

1) hello_world
2) hig rise

Should not pass with:

1) àbbey

lstrip('') removes spaces

.isalpha() ensures that all characters are A through Z.

word_3 = 'hig rise'

word_3.lstrip('')
if not word_3.isalpha(): print("Enter a word. You cannot use numbers or spaces.")
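For reference, a quick sketch of how these string methods behave in Python 3. The variable names are only for illustration.

```python
word = "hig rise"

# lstrip('') is given an empty set of characters to strip, so it strips nothing.
unchanged = word.lstrip('')

# lstrip() with no argument strips leading whitespace only. String methods
# return a new string, so the result has to be assigned to be kept.
left_stripped = "  hig rise".lstrip()

# Removing every space, including the one in the middle, needs replace().
no_spaces = word.replace(" ", "")

print(repr(unchanged))       # the original string, space included
print(repr(left_stripped))   # leading spaces gone, inner space still there
print(repr(no_spaces), no_spaces.isalpha())
print(word.isalpha(), "hello_world".isalpha())
```

So 'hig rise' still contains its space after lstrip(''), and both 'hig rise' and 'hello_world' fail .isalpha(), because a space and an underscore are not letters.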

JiggsUK commented 5 years ago

> Should we allow a user to use a space or underscore character to represent a blank tile? Some word games have those.

As MrDayKwan suggests, we might want to consider changing what the blank character is in the dictionary. A space is hard to account for, but a symbol would be easier.

> .isalpha() ensures that all characters are A through Z.

It's not just the English alphabet; it accepts other Latin characters too. àbbey fails because the à is not in our character dictionary. From the regex docs: [image]
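A small sketch of the difference, assuming Python 3.7 or later (for str.isascii()). The helper name is_ascii_letters_only is hypothetical.

```python
def is_ascii_letters_only(word):
    """Hypothetical helper: True only for non-empty strings of unaccented letters."""
    # str.isascii() rejects characters such as à.
    # str.isalpha() rejects digits, spaces, underscores, and the empty string.
    return word.isascii() and word.isalpha()


# "\u00e0" is the precomposed letter à.
print("\u00e0bbey".isalpha())                # à is a letter, so isalpha() accepts it
print(is_ascii_letters_only("\u00e0bbey"))   # the combined check rejects it
print(is_ascii_letters_only("abbey"))
```

This is one possible way to restrict .isalpha() to A through Z without a regex; a more specific regex would do the same job.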

Perhaps we need a more specific regex, after we have decided what should represent a blank tile.

> Should we be checking that what the user entered is indeed a word? For example, if the user enters definately instead of 'definitely' (or intentionally types gibberish) should we still display the score, or an error message saying that their input is not a word?

This would be a great development option, but I think it is too much for right now.

asa-holland commented 5 years ago

I would put in a vote for allowing _ or * as a user substitution for a blank tile. What do you guys think?


> This would be a great development option, but I think it is too much for right now.

Hehe, I'm too stubborn for that! Does the following stuff make sense? The first block of code could be put at the start of the file, and the elif statement inside the primary while loop near the end.

# Open the .txt file of official Scrabble words from 2015 in read mode.
# The with statement closes the file for us once the block finishes.
with open("Collins Scrabble Words (2015).txt", "r") as base_word_text_file:
    # Each row of the text file holds one word, so splitting the text on
    # line breaks gives one list item per word.
    base_word_list = base_word_text_file.read().splitlines()

# elif statement to validate that input is a correctly spelled English word
# (this part goes inside the primary while loop, so it is indented)
    # We use the base_word_list created from our .txt file
    elif user_input.upper() not in base_word_list:
        print(f'Sorry, {user_input} may be spelled incorrectly. Try again.')
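A possible variation on the lookup, sketched under two assumptions: the word file holds one uppercase word per line, and a small in-memory string can stand in for it here. The names sample_file_text, base_word_set, and is_known_word are hypothetical.

```python
# Stand-in for the contents of the word file. The real words would come from
# "Collins Scrabble Words (2015).txt"; these three are sample data only.
sample_file_text = "ABBEY\nHELLO\nDEFINITELY\n"

# splitlines() avoids the empty trailing entry that split('\n') can leave,
# and a set keeps each "in" check fast even for a very large word list.
base_word_set = set(sample_file_text.splitlines())


def is_known_word(user_input):
    """Hypothetical helper: case-insensitive lookup in the word set."""
    return user_input.upper() in base_word_set


for word in ["definitely", "definately"]:
    print(word, is_known_word(word))
```

A list works too; the set only matters for speed when the word list is long and the check runs on every guess.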

nakulkd commented 5 years ago

@shyamcody has written a regex bit using re.compile('[A-Za-z]') to check the input string. This seems to be working effectively for the use cases highlighted here. Just a heads-up for the group.

JiggsUK commented 5 years ago

Nice @MrDayKwan! Feel free to push it to the regex issue branch. I think we will also find it much easier to deal with if we put the .upper() on the end of the user input variable. Then we won't need to remember to convert it to upper case every time we want to use it.

Edit: nakulkd beat me to it but here's the full code suggestion from @shyamcody:

import re

character_regex = re.compile('[a-zA-Z]')

def check_diff_characters(user_input):
    if character_regex.search(user_input):
        return True
    else:
        return False

What do you think? I think it'll work great, we just need to add an extra bit for the blank tile - which I would vote for a * as the value

shyamcody commented 5 years ago

Sorry for the mishap, but I sent the wrong file to JiggsUK. The original patch of code is:

import re

character_regex = re.compile('[a-zA-Z]')

def check_diff_characters(user_input):
    extra_characters = character_regex.sub(r'', user_input)
    if len(extra_characters) == 0:
        return False
    else:
        return True
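To compare the two versions on the inputs from the issue, here is a short sketch. check_with_search is a hypothetical name for the search-based version posted first; check_diff_characters follows the corrected patch.

```python
import re

character_regex = re.compile('[a-zA-Z]')


def check_with_search(user_input):
    # Search-based version: True as soon as ANY English letter is present,
    # so it cannot tell a clean word from one with extra characters.
    return character_regex.search(user_input) is not None


def check_diff_characters(user_input):
    # Corrected version: delete every English letter and see what is left.
    extra_characters = character_regex.sub('', user_input)
    return len(extra_characters) > 0


# "\u00e0" is the precomposed letter à.
for word in ["\u00e0bbey", "hello_world", "hig rise", "hello"]:
    print(repr(word), check_with_search(word), check_diff_characters(word))
```

The search-based version returns True for all four inputs, while the corrected version returns True only for the three problem inputs. Note that an empty string leaves nothing over either, so it would need its own check.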

asa-holland commented 5 years ago

Alright, using @shyamcody's regex and modifying it to include * should do the trick! It's in the main.py file now. Also added a block to account for * characters in the validation for 'is user input actually a word'.
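The exact code in main.py is not shown in this thread, so the following is only a guess at what the modified check might look like, assuming the blank tile is typed as *.

```python
import re

# Assumption: "*" stands for a blank tile, so it joins the allowed characters.
# Inside a character class, * is a literal asterisk, not a repetition operator.
character_regex = re.compile('[a-zA-Z*]')


def check_diff_characters(user_input):
    """True if the input contains anything other than English letters or '*'."""
    extra_characters = character_regex.sub('', user_input)
    return len(extra_characters) > 0


# "\u00e0" is the precomposed letter à.
for word in ["h*llo", "\u00e0bbey", "hello_world", "hig rise"]:
    print(repr(word), check_diff_characters(word))
```

With this pattern 'h*llo' is accepted, and the three inputs from the top of the issue are still reported as containing extra characters.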

JiggsUK commented 5 years ago

OK, so it looks like we have got there with this issue. The scenarios at the top all pass - or rather catch an exception as they should - I believe the issue is resolved so I will close this. Well done everyone, thank you for your input!