Project Review - Githubissues

Rubric Score

Criteria 1: Valid Python Code

Score Level: 4/4
Comment(s): Great job writing all valid Python code without any errors or issues! I tested each one of your functions and class and in all cases received valid output and no translation errors.

Criteria 2: Implementation of Project Requirements

Score Level: 3/4
Comment(s): Nice job implementing each of the required set of functions and class for this project! Hopefully this programming "game" highlighted some of the ways that you might use Python for data analysis and discovery. I did want to give some feedback on a few things listed below.

get_average_sentence_length

It looks like you are a bit off on the results that I used to test this function out. I wasn't able to figure out the reason for the difference but I am guessing it has something to do with the 2nd to last line of code in your implementation. You are correct that we end up with something extra because of the period, but I think that impacts the number of sentences instead of the number of words. Anyway, here is the code I used for the test output along with the results I get with my implementation.


print(get_average_sentence_length("Have a nice day. Out of the three suspects, I think Gregg is the murderer? Any catlover must be a murderer!"))
print("-------------")
print(get_average_sentence_length(murder_note))
print(get_average_sentence_length(lily_trebuchet_intro))
print(get_average_sentence_length(myrtle_beech_intro))
print(get_average_sentence_length(gregg_t_fishy_intro))

7.0
-------------
22.066666666666666
15.05
6.75
19.363636363636363

Also, here is how I dealt with the extra sentence issue (e.g. using filter).

def get_average_sentence_length(text):
    no_questions = text.replace('?', '.')
    no_exclaim = no_questions.replace('!', '.')
    sentences = list(filter(None, no_exclaim.split('.')))
    sentence_lengths = []
    for sentence in sentences:
        words = sentence.split()
        sentence_lengths.append(len(words))

    average = sum(sentence_lengths) / len(sentences)
    return average

frequency_comparison

When you are checking your 2nd table, don't forget that we need to use the value of that key for our number of appearances.

If the key doesn't exist in table1, the value for the key in table2 should be added to appearances.

It looks like you are only looking at the key. Here is the test code that I used with the correct results for you to compare with later.

mytable1 = {"a":2,"b":2,"c":2}
mytable2 = {"c":4,"d":2, "a":4, "e":2}
print(str(frequency_comparison(mytable1,mytable2)))
test1 = {"key1":1, "key2":2, "key3":3, "key5": 5}
test2 = {"key1":2, "key2":1, "key4":7, "key5":10}
print(frequency_comparison(test1, test2))

0.2857142857142857
0.2916666666666667

Also, I thought I would provide my implementation as well for learning purposes.

def frequency_comparison2( table1, table2):
    appearances = 0
    mutual_appearances = 0

    for key1 in table1.keys():
        if key1 in table2.keys():
            mutual_appearances += min(table1[key1],table2[key1])
            appearances += max(table1[key1],table2[key1])
        elif key1 not in table2.keys():
            appearances += table1[key1]

    for key2 in table2.keys():
        if key2 not in table1.keys() :
            appearances += table2[key2]

    result = mutual_appearances/appearances

    return result

percent_difference

I would recommend implementing this function assuming that the passed in values for just numbers, and then access your average sentence length from each TextSample when you call the function. Obviously either approach produces the same result, but in general its better to implement functions that are more generic and don't require knowledge of what is being passed in.

Here is an example of what I mean.

def percent_difference(value1, value2):
    numerator = abs(value1 - value2)
    denominator = (value1 + value2) / 2
    return numerator / denominator

sentence_length_difference = percent_difference(sample1.average_sentence_length, sample2.average_sentence_length)

Everything else from a project requirements perspective looks great! Be sure to checkout the below review section (Criteria 4) for some recommendations and feedback on how you might further improve your Python skills

Criteria 3: Software Architecture

Score Level: 4/4
Comment(s): Though the project requirements instructed you to break apart your implementation into separate functions and classes, I'm glad to see that you followed through on this requirement! Hopefully this basic software architectural approach provided you with some guidance on how to structure code in the future. Imagine if you had been asked to solve this problem in one fail swoop, it would certainly have been much more challenging to do. As a result your implementation would have most likely been much less extensible, maintainable, and verifiable (e.g. the ability to test small chunks of functionality). Please keep this in mind as you continue using Python in the future!

Criteria 4: Uses Python Language Features

Score Level: 4/4
Comment(s): Very nice job taking advantage of many built-in language functions and features! Since there is always room for improvement, I have shared a few recommendations and feedback on your implementation below.

Great job using triple quote strings! They offer more safety and flexibility in that they allow for multi-line strings.
You could shorten your prepare_text method some by taking advantage of the join method like this.

def prepare_text(text):
    lower_text = text.lower()
    punctuation = ".,!?"
    no_punctuation = ''.join(letter for letter in lower_text if letter not in punctuation)
    return no_punctuation.split()

Your build_frequency_table seemed quite overly complex with all the looping going on. I think you may have just over thought this one, but consider this more straightforward implementation.

def build_frequency_table(corpus):
    word_count_frequency = {}

    for word in corpus:
        if word_count_frequency.get(word):
            word_count_frequency[word] += 1

        else:
            word_count_frequency[word] = 1

    return word_count_frequency

We just iterate through the words and either add the unique word to our collection or increment the count.

Criteria 5: Produces Accurate Output

Score Level: 4/4
Comment(s): Great job correctly identifying the murderer! Despite some of the minor issues above, the overall impact to your code was marginal.

Overall Score: 19/20

Overall you did a fantastic job on this project! I can tell that you have learned a lot from the PWP course and hope you continue on with your Python journey.

Keep up the great work!

bgins57 / pwp-capstones

Project Review #1