bgins57 / pwp-capstones

0 stars 0 forks source link

Project Review #1

Open jordanwillis opened 6 years ago

jordanwillis commented 6 years ago

Rubric Score

Criteria 1: Valid Python Code

Criteria 2: Implementation of Project Requirements

  1. get_average_sentence_length

It looks like you are a bit off on the results that I used to test this function out. I wasn't able to figure out the reason for the difference but I am guessing it has something to do with the 2nd to last line of code in your implementation. You are correct that we end up with something extra because of the period, but I think that impacts the number of sentences instead of the number of words. Anyway, here is the code I used for the test output along with the results I get with my implementation.


print(get_average_sentence_length("Have a nice day. Out of the three suspects, I think Gregg is the murderer? Any catlover must be a murderer!"))
print("-------------")
print(get_average_sentence_length(murder_note))
print(get_average_sentence_length(lily_trebuchet_intro))
print(get_average_sentence_length(myrtle_beech_intro))
print(get_average_sentence_length(gregg_t_fishy_intro))
7.0
-------------
22.066666666666666
15.05
6.75
19.363636363636363

Also, here is how I dealt with the extra sentence issue (e.g. using filter).

def get_average_sentence_length(text):
    no_questions = text.replace('?', '.')
    no_exclaim = no_questions.replace('!', '.')
    sentences = list(filter(None, no_exclaim.split('.')))
    sentence_lengths = []
    for sentence in sentences:
        words = sentence.split()
        sentence_lengths.append(len(words))

    average = sum(sentence_lengths) / len(sentences)
    return average
  1. frequency_comparison

When you are checking your 2nd table, don't forget that we need to use the value of that key for our number of appearances.

If the key doesn't exist in table1, the value for the key in table2 should be added to appearances.

It looks like you are only looking at the key. Here is the test code that I used with the correct results for you to compare with later.

mytable1 = {"a":2,"b":2,"c":2}
mytable2 = {"c":4,"d":2, "a":4, "e":2}
print(str(frequency_comparison(mytable1,mytable2)))
test1 = {"key1":1, "key2":2, "key3":3, "key5": 5}
test2 = {"key1":2, "key2":1, "key4":7, "key5":10}
print(frequency_comparison(test1, test2))
0.2857142857142857
0.2916666666666667

Also, I thought I would provide my implementation as well for learning purposes.

def frequency_comparison2( table1, table2):
    appearances = 0
    mutual_appearances = 0

    for key1 in table1.keys():
        if key1 in table2.keys():
            mutual_appearances += min(table1[key1],table2[key1])
            appearances += max(table1[key1],table2[key1])
        elif key1 not in table2.keys():
            appearances += table1[key1]

    for key2 in table2.keys():
        if key2 not in table1.keys() :
            appearances += table2[key2]

    result = mutual_appearances/appearances

    return result
  1. percent_difference

I would recommend implementing this function assuming that the passed in values for just numbers, and then access your average sentence length from each TextSample when you call the function. Obviously either approach produces the same result, but in general its better to implement functions that are more generic and don't require knowledge of what is being passed in.

Here is an example of what I mean.

def percent_difference(value1, value2):
    numerator = abs(value1 - value2)
    denominator = (value1 + value2) / 2
    return numerator / denominator

sentence_length_difference = percent_difference(sample1.average_sentence_length, sample2.average_sentence_length)

Everything else from a project requirements perspective looks great! Be sure to checkout the below review section (Criteria 4) for some recommendations and feedback on how you might further improve your Python skills

Criteria 3: Software Architecture

Criteria 4: Uses Python Language Features

  1. Great job using triple quote strings! They offer more safety and flexibility in that they allow for multi-line strings.

  2. You could shorten your prepare_text method some by taking advantage of the join method like this.

def prepare_text(text):
    lower_text = text.lower()
    punctuation = ".,!?"
    no_punctuation = ''.join(letter for letter in lower_text if letter not in punctuation)
    return no_punctuation.split()
  1. Your build_frequency_table seemed quite overly complex with all the looping going on. I think you may have just over thought this one, but consider this more straightforward implementation.
def build_frequency_table(corpus):
    word_count_frequency = {}

    for word in corpus:
        if word_count_frequency.get(word):
            word_count_frequency[word] += 1

        else:
            word_count_frequency[word] = 1

    return word_count_frequency

We just iterate through the words and either add the unique word to our collection or increment the count.

Criteria 5: Produces Accurate Output

Overall Score: 19/20

Overall you did a fantastic job on this project! I can tell that you have learned a lot from the PWP course and hope you continue on with your Python journey.

Keep up the great work!

bgins57 commented 6 years ago

I appreciate that feedback. Seems I quickly fall back into old habits and don't use the full capability of Python, so you code suggestions are really helpful.