NLP Exercises: bag_of_n_grams and TF-IDF

codebasics / nlp-tutorials

Tutorials For Beginners For Natural Language Processing

222 stars, 303 forks

NLP Exercises: bag_of_n_grams and TF-IDF #14

Closed: KirandeepMarala closed this 2 years ago

KirandeepMarala commented 2 years ago

Hi sir,

Thanks for the suggestions on improving the sentences. I have done a thorough Grammarly check and corrected the sentence structure in bag_of_n_grams_exercise.

I have also added the exercise files for TF-IDF. Looking forward to your feedback and any suggestions to make them better.

Thank you!

dhavalsays commented 2 years ago

"etc. for text representation and apply different classification algorithms." Here, "text representation" is a better term than "pre-processing".

dhavalsays commented 2 years ago

I suggest we import libraries only when we need them. So you should start with "About Data" first, and then import each library in the cell where you first need it. In Jupyter notebooks it is better to keep imports close to the cells where they are used.
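A minimal sketch of that layout, written as a script with `# %%` cell markers. The two-row dataset and the cell names are hypothetical stand-ins, not the actual exercise notebook; the point is only where each import sits.

```python
# %% About Data
# Hypothetical in-memory stand-in for the exercise dataset.
import csv  # imported here because this is the first cell that parses CSV
import io

raw = "text,label\nGreat phone,positive\nBad battery,negative\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# %% Tokenize
import re  # imported only now, right next to the code that uses it

# Lowercase each text and keep runs of letters as tokens.
tokens = [re.findall(r"[a-z]+", row["text"].lower()) for row in rows]
print(tokens)
```

A reader who jumps straight to the tokenizing cell can see which library it depends on without scrolling back to the top of the notebook.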

dhavalsays commented 2 years ago

Can you remove "perfectly"? The classes are balanced, but I would not call them perfectly balanced unless the sample count is exactly the same for all classes.
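One way to check which wording applies is to count the samples per class. The sketch below is illustrative only: the label counts are made up, and the 10% `tolerance` for calling a dataset "balanced" is an assumption, not a standard threshold.

```python
from collections import Counter


def describe_balance(labels, tolerance=0.1):
    """Describe class balance from a list of labels.

    `tolerance` is an assumed cutoff: the largest allowed relative gap
    between the biggest and smallest class for the data to count as balanced.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    smallest, largest = min(counts.values()), max(counts.values())
    if smallest == largest:
        return "perfectly balanced"  # every class has exactly the same count
    if (largest - smallest) / largest <= tolerance:
        return "balanced"
    return "imbalanced"


# Hypothetical label counts: close to equal, but not identical.
labels = ["a"] * 100 + ["b"] * 98 + ["c"] * 101
print(describe_balance(labels))  # balanced
```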

dhavalsays commented 2 years ago

Can you use lower case, i.e. preprocessed_comment? It is recommended to use snake case, with everything in lower case, for column names, variables, etc.
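A small helper can normalize names to this convention. This is a sketch under assumptions: the input names below are hypothetical variants, since the original column name in the notebook is not shown here. With pandas, the same function could be applied to each entry of `df.columns`.

```python
import re


def to_snake_case(name):
    """Convert a column or variable name to lowercase snake_case."""
    name = name.strip()
    # Split camelCase boundaries: "preprocessedComment" -> "preprocessed_Comment".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse spaces, hyphens, and other non-alphanumerics into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


# Hypothetical spellings that all map to the same snake_case name.
columns = ["Preprocessed_Comment", "Preprocessed Comment", "preprocessedComment"]
print([to_snake_case(c) for c in columns])
# ['preprocessed_comment', 'preprocessed_comment', 'preprocessed_comment']
```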

dhavalsays commented 2 years ago

"Text" should be "text". Start with a lowercase letter. One observation I have, Kiran, is that your English needs improvement. For this, you can use the Grammarly extension. In English, the first letter of a sentence should be capitalized and everything else should be lowercase, with some exceptions such as a person's name, "I", locations, and so on. Read this for the complete set of rules: https://writer.com/blog/capitalization-rules/

dhavalsays commented 2 years ago

Another example of improper capitalization. Please run all your text through a Grammarly check and make the necessary changes. Overall, once again, good job on the actual technical content 👍