yashksaini-coder closed this pull request 1 week ago
✅ Closes: #436
This pull request includes significant updates to the `pysnippets/NLP` directory, introducing new NLP functionalities and enhancing existing ones. The changes include the addition of a comprehensive README file, new NLP processing functions, and corresponding unit tests.

Documentation:
- `NLP.md`: Added a README describing the NLP code snippets, their features, installation instructions, usage examples, and testing guidelines.

New NLP Functions:
- `cosine_similarity_texts.py`: Added function to compute cosine similarity between two text documents using `CountVectorizer` and `cosine_similarity` from sklearn.
- `extract_entities.py`: Added function to perform Named Entity Recognition (NER) using spaCy.
- `generate_text.py`: Added function to generate text based on a Markov chain model.
- `lda_topic_modeling.py`: Added function for topic modeling using Latent Dirichlet Allocation (LDA) with sklearn.
- `lemmatize_text.py`: Added function to lemmatize text using NLTK’s WordNet Lemmatizer.
- `preprocess_text.py`: Added function to preprocess text by removing stopwords using NLTK.
- `stem_text.py`: Added function to stem words in a text using NLTK’s Porter Stemmer.
- `tfidf_vectorize.py`: Added function to vectorize text documents using TF-IDF with sklearn.
- `tokenize_text.py`: Added function to tokenize text into words and sentences using NLTK.
- `train_text_classifier.py`: Added function to train a Naive Bayes text classifier using `CountVectorizer` and `MultinomialNB` from sklearn.
- `word2vec_similarity.py`: Added function to compute word similarity using a pre-trained Word2Vec model from gensim.

Testing:
- `test_all.py`: Added comprehensive unit tests for each NLP function using Python's `unittest` framework to ensure correctness and handle edge cases.

TESTING
@UTSAVS26 can you review this PR and assign it at least level 3?