TUM-IDP-WS-20 / doc

Document how CountVectorizer works #22

Open farukcankaya opened 3 years ago

farukcankaya commented 3 years ago

We are using CountVectorizer from sklearn.feature_extraction.text in Milestone_1_W_Relevant_Data and Milestone_1, configured as follows:

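from sklearn.feature_extraction.text import CountVectorizer

count_vectorizer = CountVectorizer(analyzer='word',                   # features are word tokens (unigrams by default)
                                   min_df=3,                          # drop terms that occur in fewer than 3 documents
                                   stop_words='english',              # remove sklearn's built-in English stop words
                                   lowercase=True,                    # lowercase text before tokenizing
                                   token_pattern=r'[a-zA-Z0-9]{3,}',  # a token is a run of 3+ ASCII letters/digits
                                   max_features=5000,                 # keep the 5000 terms with the highest total count
                                   )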
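To document what this configuration does, here is a minimal pure-Python sketch of the steps CountVectorizer applies with these settings: lowercase the text, extract tokens with token_pattern, drop stop words, discard terms that appear in fewer than min_df documents, keep at most max_features terms by total count, and assign column indices in alphabetical order. The function name count_vectorize and the small STOP_WORDS set are hypothetical stand-ins (sklearn's real English list is much larger), and the sketch returns a dense list of lists where sklearn returns a sparse matrix. It is meant to explain the behaviour, not to replace the library.

```python
import re
from collections import Counter

# Hypothetical stand-in for sklearn's ENGLISH_STOP_WORDS, which has over 300 entries.
STOP_WORDS = frozenset({"the", "and", "for", "with", "this", "was"})


def count_vectorize(docs, min_df=3, max_features=5000,
                    token_pattern=r"[a-zA-Z0-9]{3,}", stop_words=STOP_WORDS):
    """Approximate the configured CountVectorizer in pure Python.

    Returns (vocabulary, matrix): vocabulary maps each kept term to its
    column index, and matrix holds one row of term counts per document.
    """
    pattern = re.compile(token_pattern)
    # Steps 1-3: lowercase, extract tokens with the regex, drop stop words.
    tokenized = [
        [tok for tok in pattern.findall(doc.lower()) if tok not in stop_words]
        for doc in docs
    ]
    counts = [Counter(tokens) for tokens in tokenized]

    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for c in counts for term in c)
    # Total frequency: number of occurrences across the whole corpus.
    total_freq = Counter()
    for c in counts:
        total_freq.update(c)

    # Step 4 (min_df): keep terms that appear in at least min_df documents.
    kept = [term for term in doc_freq if doc_freq[term] >= min_df]
    # Step 5 (max_features): keep the terms with the highest total counts.
    # Ties are broken alphabetically here; sklearn's tie order may differ.
    if max_features is not None and len(kept) > max_features:
        kept = sorted(kept, key=lambda term: (-total_freq[term], term))[:max_features]

    # Step 6: column indices follow the alphabetical order of the kept terms.
    terms = sorted(kept)
    vocabulary = {term: i for i, term in enumerate(terms)}
    matrix = [[c[term] for term in terms] for c in counts]
    return vocabulary, matrix


docs = [
    "The cat sat on the mat",
    "A cat and a dog",
    "The dog chased the cat",
    "Dog days of summer",
]
vocabulary, X = count_vectorize(docs, min_df=2)
print(vocabulary)  # {'cat': 0, 'dog': 1}
print(X)           # [[1, 0], [1, 1], [1, 1], [0, 1]]
```

With the real class, count_vectorizer.fit_transform(docs) learns the vocabulary and returns a SciPy sparse matrix of counts; the learned term-to-column mapping is stored in count_vectorizer.vocabulary_, and get_feature_names_out() (get_feature_names() in scikit-learn versions before 1.0) lists the terms in column order. Because the text is lowercased before tokenizing, the A-Z range in token_pattern never matches anything in practice.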