HollowPrincess / PracticeSurveys

Think about extracting keywords and key phrases from text #4

HollowPrincess closed this issue 5 years ago

HollowPrincess commented 5 years ago

https://towardsdatascience.com/hacking-scikit-learns-vectorizers-9ef26a7170af

HollowPrincess commented 5 years ago

https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer

HollowPrincess commented 5 years ago

https://scikit-learn.org/stable/auto_examples/model_selection/grid_search_text_feature_extraction.html#sphx-glr-auto-examples-model-selection-grid-search-text-feature-extraction-py

HollowPrincess commented 5 years ago

https://scikit-learn.org/stable/auto_examples/applications/plot_topics_extraction_with_nmf_lda.html#sphx-glr-auto-examples-applications-plot-topics-extraction-with-nmf-lda-py

HollowPrincess commented 5 years ago

https://m.habr.com/ru/company/mailru/blog/358736/

HollowPrincess commented 5 years ago

Stemming and lemmatization:
https://www.nltk.org/_modules/nltk/stem/snowball.html
https://stackoverflow.com/questions/36182502/add-stemming-support-to-countvectorizer-sklearn
https://www.programcreek.com/python/example/91271/nltk.stem
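
A minimal sketch of how stemming could be combined with scikit-learn's `CountVectorizer`, along the lines of the linked Stack Overflow question. It is not the code from this repository: the class name `StemmedCountVectorizer`, the `LANGUAGE` constant, and the sample sentences are illustrative assumptions, and the snippet needs both `nltk` and `scikit-learn` installed.

```python
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: English text. SnowballStemmer also accepts "russian" and others.
LANGUAGE = "english"
stemmer = SnowballStemmer(LANGUAGE)


class StemmedCountVectorizer(CountVectorizer):
    """Hypothetical helper: stems every word token before n-grams are built."""

    def build_tokenizer(self):
        # Reuse the default regex tokenizer, then stem each token it yields.
        tokenize = super().build_tokenizer()
        return lambda doc: [stemmer.stem(token) for token in tokenize(doc)]


docs = ["The surveys were running", "A survey runs"]
vectorizer = StemmedCountVectorizer(ngram_range=(1, 2))
counts = vectorizer.fit_transform(docs)

# Inflected forms collapse to a shared stem, so both documents
# contribute to the same "survey" and "run" columns.
print(sorted(vectorizer.vocabulary_))
```

Because stemming happens inside the tokenizer, it runs after lowercasing and before stop-word filtering and n-gram construction, so any stop-word list would have to contain stems rather than full word forms.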

Typo handling: is it needed when using 3-grams? https://habr.com/ru/post/346618/
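
One way to probe the typo question is to look at how much character 3-gram overlap survives a misspelling. The sketch below assumes that "3-grams" means character trigrams; with word 3-grams a misspelled word would still produce a completely different token. The helper names `char_ngrams` and `jaccard` and the sample words are illustrative, not taken from the repository.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, padded with spaces."""
    padded = f" {word.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# A one-letter misspelling keeps most trigrams intact,
# while an unrelated word shares few or none.
typo_score = jaccard("vectorizer", "vectoriser")
other_score = jaccard("vectorizer", "keyword")
print(typo_score, other_score)
```

If the typo score stays well above the score for unrelated words on real survey answers, character trigram features may already absorb most misspellings, and a separate spelling-correction step might be unnecessary.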

HollowPrincess commented 5 years ago

Brink, p. 208

HollowPrincess commented 5 years ago

https://github.com/RisaMagpie/PracticeSurveys/commit/5ab1fff771c28dd1c2d144f5d72fb499d1433eab https://github.com/RisaMagpie/PracticeSurveys/commit/a02f47d23b38369f6ae97811f0d65ed488aa1102

HollowPrincess commented 5 years ago

https://github.com/RisaMagpie/PracticeSurveys/commit/4d069af93e6a6a7491f88d4a48f3afb8c24505a0