ddangelov / Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.
BSD 3-Clause "New" or "Revised" License
2.94k stars, 373 forks

Keywords from top2vec model are not representative of related documents #189

Closed by ervivek 3 years ago

ervivek commented 3 years ago

I am working with Amazon reviews data. The topic model was created as follows: model = Top2Vec(documents=df1, speed="learn", workers=2)

Here is a topic generated by this model: [image]

The following are the top 20 documents for the topic above. The keywords shown in the topic do not appear in these documents. Can you please suggest corrective actions?

Document: 2733, Score: 0.9990834593772888 bad screen open second press power bottun fingerprint sensor work properly

Document: 8982, Score: 0.9987523555755615 wifi connect automaticallyit search luck restart mobile connect wifii think major issuethe application load slowly

Document: 7159, Score: 0.9987006783485413 hey defective product hour fully charge fully charge drain fast rate min btry face heating issue yesterday onwards phone start switch automatically help think defective product download update help

Document: 4494, Score: 0.9983952641487122 look hardware software version use lenovo note model bad software lack basic functionality theme basic feature find samsung cast option work defect late operating system change handle mobile note note upgrade mobile latest nogut version 711 help future version oreio available diwali 2018 effort lenovo bad quality product lenovo samsung

Document: 270, Score: 0.9983184337615967 device getting heat application usage min

Document: 8283, Score: 0.9981307983398438 pros1 fantastic display colour reproduction2 design build quality3 average secondary camera selfie flash4 good ram management5 heating issue feel till datecons1 mediocre primary camera2 dedicated music button stop work unexpectedly day

Document: 4286, Score: 0.998047947883606 cost worthy good battery backup camera quality good day time poor night major disadvantage notification lead glow 3rd party application whatsapp messanger hike know issue till product release fix find sad product

Document: 8293, Score: 0.9978888630867004 buy lenovo venom black model day start use satisfied deliver product face issue device gets lock automatically unable unlock use power key new product expect quality substance case request look issue device provide solution

Document: 8204, Score: 0.9978846907615662 pretty good phone lot featured issue till good phone fingerprint sensor backside

Document: 1256, Score: 0.99785977602005 google app respond message againi phone todayand face issue

Document: 4887, Score: 0.9978034496307373 recording feature headphone provide previous lenova phone provide headphone

Document: 2752, Score: 0.9977614283561707 regular customer amazon lenovo mobile previous lenovo mobile fine problem mobile disppointment audio poor sense virtually unable hear people speak eventhough volume maximum think lenovo amazon ditch replacement throw away original packing invoice confident mobile good early lenovo mobile

Document: 6755, Score: 0.9977318644523621 bad mobile phone low battery backup2 recordings3 network problem4 picture finger print sensor keys5 set ringtone device song6 anable edit image tool etc

Document: 9286, Score: 0.9976276755332947 lenovo need maintain standard key structure upgrade find soft key traingular locate left locate right easy comfortable right hander

Document: 559, Score: 0.9976266622543335 write review month use phone phone good build qualitysupereb megapixel camera quit powerful flash performance good play lot game lag gameingcons talk con product bokeh mode blur effect good average battey backup bit good phone heavy usage charge evening

Document: 4976, Score: 0.9975117444992065 volte work battery saver mode advertise phone true sim use sim sim network signal strength good compare redmi note nexus zenfone 2dolby atmos stop work install music player gets switch charge use charge night find gets switch day instal battery temp monitor software hot ~40 deg play game charging remain hot normal use compare redmi note nexus zenfone use booster charger time charge phonefinally return product easy return product amazon hardware fault technician visit need

Document: 3509, Score: 0.9974973797798157 phone goodbut emi 666it 702 taxi end pay 18000tax emi tax upar air taxlike 680 emi tax itand interest let 30rs tax end pay 100 buck actually showso watch

Document: 540, Score: 0.9974951148033142 great phone stock android smooth phone use till date build quality descent rear camera perform good phone heavy user 4000 mah battery good day work look phone rounder range note great option thing leave sound quality good dolby atmo ram varient good ram varient issue completely fine

Document: 9603, Score: 0.9974249601364136 lenovo note awesome good picture dual camera slim metallic body bit difficult hold hand cover feature awesome android battery bit concern regular user

Document: 1817, Score: 0.9974241852760315 bad quality screenmobile fall height screen brokencustomer care lenovo jabalpur respond regard replacement screen pay service basis deposit 4200 screen replacement september 2017 today 26092017 spare screen change casual approach service teambad experience
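One way to put a number on this mismatch is to count how many of a topic's keywords actually occur in each retrieved document. The following is a minimal sketch in plain Python, assuming the review text is already preprocessed and whitespace-tokenized as in the list above. The function name `keyword_coverage` and the sample keywords are hypothetical and are not part of Top2Vec.

```python
def keyword_coverage(topic_words, documents):
    """Return, per document, the fraction of topic words that occur in it."""
    topic_set = {w.lower() for w in topic_words}
    if not topic_set:
        return [0.0 for _ in documents]
    coverage = []
    for doc in documents:
        # Exact token match only; no stemming or fuzzy matching is applied.
        tokens = set(doc.lower().split())
        coverage.append(len(topic_set & tokens) / len(topic_set))
    return coverage


# Hypothetical topic keywords; the two documents are copied from the list above.
topic_words = ["heat", "battery", "charge", "drain"]
docs = [
    "device getting heat application usage min",
    "google app respond message againi phone todayand face issue",
]
print(keyword_coverage(topic_words, docs))
```

Values near zero across all top documents would support the observation that the topic keywords are not representative of them.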

yasir77 commented 3 years ago

Same issue here. I get high document scores when training with speed="fast-learn", but low document scores with speed="deep-learn". The documents in the topics are also unrelated.

ddangelov commented 3 years ago

If you are using embedding_model='doc2vec', then the quality of the document and word embeddings will depend on the size and quality of your dataset. I would recommend trying embedding_model='universal-sentence-encoder'.
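The suggestion above could be tried roughly as follows. This is only a sketch, assuming the `top2vec` package is installed together with the dependencies needed for the sentence encoder, and that the reviews can be turned into a list of strings. The helper names `as_document_list` and `train_with_use` are hypothetical.

```python
def as_document_list(reviews):
    """Coerce an iterable of reviews into a list of non-empty strings."""
    docs = []
    for review in reviews:
        if review is None:
            continue
        text = str(review).strip()
        if text:
            docs.append(text)
    return docs


def train_with_use(reviews, workers=2):
    """Train Top2Vec with the pretrained universal sentence encoder."""
    # Imported lazily so the helper above works without top2vec installed.
    from top2vec import Top2Vec

    documents = as_document_list(reviews)
    return Top2Vec(
        documents=documents,
        embedding_model="universal-sentence-encoder",
        workers=workers,
    )
```

After training, the keywords returned by `model.get_topics()` can be compared with the documents returned by `model.search_documents_by_topic(topic_num=0, num_docs=20)` to see whether the pretrained encoder gives more representative topics on this dataset.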