derekgreene / dynamic-nmf

Dynamic Topic Modeling via Non-negative Matrix Factorization
Apache License 2.0

track-dynamic-topics KeyError #4

Open zdarktknight opened 7 years ago

zdarktknight commented 7 years ago

Hello everyone! Thank you for providing such a wonderful tool! I am currently studying topic modeling.
I followed the README instructions exactly, but when I run 'track-dynamic-topics.py out....' I get a KeyError.

Traceback (most recent call last):
  File "track-dynamic-topics.py", line 101, in <module>
    main()
  File "track-dynamic-topics.py", line 57, in main
    dynamic_topic_idx = assigned_window_map[window_topic_label]
KeyError: 'month1_06'

Could you please help me with it? Thank you very much.

Regards, Tong

derekgreene commented 7 years ago

It sounds like your window topic files do not match your dynamic topic files. Are you passing dynamictopics_k06.pkl as the input to track-dynamic-topics.py, along with the corresponding window topic files?
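To illustrate the kind of mismatch described above, here is a minimal sketch of how such a KeyError can arise. It is not the code from track-dynamic-topics.py: the names `assigned_window_map` and the label format `month1_06` are taken from the traceback, while the structure of the map, the dynamic topic indices, and the helper `find_unassigned` are assumptions made purely for illustration.

```python
# Hypothetical sketch: a dynamic model built from a 5-topic window model for
# month1 only knows labels month1_01 .. month1_05. Tracking it against a
# 6-topic window model for the same month then asks for a label it never saw.

# Assumed structure: window topic label -> index of the assigned dynamic topic.
# The indices here are made up for the example.
assigned_window_map = {"month1_%02d" % i: (i - 1) % 4 for i in range(1, 6)}

# Labels from a window model with k=6, i.e. not the one the dynamic model used.
window_topic_labels = ["month1_%02d" % i for i in range(1, 7)]


def find_unassigned(labels, assigned_map):
    """Return the window topic labels that have no dynamic topic assignment."""
    return [label for label in labels if label not in assigned_map]


missing = find_unassigned(window_topic_labels, assigned_window_map)
print(missing)  # ['month1_06']
```

A direct lookup such as `assigned_window_map["month1_06"]` would raise `KeyError: 'month1_06'` here, which matches the reported error. Checking for missing labels first, as in the sketch, makes it easier to see which window topic file does not belong to the dynamic model.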

The README probably isn't clear about tracking the dynamic topics, so I will update it.

Regards, Derek.

derekgreene commented 7 years ago

Actually, I noticed that the examples in the README were from an old version. I have updated the command lines and results in the README and pushed a new version. Hopefully this works now.

Thanks for letting me know, Derek.

zdarktknight commented 7 years ago

Thank you very much for your quick reply. I would like to confirm my understanding.

  1. If we use:

     python find-dynamic-topics.py out/month1_windowtopics_k05.pkl out/month2_windowtopics_k08.pkl out/month3_windowtopics_k08.pkl -k 4,10 -o out -m out/w2v-model.bin

     it will build a dynamic model from (month1_k05, month2_k08, month3_k08), and the result is out/dynamictopics_k05.pkl. So when we track the topics, we should use: out/dynamictopics_k05.pkl out/month1_windowtopics_k05.pkl out/month2_windowtopics_k08.pkl out/month3_windowtopics_k08.pkl. We can build different DTMs and track their topics by changing the windows.

     If we use:

     python find-window-topics.py data/*.pkl -k 4,10 -o out -m out/w2v-model.bin -w selected.csv

     we get dynamictopics_k04 to dynamictopics_k10 and *_windowtopics_k04 to *_windowtopics_k10. I guess dynamictopics_k04 is generated from *_windowtopics_k04, so we can only track topics across all of the *_windowtopics_k04 files. If we want to track month1_windowtopics_k04, month2_windowtopics_k05, and month3_windowtopics_k07, we must use the first method.

  2. Would you mind briefly explaining what the output 'Window 2(2)' means here? I understand that the top terms are displayed under each 'Window #', but what do 'Window 2' and 'Window 2(2)' mean?

Regards, Tong

derekgreene commented 7 years ago

1 - Yes, that is correct: you can change the individual window models that are combined to create the dynamic model.

2 - Window 2 and Window 2(2) indicate that two related topics in time window #2 were mapped to the same dynamic topic in the final model.
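As a rough illustration of that labeling, the sketch below shows one way such headings could be produced when several window topics from the same time window are assigned to one dynamic topic. This is an assumption about the behavior, not the tool's actual code: the function `label_window_topics` and the example labels are hypothetical.

```python
from collections import defaultdict


def label_window_topics(assignments):
    """Build display headings for the window topics of one dynamic topic.

    assignments: list of (window_number, window_topic_label) pairs, in order.
    The first topic from a window is shown as 'Window N'; further topics
    from the same window are shown as 'Window N(2)', 'Window N(3)', ...
    """
    seen = defaultdict(int)  # how many topics we have shown per window
    headings = []
    for window_number, topic_label in assignments:
        seen[window_number] += 1
        count = seen[window_number]
        if count == 1:
            heading = "Window %d" % window_number
        else:
            heading = "Window %d(%d)" % (window_number, count)
        headings.append((heading, topic_label))
    return headings


# Hypothetical dynamic topic that absorbed two related topics from window 2.
example = [(1, "month1_03"), (2, "month2_01"), (2, "month2_05"), (3, "month3_02")]
headers = [heading for heading, _ in label_window_topics(example)]
print(headers)  # ['Window 1', 'Window 2', 'Window 2(2)', 'Window 3']
```

In this reading, 'Window 2' and 'Window 2(2)' are simply two different window topics from the same time window that ended up under the same dynamic topic.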

Regards, Derek.