Closed avkoehl closed 4 years ago
The topic_proportion table is now updated in datasci. I added the code for it to mallet_tm_to_sql.R. It is the same as the code above, except that it doesn't multiply by doclen, because mallet's doc_topics already holds frequencies rather than proportions.
Doesn't match ldavis:

| topic | proportion | ldavis |
|---|---|---|
V1 | 0.0140469427705953 | 0.006429357 | |
V2 | 0.0144728679318146 | 0.01230919 | |
V3 | 0.0237056319470436 | 0.02251543 | |
V4 | 0.014286761103971 | 0.007561433 | |
V5 | 0.0208729339364131 | 0.04697035 | |
V6 | 0.0201209374707088 | 0.02219469 | |
V7 | 0.00881664336638489 | 0.01590796 | |
V8 | 0.00932888244515094 | 0.007222459 | |
V9 | 0.00640227383844603 | 0.003496074 | |
V10 | 0.0141274766066537 | 0.02299951 | |
V11 | 0.00770791364010918 | 0.01199015 | |
V12 | 0.0108863706205394 | 0.01918798 | |
V13 | 0.0138755790825756 | 0.009039159 | |
V14 | 0.0264225022362821 | 0.01703177 | |
V15 | 0.0074993038147429 | 0.007228903 | |
V16 | 0.0254177255061635 | 0.01867842 | |
V17 | 0.0301945578574636 | 0.02880095 | |
V18 | 0.00698947703529193 | 0.005128058 | |
V19 | 0.00682013047775079 | 0.007520889 | |
V20 | 0.0133322651847597 | 0.01003781 | |
V21 | 0.0144454675931395 | 0.03327913 | |
V22 | 0.00863066826148683 | 0.005538213 | |
V23 | 0.0188029255422406 | 0.01187994 | |
V24 | 0.018775672654624 | 0.01475935 | |
V25 | 0.0118872389146039 | 0.01043374 | |
V26 | 0.0099303392443068 | 0.008802691 | |
V27 | 0.0102712244651236 | 0.009430731 | |
V28 | 0.00403399947737054 | 0.01143916 | |
V29 | 0.0216767171798342 | 0.01707382 | |
V30 | 0.00937696328626714 | 0.008478399 | |
V31 | 0.0117284144645887 | 0.009556778 | |
V32 | 0.00758063388651649 | 0.01035187 | |
V33 | 0.0140774435135245 | 0.01853502 | |
V34 | 0.0171806198476029 | 0.01809397 | |
V35 | 0.0174895042567233 | 0.009598752 | |
V36 | 0.0100248022903467 | 0.01595963 | |
V37 | 0.00884071131114055 | 0.004460869 | |
V38 | 0.00894147543240849 | 0.007666314 | |
V39 | 0.0141675832945327 | 0.01131902 | |
V40 | 0.0079482549332386 | 0.007224388 | |
V41 | 0.00962150005351097 | 0.007210551 | |
V42 | 0.0172221971140376 | 0.0190041 | |
V43 | 0.0171639952322742 | 0.01888237 | |
V44 | 0.0138349081486464 | 0.01049481 | |
V45 | 0.00987995423465166 | 0.009441511 | |
V46 | 0.00586454342070743 | 0.01068514 | |
V47 | 0.0195939237958932 | 0.02035888 | |
V48 | 0.019128562357607 | 0.008396824 | |
V49 | 0.0169857760530469 | 0.009945937 | |
V50 | 0.0280600976226666 | 0.03256612 | |
V51 | 0.00994556209157384 | 0.005294785 | |
V52 | 0.00575317855134911 | 0.004961739 | |
V53 | 0.0124979870964517 | 0.01423823 | |
V54 | 0.00780712657624297 | 0.005786062 | |
V55 | 0.0141625994487595 | 0.00774025 | |
V56 | 0.0178488896703314 | 0.01631112 | |
V57 | 0.018510526161447 | 0.02342016 | |
V58 | 0.00822112786187464 | 0.00364913 | |
V59 | 0.0189233380085411 | 0.01907004 | |
V60 | 0.0125378126443128 | 0.00815129 | |
V61 | 0.0163592859962065 | 0.03461295 | |
V62 | 0.0243410418099088 | 0.02541088 | |
V63 | 0.0228920481232756 | 0.01417038 | |
V64 | 0.00606018541699281 | 0.01422526 | |
V65 | 0.00702214825780047 | 0.007624669 | |
V66 | 0.00938867090030243 | 0.006364135 | |
V67 | 0.0236325394743981 | 0.02141724 | |
V68 | 0.008949614730831 | 0.01490856 | |
V69 | 0.00154656106895553 | 0.001021729 | |
V70 | 0.0041829840267566 | 0.003633223 | |
V71 | 0.008558643334504 | 0.008490964 | |
V72 | 0.00894664998155053 | 0.005604691 | |
V73 | 0.00504183335943283 | 0.003452381 | |
V74 | 0.0158015435208247 | 0.01422761 | |
V75 | 0.012574807131853 | 0.02309395 |
@avkoehl `createJSON()` expects doc topics and topic terms to be normalized so that all rows sum to 1. After doing this, the topic proportions we calculated match ldavis.
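
For reference, here is a minimal sketch of the calculation discussed above. It is written in Python only to show the arithmetic; it is not the code in mallet_tm_to_sql.R. It assumes the ldavis method is: normalize each doc-topic row to sum to 1, weight each row by document length, sum per topic, then divide by the grand total. The function names and the toy data are made up for illustration.

```python
def normalize_rows(matrix):
    """Scale each row so it sums to 1 (all-zero rows stay as zeros)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def topic_proportions(doc_topics, doc_lengths):
    """Corpus-level topic proportions (assumed ldavis-style calculation).

    Each document's normalized topic distribution is weighted by the
    document's length, summed per topic, and divided by the grand total.
    """
    theta = normalize_rows(doc_topics)
    n_topics = len(theta[0])
    topic_freq = [0.0] * n_topics
    for row, length in zip(theta, doc_lengths):
        for k, p in enumerate(row):
            topic_freq[k] += p * length
    total = sum(topic_freq)
    return [f / total for f in topic_freq]


# Toy data (hypothetical): 3 documents, 2 topics, unnormalized weights.
doc_topics = [[2.0, 2.0], [1.0, 3.0], [4.0, 0.0]]
doc_lengths = [10, 20, 30]
proportions = topic_proportions(doc_topics, doc_lengths)
```

If the rows already hold raw counts and each document's length equals its row sum, the length weighting cancels the row normalization, so the result reduces to the column sums divided by the overall total.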
Okay, perfect, go ahead and write to the table if you haven't already. Once the table for topic proportions has been overwritten with these new values on datasci, go ahead and close this issue!
Motivation
We need to update the topic proportions for the new topic models graph. The topic proportions are simply the topic proportions for the full corpus computed using ldavis' method:
Task