Open ed9w2in6 opened 5 months ago
Are you creating a matching pull request?
@msusol Yes, still WIP though. Ideally cleaning up the code base would be better but I do not have such plans. My plan is to just, as mentioned above, a quick hack:
prepare
, default to None, some logic to generate dummy topic name if None.PreparedData
We have whole family of issues that are just about the numbering of topics during visualisation:
79
93
127
185
213
92
265 (fix at #266
They can all be resolved just by decoupling the numbering from labels, which also remove the need of
sort_topics
, andstart_index
options in the python API.Now I am not going into details on how to implement or specification of outcomes, but here are some ideas:
Outline
python
API sideWe currently generate topic numbers at
topic_top_term_df
in_prepare.py
. We use enumerate andstart_index
to generate the numbering, in which it is supplied by user fromprepare
method, smuggled through_topic_info
method. https://github.com/bmabey/pyLDAvis/blob/16800f36bc95b4c99d8c26d51daa3485c8cb76da/pyLDAvis/_prepare.py#L276Sorting is orthogonal to this logic, hence we can safely ignored it when changing such code: https://github.com/bmabey/pyLDAvis/blob/16800f36bc95b4c99d8c26d51daa3485c8cb76da/pyLDAvis/_prepare.py#L413-L416
The number generated from
enumerate
will ultimately be used to name the topic, stored asCategory
: https://github.com/bmabey/pyLDAvis/blob/16800f36bc95b4c99d8c26d51daa3485c8cb76da/pyLDAvis/_prepare.py#L265I believe we should allow user to supply a list of strings.
If we change this we need to change this too: https://github.com/bmabey/pyLDAvis/blob/16800f36bc95b4c99d8c26d51daa3485c8cb76da/pyLDAvis/_prepare.py#L443-L449
and made sure none of them are named
"Default"
, since we used it as default: https://github.com/bmabey/pyLDAvis/blob/16800f36bc95b4c99d8c26d51daa3485c8cb76da/pyLDAvis/_prepare.py#L237-L242And that is for
topic_info
data only, we have to do the same ofmdsData
andtoken_table
too. Clearly a better way is just to side-step it and just supply a desired list of names and store into thePreparedData
namedtuple.Solution: side step at JS visualisation side
Currently, our visualisation logic made hard assumptions that Category must be in the form of
"TopicN"
whereN
is a number: https://github.com/bmabey/pyLDAvis/blob/16800f36bc95b4c99d8c26d51daa3485c8cb76da/pyLDAvis/js/ldavis.js#L697-L701Therefore, again, the path of lowest friction is to side-step it only changing the visualisation logic:
In which
2
is optional. So only 3 changes in total!Summary, changes needed
PreparedData