Open subhrm opened 8 years ago
For some reason there is a complex number in your matrix. To fix the issue the NumPyEncoder
would need to be extended to handle complex numbers:
https://github.com/bmabey/pyLDAvis/blob/master/pyLDAvis/utils.py#L140-L146
I'm not sure what the best way to handle them would be though. My first thought was to only take the real part. So, we could either do that or you could do the same before sending it in to pyLDAvis.
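That first idea can be sketched as follows. This is an illustration only, not the shipped pyLDAvis encoder — the class name `ComplexSafeEncoder` is hypothetical, and it assumes simply dropping the imaginary part is acceptable:

```python
import json
import numpy as np

# Sketch only: an encoder extended to cope with complex values by
# keeping just their real part. Anything it cannot handle still falls
# through to the stock JSONEncoder, which raises TypeError.
class ComplexSafeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (np.int32, np.int64)):
            return int(obj)
        if isinstance(obj, (np.float32, np.float64)):
            return float(obj)
        if np.iscomplexobj(obj):
            return float(np.real(obj))  # drop the imaginary part
        return json.JSONEncoder.default(self, obj)

print(json.dumps({"x": np.complex128(1.5 + 0.25j)}, cls=ComplexSafeEncoder))
```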
@subhrm did you ever resolve your problem? If we think complex numbers are going to be a common issue I would merge in a PR that extends the encoder as mentioned above.
@bmabey No, I have not been able to resolve it. A couple of my colleagues are getting the same error message with different corpora.
I am now trying to figure out a way to convert those complex numbers to real numbers, either by dropping the imaginary part or by calculating and keeping their magnitude.
But if you or any other contributor can make a change in the pyLDAvis code base that works around this issue cleanly, that would be great!
Thanks, Subhendu
I ran into the JSON serializable problem when calling pyLDAvis.show() - same issue with a failure in NumPyEncoder.
I was able to control the problem based on how many topics (num_topics) I used when creating the LDA model (gensim.models.ldamodel.LdaModel). If I set the number of topics to 10 or more the problem occurred; with 9 or fewer it did not. Maybe this depends on the corpus I used.
I ended up modifying NumPyEncoder to return abs() when it encountered a complex number. I'm not an expert on these codebases, so I don't know what the side effects of this are, but the visualization was able to run after I did this.
And finally, pyLDAvis is a sweet, sweet module. Very useful.
I ran into the same issue. Editing pyLDAvis/utils.py and adding

    if np.iscomplexobj(obj):
        return abs(obj)

to the ifs in NumPyEncoder, making it
    class NumPyEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.int64) or isinstance(obj, np.int32):
                return int(obj)
            if isinstance(obj, np.float64) or isinstance(obj, np.float32):
                return float(obj)
            if np.iscomplexobj(obj):
                return abs(obj)
            return json.JSONEncoder.default(self, obj)
solved the issue for me (or at least it will actually display something now).
Thanks @krageon and @bmabey. Editing utils.py in the way you mentioned works!
I changed pyLDAvis/utils.py and included

    class NumPyEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.int64) or isinstance(obj, np.int32):
                return int(obj)
            if isinstance(obj, np.float64) or isinstance(obj, np.float32):
                return float(obj)
            if np.iscomplexobj(obj):
                return abs(obj)
            return json.JSONEncoder.default(self, obj)

I still get an error when I run the code in an IPython notebook - TypeError: 0j is not JSON serializable
I think I originally worked out what was going on by using the Python debugger (https://docs.python.org/3/library/pdb.html) and breaking in this function (on return json.JSONEncoder.default(self, obj)). You can then inspect the obj in question to see what it actually is - that should provide some insight into how to fix it.
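For anyone not comfortable with an interactive pdb session, the same inspection can be sketched non-interactively. The class name `InspectingEncoder` here is hypothetical - it just reports what the encoder choked on, rather than fixing anything:

```python
import json
import numpy as np

# Hypothetical debugging aid: instead of breaking into pdb at the point
# where json.JSONEncoder.default raises, raise with a description of the
# offending object so its type shows up in the traceback.
class InspectingEncoder(json.JSONEncoder):
    def default(self, obj):
        raise TypeError(f"cannot encode {obj!r} of type {type(obj).__name__}")

try:
    json.dumps({"x": np.complex128(0j)}, cls=InspectingEncoder)
except TypeError as exc:
    print(exc)  # reveals the object is a NumPy complex128 scalar
```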
Can you help me out? I am still struggling with this error and am not able to get visualization results for my LDA analysis.
Did you follow the steps I outlined in my last post? With that information I might be able to tell you something about what's going on.
It appears that enough people are running into this so we should merge a fix into the library. Will someone send me a PR with a fix that worked for them?
If the proposed change (taking the absolute value of a complex number when one hits that function) doesn't misrepresent the data horrifically, I think that can be arranged.
I can't really answer that question since I've never run into a case where this was required. Do you have any idea why complex numbers are appearing in the first place?
I think there was a sqrt somewhere, and the number being passed in was negative. Why that is happening isn't something I'm comfortable answering - it's been a very long time since math class. I seem to recall this problem occurred when I set the number of topics high (say, around 150) and tried to visualise.
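That mechanism is easy to illustrate in isolation. NumPy's complex-domain square root (np.emath.sqrt) is used here purely as an illustration - it is not necessarily the code path pyLDAvis takes - but it shows how a negative input under a square root produces a purely imaginary result:

```python
import numpy as np

# A negative number under a complex-domain square root yields a purely
# imaginary value -- one plausible way complex coordinates sneak in.
z = np.emath.sqrt(-4.0)
print(z)       # 2j
print(abs(z))  # 2.0 -- the magnitude
print(z.real)  # 0.0 -- the real part
```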
Ran into this issue while debugging with a colleague, but you may want to double-check your data if you're using the built-in abs() to correct the issue, as @krageon described. That returns the magnitude of the number, which is always non-negative.
In our case, the data included negative values, so we used the following code instead to return the real part rather than the magnitude:
    class NumPyEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.int64) or isinstance(obj, np.int32):
                return int(obj)
            if isinstance(obj, np.float64) or isinstance(obj, np.float32):
                return float(obj)
            if np.iscomplexobj(obj):
                return np.real(obj)
            return json.JSONEncoder.default(self, obj)
Also, in our data the imaginary part of each number appeared to be zero, so I believe this was the correct action for us. That said, if your data contains a nonzero imaginary part then taking the magnitude might be better. I'm not familiar enough with the library to understand what is occurring at that point, but this is just a heads-up for anyone else coming across this.
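The difference between the two choices matters exactly when the real part is negative - a tiny illustration (nothing here is pyLDAvis-specific):

```python
import numpy as np

# A complex value whose imaginary part is zero but whose real part is
# negative: abs() silently flips the sign, np.real() preserves it.
z = np.complex128(-3 + 0j)
print(abs(z))      # 3.0  -- magnitude, sign lost
print(np.real(z))  # -3.0 -- real part, sign preserved
```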
The points you make are good - this is exactly what I meant when I said "If the proposed change doesn't misrepresent the data horrifically". There will be some cases where it does, because data is being lost.
Whether or not that is the right or the wrong data to lose is not a call I can make for the general case. This is why I have not made a PR, and why I'm not comfortable making one until I either have time to brush up on the source material or someone with a strong theoretical grounding presents a good argument either way.
I had a similar problem and tracked down that the complex numbers were coming from the topic coordinate calculation.
What worked for me was not to rely on the default js_PCoA mds function and to use mmds instead.
pyLDAvis.gensim.prepare(lda_model, corpus, dictionary, mds='mmds')
Tbh, it is very possible that I'm doing something wrong and my 'solution' just masks the initial problem.
What helped me was to install sklearn. The principal-coordinate analysis is done with an alternative library when sklearn is not available. Since the data with the imaginary parts comes out of the coordinate calculation, I suspect there is some problem with that calculation.
I'm having the same problem! Has this issue been fixed?
[2019-08-12 19:39:01,124: ERROR/ForkPoolWorker-24] Task filtro.tasks.classificar_baixados[b958c2c4-93c9-49dd-88fc-8a43cef4dea2] raised unexpected: TypeError("Object of type 'complex' is not JSON serializable",)
Traceback (most recent call last):
File "/opt/app-root/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/app-root/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/app-root/src/filtro/tasks.py", line 150, in classificar_baixados
aplicar_lda(m_filtro)
File "/opt/app-root/src/filtro/tasks.py", line 160, in aplicar_lda
dados = modelar_lda(conteudos)
File "/opt/app-root/src/filtro/analysis.py", line 24, in modelar_lda
saida = pyLDAvis.prepared_data_to_html(modelo)
File "/opt/app-root/lib/python3.6/site-packages/pyLDAvis/_display.py", line 178, in prepared_data_to_html
vis_json=data.to_json(),
File "/opt/app-root/lib/python3.6/site-packages/pyLDAvis/_prepare.py", line 417, in to_json
return json.dumps(self.to_dict(), cls=NumPyEncoder)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/opt/app-root/lib/python3.6/site-packages/pyLDAvis/utils.py", line 146, in default
return json.JSONEncoder.default(self, obj)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 180, in default
o.__class__.__name__)
TypeError: Object of type 'complex' is not JSON serializable
I had the same issue... as a workaround:
vis.topic_coordinates['x'] = np.real(vis.topic_coordinates['x'])
vis.topic_coordinates['y'] = np.real(vis.topic_coordinates['y'])
vis
For anyone still having trouble with this error, setting the normalization parameter to None instead of the default 'l2' worked for me. That is,
vectorizer = TfidfVectorizer(min_df=2, norm=None)
I am not sure why this works as I am fairly unfamiliar with the mathematics behind the code, but it seems to produce results that are consistent with the underlying data in my case.
First of all thanks to the creator and all the contributors of this amazing module.
Today I encountered this issue. I was following the example sklearn notebook and was able to successfully get the visualization for the LDA model with the tf (CountVectorizer) document-term matrix. But when I tried to use the TfidfVectorizer, I got this issue. Please find below my code snippet as well as the stack trace of the issue:

    pyLDAvis.sklearn.prepare(lda_tfidf, tfidf, tfidf_vectorizer, R=10, sort_topics=False)
Any help to resolve this would be much appreciated.
I am also trying to find a resolution for this issue; if I resolve it on my own, I will let you know.