getnikola / plugins

Extra plugins for Nikola
https://plugins.getnikola.com/
MIT License

[similarity] Problems when creating new posts while plugin is enabled #246

Open keikoro opened 7 years ago

keikoro commented 7 years ago

I don't know enough about Nikola yet to be able to pinpoint the problem, but the similarity plugin seems to have a problem with the creation of new posts via nikola new_post.

I first noticed the problem in the log of the window in which I had nikola auto running (the log output was suddenly red), but the error is the same even when Nikola doesn't build automatically but nikola build is called manually. It seems to only apply to the newest post though.

Here is the log from after I'd stopped nikola auto, created 3 new posts in a row (only the last 2 are shown here), and then ran nikola build:

THIS_VENV [01:04] THIS_MACHINE:THIS_DIR git:(master*) $ nikola new_post
[2017-08-26T23:04:21Z] INFO: summa.preprocessing.cleaner: 'pattern' package not found; tag filters are not available for English
Creating New Post
-----------------

Title: A Draft Post, I will tag this with 'draft' right away
Scanning posts........done!
[2017-08-26T23:04:46Z] INFO: new_post: Your post's text is at: posts/a-draft-post-i-will-tag-this-with-draft-right-away.rst
THIS_VENV [01:04] THIS_MACHINE:THIS_DIR git:(master*) $ nikola new_post
[2017-08-26T23:05:05Z] INFO: summa.preprocessing.cleaner: 'pattern' package not found; tag filters are not available for English
Creating New Post
-----------------

Title: A Private Post, I will tag this with 'private' right away
Scanning posts........done!
[2017-08-26T23:05:15Z] INFO: new_post: Your post's text is at: posts/a-private-post-i-will-tag-this-with-private-right-away.rst
THIS_VENV [01:05] THIS_MACHINE:THIS_DIR git:(master*) $ nikola build
[2017-08-26T23:05:41Z] INFO: summa.preprocessing.cleaner: 'pattern' package not found; tag filters are not available for English
Scanning posts........done!
.  similarity:public/posts/a-private-post-i-will-tag-this-with-private-right-away/index.html.related.json
[2017-08-26T23:05:42Z] INFO: gensim.corpora.dictionary: adding document #0 to Dictionary(0 unique tokens: [])
[2017-08-26T23:05:42Z] INFO: gensim.corpora.dictionary: built Dictionary(231 unique tokens: ['pages', 'sphinx', "'card':", 'able', 'felt']...) from 12 documents (total 299 corpus positions)
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: using serial LSI version on this node
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: updating model with new documents
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: preparing a new chunk of documents
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: using 100 extra samples and 2 power iterations
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: 1st phase: constructing (231, 102) action matrix
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: orthonormalizing (231, 102) action matrix
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: 2nd phase: running dense svd on (102, 12) matrix
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: computing the final decomposition
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: keeping 2 factors (discarding 28.220% of energy spectrum)
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: processed documents up to #12
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: topic #0(17.118): 0.759*".." + 0.292*"#" + 0.117*"python" + 0.117*"8n_tuppbtwq" + 0.117*"twitter" + 0.117*"/images/tesla.jpg" + 0.117*"thumbnail::" + 0.117*"media::" + 0.117*"vimeo::" + 0.117*"youtube::"
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: topic #1(8.754): 0.253*"want" + 0.199*"--" + 0.199*"projects" + 0.199*"blog" + 0.199*"REDACTED" + 0.167*"github" + 0.154*"past" + 0.153*"use" + 0.135*"gitlab" + 0.127*"pages"
[2017-08-26T23:05:42Z] WARNING: gensim.similarities.docsim: scanning corpus to determine the number of features (consider setting `num_features` explicitly)
[2017-08-26T23:05:42Z] INFO: gensim.similarities.docsim: creating matrix with 12 documents and 2 features
########################################
TaskError - taskid:similarity:public/posts/a-private-post-i-will-tag-this-with-private-right-away/index.html.related.json
PythonAction Error
Traceback (most recent call last):
  File "PATH_TO_MY_VENV/lib/python3.5/site-packages/doit/action.py", line 403, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "PATH_TO_MY_NIKOLA_DIR/plugins/similarity/similarity.py", line 118, in write_similar
    with open(path, 'w+') as outf:
FileNotFoundError: [Errno 2] No such file or directory: 'public/posts/a-private-post-i-will-tag-this-with-private-right-away/index.html.related.json'

[2017-08-26T23:05:42Z] INFO: gensim.corpora.dictionary: adding document #0 to Dictionary(0 unique tokens: [])
[2017-08-26T23:05:42Z] INFO: gensim.corpora.dictionary: built Dictionary(231 unique tokens: ['pages', 'sphinx', "'card':", 'able', 'felt']...) from 12 documents (total 299 corpus positions)
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: using serial LSI version on this node
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: updating model with new documents
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: preparing a new chunk of documents
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: using 100 extra samples and 2 power iterations
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: 1st phase: constructing (231, 102) action matrix
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: orthonormalizing (231, 102) action matrix
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: 2nd phase: running dense svd on (102, 12) matrix
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: computing the final decomposition
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: keeping 2 factors (discarding 28.220% of energy spectrum)
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: processed documents up to #12
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: topic #0(17.118): 0.759*".." + 0.292*"#" + 0.117*"python" + 0.117*"8n_tuppbtwq" + 0.117*"twitter" + 0.117*"/images/tesla.jpg" + 0.117*"thumbnail::" + 0.117*"media::" + 0.117*"vimeo::" + 0.117*"youtube::"
[2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: topic #1(8.754): 0.253*"want" + 0.199*"--" + 0.199*"projects" + 0.199*"blog" + 0.199*"REDACTED" + 0.167*"github" + 0.154*"past" + 0.153*"use" + 0.135*"gitlab" + 0.127*"pages"
[2017-08-26T23:05:42Z] WARNING: gensim.similarities.docsim: scanning corpus to determine the number of features (consider setting `num_features` explicitly)
[2017-08-26T23:05:42Z] INFO: gensim.similarities.docsim: creating matrix with 12 documents and 2 features

What's also interesting, though possibly not related (?), is that gensim/the plugin seems to check posts that don't actually exist anymore, or whose contents have since changed.

I'm not sure what the matrix with 12 documents refers to. The number of posts in my /public/ folder is 6 (1 of which is tagged private), and the number of posts in /posts/ is 11 (2 of which are tagged private and 3 of which are tagged draft). The words listed at [2017-08-26T23:05:42Z] INFO: gensim.models.lsimodel: topic #0 are from a post that I set to draft and whose contents (= the words listed by gensim) I replaced with something completely different.

And indeed the log for nikola auto (though not the one for nikola build) starts like this:

THIS_VENV [00:52] THIS_MACHINE:THIS_DIR git:(master*) $ nikola auto
[2017-08-26T23:13:52Z] INFO: summa.preprocessing.cleaner: 'pattern' package not found; tag filters are not available for English
[2017-08-26T23:13:54Z] INFO: summa.preprocessing.cleaner: 'pattern' package not found; tag filters are not available for English
Scanning posts........done!
.  render_taxonomies:public/archive.html
.  render_posts:timeline_changes
.  render_posts:cache/posts/a-private-post-i-will-tag-this-with-private-right-away.html
.  render_posts:cache/posts/a-draft-post-i-will-tag-this-with-draft-right-away.html
.  render_posts:cache/posts/a-test-post-testing-similarity.html
.  similarity:public/posts/a-private-post-i-will-tag-this-with-private-right-away/index.html.related.json
[2017-08-26T23:13:54Z] INFO: gensim.corpora.dictionary: adding document #0 to Dictionary(0 unique tokens: [])

Looks like cached contents are used? /cache/posts contains 24 files, 2 per post (one is the .html, the second ends in .html.dep), including files for a post that I'd deleted from /posts/ a while ago. The cached version of the file whose words gensim uses contains not the current contents but the old ones.
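For what it's worth, 24 cache files at 2 per post come to 12 cached posts, which matches the 12 documents gensim reports, though that may be a coincidence. A rough way to check for leftover cache entries is sketched below. It assumes that cache file names mirror the source file names (as with posts/a-draft-post-i-will-tag-this-with-draft-right-away.rst and its cache/posts/....html counterpart in the logs above) and that posts are not nested in subfolders; the helper name `stale_cache_entries` is made up for illustration and is not part of Nikola or the plugin.

```python
from pathlib import Path


def stale_cache_entries(posts_dir, cache_dir):
    """Return cached post names that have no matching source file.

    Hypothetical diagnostic helper, not part of Nikola. It assumes
    cache/posts/<name>.html corresponds to posts/<name>.<ext>.
    """
    # Names of the source posts, without their extension (.rst, .md, ...).
    sources = {p.stem for p in Path(posts_dir).iterdir() if p.is_file()}
    # "*.html" matches <name>.html but not <name>.html.dep.
    cached = {p.stem for p in Path(cache_dir).glob("*.html")}
    return sorted(cached - sources)


if __name__ == "__main__":
    # Run from the site's root folder.
    for name in stale_cache_entries("posts", "cache/posts"):
        print("cached but no longer in posts/:", name)
```

Any name this prints would be a cached post without a source file, which is the situation described above for the deleted post.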

And looking at the locally served version of the blog, it looks broken – the error caused by the plugin seems to stop new posts or changed posts from building... Only disabling the plugin in conf.py and then running nikola build helped fix it.

ralsina commented 7 years ago

Looks like a bug indeed. I'll take a look.

michaelb42 commented 6 years ago

Just stumbled across this one after installing the plugin. Any news here?

ralsina commented 6 years ago

Not yet. Maybe over the long weekend.

randlow commented 5 years ago

Having an error with the similarity plugin as below. Can someone help?

TaskError - taskid:similarity:output/posts/nlp/nlp-main/index.html.related.json
PythonAction Error
Traceback (most recent call last):
  File "/home/randlow/anaconda3/envs/nikola/lib/python3.7/site-packages/doit/action.py", line 424, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "/home/randlow/github/blog2/plugins/similarity/similarity.py", line 121, in write_similar
    with open(path, 'w+') as outf:
FileNotFoundError: [Errno 2] No such file or directory: 'output/posts/nlp/nlp-main/index.html.related.json'
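
Both tracebacks in this thread stop at the same open(path, 'w+') call in write_similar. Opening a file for writing raises FileNotFoundError when a parent folder of the path is missing, so one possible cause is that the post's output folder has not been created yet when the similarity task runs. Below is a minimal sketch of a workaround under that assumption. The function name `write_related_json` and its `related` argument are hypothetical, and this is not the plugin's actual code.

```python
import json
import os


def write_related_json(path, related):
    """Write `related` as JSON to `path`, creating parent folders first.

    Hypothetical stand-in for the plugin's write_similar; only the
    open(path, 'w+') call is taken from the tracebacks above.
    """
    # Assumption: the error comes from a missing parent folder such as
    # output/posts/<slug>/, so create it before opening the file.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w+") as outf:
        json.dump(related, outf)
```

If the missing folder is indeed the cause, a cleaner fix might be to make the similarity task depend on the task that renders the post, so the folder exists by the time the JSON file is written. Whether that fits the plugin's task setup is something a maintainer would have to confirm.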