chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Corpus.load() gives "buffer source array is read-only" error #207

Closed pwin closed 4 years ago

pwin commented 6 years ago

Expected Behavior

I should be able to load a saved corpus.

Current Behavior

Loading a saved Corpus raises the following traceback:

  File "C:\Anaconda3\lib\site-packages\textacy\corpus.py", line 237, in load
    first_spacy_doc, spacy_docs = itertoolz.peek(spacy_docs)
  File "cytoolz/itertoolz.pyx", line 1711, in cytoolz.itertoolz.peek
  File "cytoolz/itertoolz.pyx", line 1726, in cytoolz.itertoolz.peek
  File "C:\Anaconda3\lib\site-packages\textacy\io\spacy.py", line 56, in read_spacy_docs
    for spacy_doc in compat.pickle.load(f):
  File "doc.pyx", line 1020, in spacy.tokens.doc.unpickle_doc
  File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes
  File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
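
For readers unfamiliar with this class of error: the last two frames suggest that Cython's memoryview wrapper asked for a writable buffer and was handed a read-only one. The sketch below is not textacy or spaCy code. It uses only the standard library to show how a bytes object that has been round-tripped through pickle exposes a read-only buffer, and how copying it into a bytearray yields a writable one. The helper names `is_writable` and `ensure_writable` are hypothetical and exist only for illustration.

```python
import pickle


def is_writable(buf) -> bool:
    """Return True if the object exposes a writable buffer."""
    return not memoryview(buf).readonly


def ensure_writable(buf):
    """Return buf unchanged if it is writable, otherwise a writable copy."""
    return buf if is_writable(buf) else bytearray(buf)


# bytes stay immutable after a pickle round trip, so their buffer is read-only.
payload = pickle.loads(pickle.dumps(b"\x01\x02\x03"))
print(is_writable(payload))

# Writing through a read-only memoryview fails in pure Python as well.
try:
    memoryview(payload)[0] = 0
except TypeError as exc:
    print("write rejected:", exc)

# Copying into a bytearray gives a buffer that accepts writes.
writable = ensure_writable(payload)
memoryview(writable)[0] = 0
print(is_writable(writable))
```

Whether a copy like this is an appropriate workaround inside spaCy's deserialization is a separate question. The sketch only illustrates why a consumer that insists on a writable buffer rejects unpickled bytes.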

Steps to Reproduce (for bugs)

Create a corpus, save it, and then reload it with textacy.Corpus.load('C:/python_code/DocCorpus.gz')

Context

Your Environment

bdewilde commented 5 years ago

Hey @pwin, I recently released v0.7.0, in which I totally reworked the Corpus class, including .save() and .load(). Please give it a try!