sojohan closed this issue 5 years ago
Hi @sojohan, is DoctorsNotes the directory in which your text files are stored? The textacy.io.read_text() function only accepts the path to a single file on disk, not a directory. If you want to iterate over the files in a given directory, you can use textacy.io.get_filenames():
>>> import textacy.io
>>> for fname in textacy.io.get_filenames("./DoctorsNotes", extension=".txt"):
...     record = textacy.io.read_text(fname)
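If it helps to see the directory-versus-file distinction without any textacy calls, here is a minimal standard-library sketch of the same idea: walk the directory, open each .txt file individually, and stream its lines as records. The helper name iter_records, the UTF-8 encoding default, and the choice to skip blank lines are all assumptions made for illustration; none of them come from textacy.

```python
from pathlib import Path
from typing import Iterator


def iter_records(dirpath: str, extension: str = ".txt",
                 encoding: str = "utf-8") -> Iterator[str]:
    """Yield each non-empty line of every matching file in dirpath.

    Hypothetical helper, not part of textacy. The encoding default is an
    assumption; adjust it to match how the notes were saved.
    """
    # Sort so the files are visited in a reproducible order.
    for path in sorted(Path(dirpath).iterdir()):
        # Skip subdirectories and files with a different extension:
        # only regular files can be opened and read as text.
        if not path.is_file() or path.suffix != extension:
            continue
        with path.open(mode="rt", encoding=encoding) as f:
            for line in f:
                line = line.strip()
                if line:
                    yield line
```

Each record yielded by iter_records("./DoctorsNotes") could then be passed to textacy.Doc in the same way as the single-file loop in the question.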
Hi experts
I have a folder with some txt files that I would like to make into a corpus and then do topic modeling.
I have tried the following:
records = textacy.io.read_text('./DoctorsNotes', lines=True, mode='rt')
for record in records:
    doc1 = textacy.Doc(record, lang=da)
    print(doc1)
I then get a 'permission denied' error. If I do the same with a single file, it works:
records = textacy.io.read_text('./DoctorsNotes/text_sample.txt', lines=True, mode='rt')
for record in records:
    doc1 = textacy.Doc(record, lang=da)
    print(doc1)

Doc(35 tokens; "Ved fremstilling af net til fixation i mould gj...")
Doc(34 tokens; "Patienten klarede CT scanning uden problemer. D...")
Doc(25 tokens; "Constraints til parotis (dxt+sin) kan ikke over...")
Doc(36 tokens; "Ved plantjek vurderes begge planoplæg. Som note...")
Doc(33 tokens; "Svært match ved nyopstiling. Patienten lå noget...")
Doc(32 tokens; "Igen svært match ved nyopstilling. Fysiker igen...")
Doc(90 tokens; "Konferencepatient: Vurderet på konference grund...")
But how do I get multiple files into a corpus?
Thanks,
Your Environment
spacy version: 2.0.18
spacy models:
textacy version: 0.6.2