Open acozzubo opened 3 years ago
Hello @acozzubo, the same packages we used can also extract text from local files. Stack Overflow is your best friend for such questions:
1) python-docx: https://stackoverflow.com/questions/25228106/how-to-extract-text-from-an-existing-docx-file-using-python-docx
2) pdfminer: https://stackoverflow.com/questions/26494211/extracting-text-from-a-pdf-file-using-pdfminer-in-python
You can also repurpose the existing code in the Jupyter notebook a little to get it to work for local files. Hopefully this helps!
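For anyone landing here later, a minimal sketch of reading local files with the two packages mentioned above. This is not the notebook's code: the helper names (read_docx, read_pdf, read_local_document) are made up for illustration, and it assumes python-docx and pdfminer.six are installed, with the latter providing pdfminer.high_level.extract_text.

```python
from pathlib import Path


def read_docx(path):
    """Return the text of a local .docx file, one paragraph per line."""
    import docx  # installed by the python-docx package
    document = docx.Document(str(path))
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


def read_pdf(path):
    """Return the text of a local .pdf file."""
    # Assumption: pdfminer.six, which ships the high-level extract_text helper.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


# Hypothetical dispatch table: file extension -> reader function.
READERS = {".docx": read_docx, ".pdf": read_pdf}


def read_local_document(path):
    """Pick a reader based on the file extension and return the file's text."""
    path = Path(path).expanduser()
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return reader(path)
```

Usage would look something like text = read_local_document('~/Documents/report.pdf'), where the path is just a placeholder. The key difference from the URL-based examples is that nothing is downloaded: the libraries open the file on disk directly.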
(And yes, this is the right place for such questions - we're not using Piazza for this course, all the content is on Canvas + GitHub!)
Similar question - I cannot get past the error message when using the Shakespeare corpus in HW1. Should we download these files locally and extract from there? Or is there a way to pull them directly from the Jupyter notebook? Running this does not work: targetDir = 'Homework-Notebooks/data/Shakespeare'
@joshuabsilver, when you cloned the Homework Notebooks repository, it should have downloaded the data (including the Shakespeare files). If not, you can manually download it from the repository and try again. I am not sure what you mean by pulling directly from the Jupyter notebook, though - could you elaborate on that? When you run the code with that data on your local machine, that command should work fine.
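One thing worth checking when a relative path like this fails is where the notebook is actually running from, since relative paths are resolved against the current working directory. A quick diagnostic sketch (describe_target_dir is a made-up helper, not something from the course notebooks):

```python
from pathlib import Path


def describe_target_dir(target_dir):
    """Report where a (possibly relative) path resolves and whether it exists."""
    resolved = (Path.cwd() / target_dir).resolve()
    is_dir = resolved.is_dir()
    return {
        "cwd": str(Path.cwd()),          # directory the notebook is running in
        "resolved": str(resolved),       # absolute path the relative path maps to
        "exists": is_dir,
        # File names inside the directory, or an empty list if it is missing.
        "files": sorted(p.name for p in resolved.iterdir()) if is_dir else [],
    }


info = describe_target_dir("Homework-Notebooks/data/Shakespeare")
print(info)
```

If "exists" comes back False, the likely causes are that the notebook was started from a different folder than the one the path assumes, or that the data folder was not downloaded with the clone.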
Hello,
While doing HW1, I was wondering how we can read PDF and Word documents saved locally.
The examples in the code used online documents, and when I tried replacing the URLs with folder paths it did not work.
Thanks!
P.S. Is this the right place to ask this question? Should we have a Piazza for this?