Open daaronr opened 3 years ago
@daaronr
By PowerPoints, I'm assuming that you mean the PowerPoints in the data_acad_materials repo? Is the aim of the conversion to extract the text or to keep the slide structure?
Not sure if you have used the xaringan package, but it seems to be a good way to have Rmd-flavoured slides. This blog post (https://timmastny.rbind.io/blog/embed-slides-knitr-blogdown/) shows a good way to embed such slideshows into blogdown, so I'm sure there would be some way to embed these into a bookdown!
By powerpoints i'm assuming that you mean the powerpoints in the data_acad_materials repo?
Yes, that is the immediate goal
Is the aim of conversion to extract the text or to keep the slide structure?
First to simply extract the text and convert it to markdown format, to use in the bookdown.
Not sure if you have used the xaringan package but it seems to be a good
way to have Rmd flavoured slides. This blog post https://timmastny.rbind.io/blog/embed-slides-knitr-blogdown/ shows a good way to embed such slideshows into blogdown so i'm sure there would be some way to embed these into a bookdown!
I've tried xaringan, will have another look. As you know I've mostly used reveal.js to do markdown-based html slides.
I'm trying to recall what the reason was I abandoned xaringan. Did it use a non-standard markdown syntax perhaps?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/daaronr/dr-rstuff/issues/5#issuecomment-751472851, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB6ZCMH7FUA3RI24HBBRWRDSW46VVANCNFSM4VKAURWA .
I used slidex to convert to xaringan https://rdrr.io/github/datalorax/slidex
Xaringan has improved quite a bit over the last year. Like all presentation packages, it uses its own markdown flavor, but this seems no worse than the others. Xaringan integrates rather nicely into the RStudio ecosystem.
slidex -- awesome!
I love markdown as you know, and Xaringan syntax does look pretty good. I think I had some trouble getting it to display local images; maybe that was the problem, but I guess it's fixed by now.
On Sun, Dec 27, 2020 at 3:47 PM gerhardriener notifications@github.com wrote:
I used slidex to convert to xaringan https://rdrr.io/github/datalorax/slidex
Xaringan has improved quite a bit over the last year. Like all presentation packages, it uses its own markdown flavor, but this seems no worse than the others. Xaringan integrates rather nicely into the RStudio ecosystem.
I couldn't get slidex to work myself, it would extract all the images from each but not the text.
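Fortunately, there is a Python module for dealing with PowerPoint files. I have created a script to do the necessary conversion here: https://github.com/daaronr/data_acad_materials/blob/a911e8fa62da409b922a068f43e87178cc6ee062/code/convert_ppt.py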
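For readers following along, a minimal sketch of this kind of text extraction with python-pptx might look like the following. It is not the convert_ppt.py script itself: the function names `slides_to_markdown` and `convert_file`, and the choice of one `##` heading per slide with bullets underneath, are assumptions made for illustration.

```python
from pathlib import Path


def slides_to_markdown(prs):
    """Return Markdown text for a python-pptx Presentation object.

    Each slide becomes a '## ' heading (its title, or 'Slide N' when the
    slide has no title), followed by the remaining text as bullet points.
    """
    lines = []
    for number, slide in enumerate(prs.slides, start=1):
        title_shape = slide.shapes.title  # None when the slide has no title
        title = title_shape.text.strip() if title_shape is not None else ""
        title_id = title_shape.shape_id if title_shape is not None else None
        heading = title or f"Slide {number}"
        lines.append(f"## {heading}")
        lines.append("")
        for shape in slide.shapes:
            # Skip the title (already used) and shapes that hold no text.
            if shape.shape_id == title_id or not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                text = paragraph.text.strip()
                if text:
                    # paragraph.level is the outline indent level of the bullet.
                    lines.append("  " * paragraph.level + "- " + text)
        lines.append("")
    return "\n".join(lines)


def convert_file(pptx_path, md_path):
    """Read one .pptx file and write its text to a Markdown file."""
    # Imported here so that slides_to_markdown can be tried without the package.
    from pptx import Presentation  # pip install python-pptx

    prs = Presentation(str(pptx_path))
    Path(md_path).write_text(slides_to_markdown(prs), encoding="utf-8")
```

Calling `convert_file("deck.pptx", "deck.md")` (both file names hypothetical) would write one Markdown file per presentation. Images, tables and speaker notes are ignored in this sketch.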
Well done Oska, thanks!
On Sun, Dec 27, 2020 at 5:10 PM Oska Fentem notifications@github.com wrote:
I couldn't get slidex to work myself, it would extract all the images from each but not the text.
Fortunately there is a Python module for dealing with Powerpoints. I have created a script to do the necessary conversion here https://github.com/daaronr/data_acad_materials/blob/a911e8fa62da409b922a068f43e87178cc6ee062/code/convert_ppt.py
https://github.com/ssine/pptx2md also seems promising ... but I can't get it to work
@oskasf did you do the conversion with the script? where did you put these?
Wait, I see it now (moving between two repos, sorry)
I have more PowerPoints to convert, but I can't get your script to work, @oskasf. How can I run it? Note that I edited it for my file system. Of course, ideally one uses only relative folder references.
$ py convert_ppt.py
bash: py: command not found
$ Python convert_ppt.py
Traceback (most recent call last):
  File "convert_ppt.py", line 1, in <module>
    from pptx import Presentation
ImportError: No module named pptx
$ python convert_ppt.py
Traceback (most recent call last):
  File "convert_ppt.py", line 1, in <module>
    from pptx import Presentation
ImportError: No module named pptx
$
$ convert_ppt.py
bash: convert_ppt.py: command not found
$ python3 convert_ppt.py
Traceback (most recent call last):
  File "convert_ppt.py", line 7, in <module>
    pres = Presentation('other_content_notes/powerpoint/big_data_management.pptx')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/api.py", line 28, in Presentation
    presentation_part = Package.open(pptx).main_document_part
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/package.py", line 125, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/pkgreader.py", line 33, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/phys_pkg.py", line 32, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
pptx.exc.PackageNotFoundError: Package not found at 'other_content_notes/powerpoint/big_data_management.pptx'
@daaronr You need to install the pptx module using pip install python-pptx.
Note that your installation of Python must be >= Python 3.0. You can check whether this is satisfied by running the command Python3 in a terminal window. If it isn't installed, you can use Homebrew to install it.
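The tracebacks above show `python` failing at the import while `python3` gets past it, which suggests the package is installed for one interpreter but not the other. A small check like the one below can confirm which interpreter is running and whether it can see the module. This is only a sketch: the helper name `describe_environment` is made up for illustration, and it needs Python 3.4 or later.

```python
import importlib.util
import sys


def describe_environment(module_name="pptx"):
    """Report which interpreter is running and whether it can import a module."""
    spec = importlib.util.find_spec(module_name)  # None if the module is missing
    return {
        "interpreter": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }
```

Printing `describe_environment()` from the interpreter used to run the script shows where it looks for packages. Installing with `python3 -m pip install python-pptx` rather than a bare `pip` ties the install to that same interpreter.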
yes, I think I took all these steps; note that I used 'python3' in the code above. Wait, maybe the 'P' needs a capital?
I don't think the case matters here. See what happened below...
$ pip install python-pptx
Requirement already satisfied: python-pptx in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (0.6.18)
Requirement already satisfied: lxml>=3.1.0 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from python-pptx) (4.6.2)
Requirement already satisfied: XlsxWriter>=0.5.7 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from python-pptx) (0.9.3)
Requirement already satisfied: Pillow>=3.3.2 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from python-pptx) (7.0.0)
WARNING: You are using pip version 20.0.1; however, version 20.3.3 is available.
You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -m pip install --upgrade pip' command.
$ Python3
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit
>>> exit(
... )
$ Python3 convert_ppt.py
Traceback (most recent call last):
  File "convert_ppt.py", line 7, in <module>
    pres = Presentation('other_content_notes/powerpoint/big_data_management.pptx')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/api.py", line 28, in Presentation
    presentation_part = Package.open(pptx).main_document_part
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/package.py", line 125, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/pkgreader.py", line 33, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/phys_pkg.py", line 32, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
pptx.exc.PackageNotFoundError: Package not found at 'other_content_notes/powerpoint/big_data_management.pptx'
Ah, I don't think I wrote this correctly for it to be run through the terminal (lack of absolute paths). If you open the file in RStudio you should be able to run it.
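A relative path such as 'other_content_notes/powerpoint/big_data_management.pptx' is resolved against whatever directory the terminal happens to be in, so the same script can work in one place and fail in another. One way around this, sketched below, is to resolve paths against a fixed base folder and fail early with a clearer message. The helper name `locate` is hypothetical and not part of the original script.

```python
from pathlib import Path


def locate(relative_path, base_dir):
    """Resolve relative_path against base_dir and fail early with a clear message."""
    candidate = (Path(base_dir) / relative_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"No file at {candidate} (current directory is {Path.cwd()})"
        )
    return candidate
```

Inside a script, `Path(__file__).resolve().parent` gives the folder containing the script itself, which makes a stable `base_dir` no matter where the command is typed. How the script sits relative to the PowerPoint folder is something each user would need to adjust.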
If you have time to do the conversions for me, that would be great. Otherwise I'll try it in RStudio.
On Sat, Jan 9, 2021 at 10:46 AM Oska Fentem notifications@github.com wrote:
Ah, I don't think I wrote this correctly for it to be run through terminal (lack of absolute paths). If you open the file in Rstudio you should be able to run it
@daaronr It should be possible to run the file from the terminal now. Simply move the Python file (convert_ppt.py) into the folder with the PowerPoints and run it; it will create a new folder, conv, to put the converted files in. Here
It is still throwing errors:
$ Python3 convert_ppt.py
Traceback (most recent call last):
  File "convert_ppt.py", line 19, in <module>
    prs = Presentation(eachfile)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/api.py", line 28, in Presentation
    presentation_part = Package.open(pptx).main_document_part
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/package.py", line 125, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/pkgreader.py", line 33, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pptx/opc/phys_pkg.py", line 32, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
pptx.exc.PackageNotFoundError: Package not found at '/Users/yosemite/githubs/data_acad_materials_gh_ver/other_content_notes/powerpoint/~$DS AI Training Outline v0.1.pptx'
$ pwd
/Users/yosemite/githubs/data_acad_materials_gh_ver/other_content_notes/powerpoint
$ ls
@oskasf Did we ever solve this? I still cannot get it to run, same error as above.
OK it is working for my purposes right now (moved file of interest to its own folder), but I'm still not sure what's going on in the error above.
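A likely explanation for the error: the path in the last traceback ends in `~$DS AI Training Outline v0.1.pptx`. File names starting with `~$` are the small temporary owner (lock) files that Microsoft Office writes next to a document while it is open. They carry the .pptx extension but are not real presentations, so a loop over every .pptx in the folder trips over them. That would also fit with the script working once the file of interest was moved to its own folder. A minimal sketch of a listing step that skips them, with the hypothetical helper name `presentations_in`:

```python
from pathlib import Path


def presentations_in(folder):
    """Return the .pptx files in folder, sorted by name.

    Names starting with '~$' are skipped: these are Office lock files,
    not presentations, even though they end in .pptx.
    """
    return sorted(
        path
        for path in Path(folder).glob("*.pptx")
        if path.is_file() and not path.name.startswith("~$")
    )
```

The conversion loop would then iterate over `presentations_in(folder)` instead of over every file in the directory. Closing PowerPoint before running the script should also make the lock files disappear.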
@oskasf Can it be adapted to also incorporate the 'speaker notes' into the .md in some way? Thanks
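On the speaker notes: python-pptx exposes them through `slide.has_notes_slide` and `slide.notes_slide.notes_text_frame`, so in principle they can be appended under each slide's text. The sketch below shows one way to do it; the function name `notes_as_markdown` and the blockquote layout are assumptions, not something the existing script does.

```python
def notes_as_markdown(slide):
    """Return a slide's speaker notes as a Markdown blockquote, or '' if none."""
    # Check first: accessing notes_slide on a slide without notes creates one.
    if not slide.has_notes_slide:
        return ""
    frame = slide.notes_slide.notes_text_frame
    if frame is None:  # notes slide exists but has no notes placeholder
        return ""
    lines = [line.strip() for line in frame.text.splitlines() if line.strip()]
    if not lines:
        return ""
    return "\n".join(["> **Notes:**"] + ["> " + line for line in lines])
```

The returned string can be written directly after the slide's own text in the .md file; an empty string means there is nothing to add for that slide.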
We also seem to lose the images... any way to recover them?
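On the images: picture shapes in python-pptx carry an `image` object with the raw bytes (`blob`) and a file extension (`ext`), so they can be written to disk and linked from the Markdown. The following is a rough sketch under that assumption. The function name `export_pictures` and the file naming scheme are invented for illustration, and pictures nested inside grouped shapes are not handled.

```python
from pathlib import Path


def export_pictures(slide, slide_number, out_dir):
    """Save each embedded picture on a slide and return Markdown image links."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for index, shape in enumerate(slide.shapes, start=1):
        try:
            image = shape.image
        except (AttributeError, ValueError):
            # Shapes without an embedded image (text boxes etc.) are skipped.
            continue
        target = out_dir / f"slide{slide_number:02d}_img{index}.{image.ext}"
        target.write_bytes(image.blob)
        links.append(f"![]({target.as_posix()})")
    return links
```

A more explicit alternative to the try/except is to compare `shape.shape_type` with `MSO_SHAPE_TYPE.PICTURE` from `pptx.enum.shapes`. The links returned here use the path as given in `out_dir`, so they may need adjusting to be relative to wherever the .md file ends up.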
@daaronr Hm, perhaps it would be easier to just use Slidex, as it keeps images and writes a text file containing the speaker notes. I'm not sure it will be possible to fully automate this process, so it's likely best to just use Slidex and check the formatting for each file (I will start doing this now).
try https://github.com/revan/pptx2md#powerpoint-to-markdown-converter?