gentzkow / GentzkowLabTemplate

MIT License

Figure out how to export Powerpoint documents to PDF in a make.sh script #5

Closed. gentzkow closed this issue 5 months ago.

gentzkow commented 6 months ago

For certain purposes (e.g., teaching) I often make slides in Microsoft PowerPoint rather than in LaTeX/LyX. I would like to figure out how we can incorporate PowerPoint files in the template. In doing this, I'd like to treat the PowerPoint file as /source/ and export a PDF version of the slides to /output/.

The key will be to figure out how we can compile a PowerPoint file to PDF from the command line. Ideally, if we have /source/slides.pptx, we would like to be able to add a command to make.sh in the /slides/ directory like

run_pptx slides.pptx .../$LOGFILE ../output

which in turn calls a command-line tool like

export_pptx_to_pdf slides.pptx ../output/slides.pdf
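
A minimal bash sketch of the run_pptx wrapper described above, assuming the hypothetical export_pptx_to_pdf command takes an input .pptx path and an output .pdf path. The PPTX_EXPORTER override and the log messages are illustrative assumptions, not part of the template.

```shell
# Hypothetical sketch: run_pptx <file.pptx> <logfile> <output_dir>
# Delegates the conversion to an exporter command taking <input.pptx> <output.pdf>.
# PPTX_EXPORTER is an assumed override; it defaults to the hypothetical
# export_pptx_to_pdf command named in the issue.
run_pptx() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_pptx <file.pptx> <logfile> <output_dir>" >&2
        return 2
    fi
    local pptx="$1" logfile="$2" outdir="$3"
    local exporter="${PPTX_EXPORTER:-export_pptx_to_pdf}"
    local pdf="$outdir/$(basename "$pptx" .pptx).pdf"

    if [ ! -f "$pptx" ]; then
        echo "run_pptx: input file '$pptx' not found" | tee -a "$logfile" >&2
        return 1
    fi
    mkdir -p "$outdir"
    echo "run_pptx: exporting $pptx to $pdf" >> "$logfile"

    # Append the exporter's own output to the log file
    if ! "$exporter" "$pptx" "$pdf" >> "$logfile" 2>&1; then
        echo "run_pptx: exporter failed for $pptx" | tee -a "$logfile" >&2
        return 1
    fi
    # A zero exit status does not guarantee a PDF was written, so check explicitly
    if [ ! -f "$pdf" ]; then
        echo "run_pptx: no PDF was created at $pdf" | tee -a "$logfile" >&2
        return 1
    fi
}
```

For example, run_pptx slides.pptx "$LOGFILE" ../output would write ../output/slides.pdf and append progress messages to the log file. Checking for the PDF after the exporter returns guards against a converter that exits cleanly without producing output.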

My sense is that there is no off-the-shelf command-line tool that does this. The most promising approach I've been able to come up with for macOS is to write a Visual Basic script that opens the file in PowerPoint and exports it. GPT-4 proposes:

[Image: script proposed by GPT-4]

We'd need to update it so that the filenames are passed in from the command line rather than hardcoded, like this:

[Image: the script updated to take the filenames from the command line]
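
Passing the filenames in from the command line can be sketched as a small bash wrapper around osascript, which hands any arguments after the script path to the AppleScript's `on run argv` handler (the script would then read them as `item 1 of argv` and `item 2 of argv`). The script name export_pptx_to_pdf.scpt and the PPTX_SCPT and OSASCRIPT overrides are assumptions for illustration.

```shell
# Hypothetical sketch: export_pptx_to_pdf <input.pptx> <output.pdf>
# osascript passes arguments that follow the script path to the script's
# `on run argv` handler, so the filenames no longer need to be hardcoded.
# PPTX_SCPT and OSASCRIPT are assumed overrides (OSASCRIPT allows testing off macOS).
export_pptx_to_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: export_pptx_to_pdf <input.pptx> <output.pdf>" >&2
        return 2
    fi
    case "$1" in
        *.pptx) ;;
        *) echo "export_pptx_to_pdf: expected a .pptx file, got '$1'" >&2; return 2 ;;
    esac
    local scpt="${PPTX_SCPT:-export_pptx_to_pdf.scpt}"
    # Quote every argument so paths containing spaces survive intact
    "${OSASCRIPT:-osascript}" "$scpt" "$1" "$2"
}
```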

Once we get this working on Macs, we can think about whether it's worth worrying about Linux/Windows solutions.

shrishj commented 6 months ago

Thanks for your message. I have spoken with Shiqi to get some clarifications and have put together a plan of action. I will get started on building the functionality.

shrishj commented 6 months ago

Hi @gentzkow. I have been working on this issue and have pushed a commit here. The script runs without errors, but it does not yet create the PDF. I have been stuck on this problem for a while, but I will figure it out soon. Since I am still debugging, I have not yet integrated it into the make.sh file in slides; I will do so once the function is working. I will keep you posted on when we have a working version for you to try. Thanks!

ShiqiYang2022 commented 6 months ago

@shrishj Thanks for all your work on this! Sorry I have had less bandwidth for this because of another project, but I will have time to look it over soon and provide any assistance I can. I will reach out to you offline.

gentzkow commented 6 months ago

Thanks both. Let's hold off integrating this into the template for now. Before we do that I'd like to have a chance to experiment w/ the solution myself.

shrishj commented 6 months ago

Hi @gentzkow. I have been working on fixing the solution for a while but keep getting a similar error. I have been looking into some of the Automator functions on Mac to see if there is something I can do from there. Based on some initial research, I believe I could build a solution in Python, but that would introduce dependencies we may not want. Would you recommend that I look into a Python solution?

EDIT after speaking to @ShiqiYang2022:

Apologies for the vague previous message. I have made a recent commit, which can be found here, that adds some extra error checking, automatic detection of the repository root to locate the test presentation, and the example PowerPoint file I have been working with. Running the script currently gives:

Shrishs-MacBook-Pro:lib shrishjanarthanan$ ./export_pptx_to_pdf.sh
button returned:OK
Error: PDF file was not created.

shrishj commented 6 months ago

Hi @gentzkow and @ShiqiYang2022. I have found the issue and fixed the PPTX to PDF functionality in this commit. The functionality can be found in lib/export_pptx_to_pdf.scpt and lib/export_pptx_to_pdf.sh. The problem was the output file path: it needed to be passed as a POSIX file rather than in the form I had used before.
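
Because PowerPoint runs as a separate application, a relative path typed at the shell may not be resolved against the shell's working directory. One way to avoid that class of path problem is to make both paths absolute before handing them to the AppleScript, which can then wrap them with `POSIX file`. A minimal sketch of such a helper, assuming the parent directory already exists; abs_path is a hypothetical name.

```shell
# Hypothetical helper: print an absolute POSIX path for $1.
# Works for files that do not exist yet (e.g. the output PDF), as long as the
# parent directory exists. Avoids `realpath`, which older macOS releases lack.
abs_path() {
    local dir
    # Resolve the parent directory in a subshell so the caller's cwd is unchanged
    dir="$(cd "$(dirname "$1")" && pwd -P)" || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}
```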

You can test the basic functionality from the repository root with cd lib && ./export_pptx_to_pdf.sh or, if you are already in the lib folder, with ./export_pptx_to_pdf.sh.

I would like to clarify how we should move forward on integrating this into the make.sh file in slides. We would first need to check whether a PowerPoint file exists in source and then call our function to save it as a PDF in the output directory. Thanks.
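
The check-then-convert step described above could look like the following loop. This is a minimal sketch, assuming a run_pptx-style converter that takes the .pptx file, the log file, and the output directory; convert_all_pptx and the PPTX_CONVERTER override are hypothetical names.

```shell
# Hypothetical sketch: convert every .pptx in a source directory, if any exist.
# Assumes a converter taking <file.pptx> <logfile> <output_dir>;
# PPTX_CONVERTER is an assumed override so the loop can be tested without PowerPoint.
convert_all_pptx() {
    local srcdir="$1" logfile="$2" outdir="$3"
    local converter="${PPTX_CONVERTER:-run_pptx}"
    local found=0 f
    for f in "$srcdir"/*.pptx; do
        [ -e "$f" ] || continue   # the glob stays literal when nothing matches
        found=1
        "$converter" "$f" "$logfile" "$outdir" || return 1
    done
    if [ "$found" -eq 0 ]; then
        echo "No .pptx files in $srcdir; skipping PowerPoint export." >> "$logfile"
    fi
}
```

Testing each glob result with `[ -e "$f" ]` instead of enabling bash's nullglob option keeps the loop working in plain sh as well.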

ShiqiYang2022 commented 6 months ago

@shrishj Thanks for all the work here! Glad to see the debugging succeeded. I can confirm that I can run it from the command line.

@gentzkow The script is ready for you to test now. We are holding off on integration per https://github.com/gentzkow/GentzkowLabTemplate/issues/5#issuecomment-2015514049. We may want to convert lib/export_pptx_to_pdf.sh into a run_pptx.sh file that lives under lib/shell; then, in each make.sh, we can include a line run_pptx ${input_path} ${output_path} to call that script.

gentzkow commented 6 months ago

Thanks both! I tested this and it works well. Here's what I think we should do.

  1. Convert to run_pptx.sh script that can be run from make.sh as @ShiqiYang2022 suggests.
  2. Add a top-level /extensions/ directory to the template that we will use for optional functionality we want to offer the user but that we don't want to make part of the default template out of the box. (I don't want this to be part of the default 3_slides directory because that would mean users can't run the template if they're not on Mac, don't have Powerpoint installed, etc.)
  3. Store the .sh and .scpt files in /extensions/powerpoint/.
  4. Write a readme.md file for /extensions/powerpoint/ that provides user-friendly instructions for using the script and incorporating it into the make.sh workflow. It should also warn the user about the various permission warnings that are likely to pop up.
  5. Ask another user in the lab to test that, following the instructions, they can deploy the scripts to add a PowerPoint file to /3_slides/ and have it compiled by make.sh.
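
Because the extension only works on a Mac with PowerPoint installed (the concern behind item 2), a make.sh that does use it could fail early with a clear message rather than an opaque osascript error. A minimal sketch of such a guard, assuming PowerPoint sits at its default location /Applications/Microsoft PowerPoint.app; pptx_supported is a hypothetical name.

```shell
# Hypothetical guard: succeed only on macOS with PowerPoint installed.
# Optional arguments override the OS name and app path (useful for testing).
# Only the default install location is checked by this sketch.
pptx_supported() {
    local os="${1:-$(uname -s)}"
    local app="${2:-/Applications/Microsoft PowerPoint.app}"
    if [ "$os" != "Darwin" ]; then
        echo "PowerPoint export requires macOS (found: $os)." >&2
        return 1
    fi
    if [ ! -d "$app" ]; then
        echo "Microsoft PowerPoint not found at: $app" >&2
        return 1
    fi
}
```

In make.sh this might be combined with the export call, e.g. pptx_supported && run_pptx slides.pptx "$LOGFILE" ../output.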

ShiqiYang2022 commented 5 months ago

A housekeeping update here:

@shrishj has done really great work addressing the bullets in https://github.com/gentzkow/GentzkowLabTemplate/issues/5#issuecomment-2025628807; if there has been any delay, that's entirely my fault. In my testing, points 1, 3, and 4 have been addressed well.

There's an issue with 2: ideally, we do not want slides.pptx to live under the 3_slides/source directory. Instead, we would like this slides.pptx document to live in extensions/powerpoint. We also need to move all related code into extensions/powerpoint. Ultimately, we would like users to be able to simply move the extensions/powerpoint folder out so that it lives in parallel with /3_slides/ and serves as an independent submodule that can be run separately.

@shrishj Feel free to go ahead and address 2. Once we have converged on the remaining issue in 2, I am happy to circulate this to our labmates for peer review. Thanks!

shrishj commented 5 months ago

Hi @ShiqiYang2022. Thanks for your feedback! I will work on item 2 this weekend and get back to you with some changes.

shrishj commented 5 months ago

Hi @ShiqiYang2022. I have completed the integration of the PPTX to PDF functionality and pushed the commit here. Thanks for all the clarifications offline. I believe it is ready to be tested by other lab members. Please let me know if you have any suggestions. Thanks!

ShiqiYang2022 commented 5 months ago

@shrishj Thanks for all the great work! Overall it looks good from my side, apart from a few small suggestions: let's move run_pptx.sh into /lib/, run_pptx.scpt into /5_pptx/source, and README.md into /5_pptx/source. Then we can adjust the paths accordingly and remove the /5_pptx/ layer. I think we should let users rename /powerpoint/ to /5_pptx/ when they move the folder out of /extension/. Once this is addressed, I will loop others in.

Thanks!

shrishj commented 5 months ago

Hi @ShiqiYang2022. Thanks for the feedback. I have made the suggested changes and pushed a commit here. Please let me know if you would like me to make any other changes.

ShiqiYang2022 commented 5 months ago

@shrishj Thanks so much! It looks great from my side. I will now move this to a PR so that other lab members can test the functionality.

ShiqiYang2022 commented 5 months ago

The thread continues in PR #8.

ShiqiYang2022 commented 4 months ago

Summary

In this issue, we wrote a script to convert .pptx files into .pdf files. The deliverable is extension/powerpoint.

Merged into main in c324876. The final state of the issue branch is here.