haesleinhuepf closed this 2 weeks ago
This message was generated by git-bob (version: 0.9.0, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.
To download the first file from a Zenodo record and convert it to a PNG if it's a PPT or PDF:
import requests
from io import BytesIO
from pathlib import Path
from pdf2image import convert_from_bytes
from PIL import Image
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE


def download_first_file_from_zenodo(record_id):
    # Fetch record metadata
    url = f"https://zenodo.org/api/records/{record_id}"
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()

    # Get the first file's download link
    first_file = data['files'][0]
    file_url = first_file['links']['self']
    # Decide by file extension; this assumes the 'key' field holds the file name
    suffix = Path(first_file['key']).suffix.lower()

    # Download the file
    response = requests.get(file_url)
    response.raise_for_status()

    # Check if the file is PPTX or PDF and convert
    if suffix == '.pptx':
        # python-pptx cannot render a slide, so as a fallback this uses
        # the first picture embedded in the first slide
        prs = Presentation(BytesIO(response.content))
        pictures = [shape for shape in prs.slides[0].shapes
                    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE]
        if not pictures:
            raise ValueError("The first slide contains no picture to export")
        img = Image.open(BytesIO(pictures[0].image.blob))
    elif suffix == '.pdf':
        # pdf2image needs poppler installed; render only the first page
        img = convert_from_bytes(response.content, first_page=1, last_page=1)[0]
    else:
        raise ValueError(f"Unsupported file type: {suffix}")

    # Save the first page or slide picture as a PNG
    img.save('first_page.png', 'PNG')


# Example usage
download_first_file_from_zenodo('1234567')
Replace '1234567' with the actual Zenodo record ID.
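The original question starts from a record URL rather than a bare ID. A minimal sketch of how the ID could be pulled out of such a URL is shown here; the helper name `record_id_from_url` is hypothetical, and the regular expression assumes the ID is the number that follows `record/`, `records/`, or `zenodo.` (as in a DOI link).

```python
import re


def record_id_from_url(url):
    """Return the numeric Zenodo record ID found in a record or DOI URL."""
    # Assumption: the ID is the run of digits after "record/", "records/"
    # or "zenodo." (the latter covers DOI links such as 10.5281/zenodo.<id>)
    match = re.search(r"(?:records?/|zenodo\.)(\d+)", url)
    if match is None:
        raise ValueError(f"No Zenodo record ID found in {url!r}")
    return match.group(1)
```

The returned string can then be passed on as the record ID, for example `download_first_file_from_zenodo(record_id_from_url(url))`.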
Hi @marabuuu ,
as discussed yesterday, could you please check if this works or can be made to work? (-:
Edit: Feel free to let git-bob do the hard part of the job :-)
Thanks!
Best, Robert
Yes, I am happy to look into this :)
It would be great if we had a script that determines the most downloaded Zenodo records of recent times, similar to this notebook, and then updates the website's main page. For example, it could replace a placeholder such as {most_downloaded} with some actual content.
Optional: It could download the most downloaded item, turn it into a PNG and show it on the main page. This obviously will not work with all content, but it should with PPTX or PDF files.
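A rough sketch of the placeholder idea could look like the following. It works on record dictionaries that have already been fetched, and it assumes each record carries its download count under `stats` then `downloads` and its title under `metadata` then `title`; the function names and the Markdown bullet output are illustrative choices, not something the website requires.

```python
def most_downloaded(records, n=3):
    """Return the n records with the highest download count."""
    # Assumption: the count sits under record["stats"]["downloads"];
    # records without that field are treated as having 0 downloads
    return sorted(
        records,
        key=lambda record: record.get("stats", {}).get("downloads", 0),
        reverse=True,
    )[:n]


def fill_placeholder(page_text, records, placeholder="{most_downloaded}"):
    """Replace the placeholder with one Markdown bullet per record."""
    lines = [
        f"* [{record['metadata']['title']}](https://zenodo.org/records/{record['id']})"
        for record in records
    ]
    # str.replace is used instead of str.format so that other curly
    # braces in the page are left untouched
    return page_text.replace(placeholder, "\n".join(lines))
```

Keeping the ranking and the text replacement separate makes it possible to test both without network access, and to swap in a different output format later.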
git-bob comment: Given a Zenodo record URL, how would you automatically download the first file from this Zenodo record using the Zenodo API and, if it's a PPT or PDF, turn the first slide or page into a PNG?