atlassian-api / atlassian-python-api

Atlassian Python REST API wrapper
https://atlassian-python-api.readthedocs.io
Apache License 2.0

Empty files after confluence download #1304

Open git-aybo opened 5 months ago

git-aybo commented 5 months ago

Hi, I use atlassian-python-api 3.41.7 to download the attachments from a Confluence page. The download works, but all files are empty.

confluence = Confluence(
    url="https://confluence.xyt.xx/",
    username=os.getenv("CONFLUENCE_USER", None),
    password=os.getenv("CONFLUENCE_PASSWORD", None),
)
confluence.download_attachments_from_page(page_id=123, path='/tmp')

Thank you for your help. With regards Andreas

gkowalc commented 5 months ago

Could you provide more details? Which Confluence version are you using (DC or cloud; if DC, which build version)? Are you using a reverse proxy? And what is the host on which you are trying to save the files (OS type, version)? Also, go to the directory where the attachments should be saved: what's the size of each file? I just checked on Confluence cloud + macOS 14.2.1 and didn't encounter any problems.

git-aybo commented 5 months ago

Hi gkowalc

Thank you for your reply.

Confluence:
Version (DC or cloud) → Datacenter 7.19.18
Build Number → 8804
Build Date → 16.01.2024
Reverse Proxy: yes
Host: redhat/ubi8:8.9-1107 and it runs as a docker container

Client: The python process runs on macOS 13.6.3 (22G436) under python 3.8.10.

The Confluence page has 4 attachments (50 kB to 1.5 MB, PNG or PDF). The files show up in the download directory on the Mac, but each has a size of 0 bytes.

With regards Andreas

gkowalc commented 5 months ago

Thanks for the info @git-aybo. I bootstrapped a DC testing instance on AWS (Ubuntu EC2 + load balancer/proxy) and got the correct file size for a downloaded sample .jpg attachment (about 5 MB), but not for .png and .txt attachments, so your issue is (partially) reproducible. It is weird that I had no issue with the .jpg file.

The download_attachments_from_page function relies on the following concept:

download_link = self.url + attachment["_links"]["download"]
r = self._session.get(f"{download_link}")
file_path = os.path.join(path, file_name)
with open(file_path, "wb") as f:
    f.write(r.content)

The code above calls the _session object, which is set up differently between DC and cloud (DC uses the token param to authenticate, whereas cloud uses username + password).

def _create_basic_session(self, username, password):
    self._session.auth = (username, password)

def _create_token_session(self, token):
    self._update_header("Authorization", "Bearer {token}".format(token=token))
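As a rough way to narrow this down, here is a minimal sketch of a stricter write step that refuses to create a file when the reply is not a 200 or has an empty body. Note that save_attachment is a hypothetical helper, not part of atlassian-python-api. It assumes only that the response object exposes status_code, headers and content, the way requests.Response does. The demo uses stand-in objects instead of real HTTP calls.

```python
import os
import tempfile
from types import SimpleNamespace


def save_attachment(response, file_path):
    """Write response.content to file_path only if the reply looks like a real file.

    Hypothetical helper for debugging: `response` is any object shaped like
    requests.Response (status_code, headers, content).
    Returns the number of bytes written.
    """
    if response.status_code != 200:
        raise RuntimeError(
            f"unexpected HTTP status {response.status_code} for {file_path}"
        )
    if not response.content:
        content_type = response.headers.get("Content-Type", "unknown")
        raise RuntimeError(
            f"empty body (Content-Type: {content_type}) for {file_path}"
        )
    with open(file_path, "wb") as f:
        f.write(response.content)
    return len(response.content)


# Demo with stand-in responses (assumed shapes, no network access needed).
good = SimpleNamespace(status_code=200, headers={}, content=b"\x89PNG")
empty = SimpleNamespace(
    status_code=200, headers={"Content-Type": "text/html"}, content=b""
)

with tempfile.TemporaryDirectory() as demo_dir:
    print(save_attachment(good, os.path.join(demo_dir, "good.png")))
    try:
        save_attachment(empty, os.path.join(demo_dir, "empty.png"))
    except RuntimeError as err:
        print(err)
```

If a check like this raises for the .png and .pdf attachments but not for the .jpg ones, the status code and Content-Type in the error message should hint at whether the proxy, the authentication, or the DC API reply is the place to look.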

I am not sure if the issue is with the reply from the Confluence DC API itself or with our wrapper code in rest_client.py. I am going to look closer at this bug in the upcoming week, and then we can decide whether this feature should be marked as "cloud only" or whether there is some reasonable fix for the download_attachments_from_page function. Ideally I would like not to make any changes to the rest_client.py file.

git-aybo commented 5 months ago

Hi gkowalc

Thank you for your work and your information. The behavior is the same for me. A JPG file can be downloaded successfully.

With regards Andreas

gkowalc commented 5 months ago

I'm a bit confused at the moment. A few days ago, while trying to replicate the problem, I managed to successfully download .jpeg files but encountered difficulties with .png and .pdf files. Today, I created a new testing instance on AWS. This setup mirrors the previous one, using version 7.19.18 on an Ubuntu EC2, with the same versions of Python and the required library. All attachments appear to be of the correct size. Could the problem be caused by some weird caching issue?

Can anyone attempt to reproduce the download_attachments_from_page on version 7.19.18 or the latest 8.5?

from atlassian import Confluence
import os

# Placeholders: replace with your own server URL, token, page ID and download directory.
confluence_DC = Confluence(
    url='confl_server_url',
    token='<api_token>'
)
dc_page = '<page_id>'
path_dc = '<download_directory>'


def download_attachments_test(api_wrapper_object, page_id, directory_path):
    # Download every attachment of the page into directory_path.
    api_wrapper_object.download_attachments_from_page(page_id=page_id, path=directory_path)


def check_file_size(directory):
    # Print the size of each regular file so that 0-byte downloads stand out.
    for filename in os.listdir(directory):
        file_path = os.path.join(directory, filename)
        if os.path.isfile(file_path):
            print(f'File: {filename}, Size: {os.path.getsize(file_path)} bytes')


download_attachments_test(confluence_DC, dc_page, path_dc)
check_file_size(path_dc)
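For anyone reporting back, a variant that returns only the empty files may be easier to paste into a comment than the full size listing. This is a minimal sketch using just the standard library. find_empty_files is a hypothetical name, and the demo creates its own temporary directory rather than using a real download.

```python
import os
import tempfile


def find_empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.getsize(os.path.join(directory, name)) == 0
    )


# Demo: one file with content and one empty file in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "ok.png"), "wb") as f:
        f.write(b"\x89PNG")
    open(os.path.join(demo_dir, "broken.pdf"), "wb").close()
    print(find_empty_files(demo_dir))
```

Running it against the download directory after download_attachments_from_page would list exactly the attachments that came back empty.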