closup / process-xbrl


Fix HTML download #71

Open kwheelan opened 1 month ago

kwheelan commented 1 month ago

Change the HTML download to a zip file with the images included. (Right now, the images don't render in the download because the path they point to doesn't exist.)

lucakato commented 2 weeks ago

@kwheelan Please also check this one! I'm not sure whether what I did is what you meant by downloading a zip with images included, or whether the images currently in the zip are the ones you're expecting.

kwheelan commented 2 weeks ago

@lucakato This is a good start! The zip works for me. But the goal of zipping the files is to have the images render correctly. Right now all the 'src' attributes of the 'img' tags point to images in the sessions folders (which we delete), so when you open the HTML file from the zip, it still doesn't render the images. You'll want to add some code that edits the 'src' attributes in the downloaded HTML to point to the correct relative filepaths, which should just be something like bfaace08-8b05-4024-b1f4-0243837f64c5/input/img/d5074d14-a2f2-4edd-8173-be3d3d9e83a9.png instead of /static/sessions_data/bfaace08-8b05-4024-b1f4-0243837f64c5/input/img/d5074d14-a2f2-4edd-8173-be3d3d9e83a9.png.
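
As a minimal sketch of the path change described above (not code from the repository): the helper name `to_relative_src` and the constant `SESSIONS_PREFIX` are hypothetical, and the prefix is assumed to be exactly `/static/sessions_data/`. The trailing slash is stripped along with the prefix so that the result has no leading `/` and resolves relative to the HTML file.

```python
# Hypothetical constant: the absolute prefix used by the served app.
SESSIONS_PREFIX = "/static/sessions_data/"


def to_relative_src(src: str, prefix: str = SESSIONS_PREFIX) -> str:
    """Return src with the sessions prefix removed, or src unchanged."""
    if src.startswith(prefix):
        # Slice off the whole prefix, including its trailing slash,
        # so the result is a relative path.
        return src[len(prefix):]
    return src


print(to_relative_src(
    "/static/sessions_data/bfaace08-8b05-4024-b1f4-0243837f64c5"
    "/input/img/d5074d14-a2f2-4edd-8173-be3d3d9e83a9.png"
))
# prints bfaace08-8b05-4024-b1f4-0243837f64c5/input/img/d5074d14-a2f2-4edd-8173-be3d3d9e83a9.png
```

Removing only `/static/sessions_data` (without the trailing slash) would leave `/bfaace08-...`, which a browser opening the file from disk treats as a path from the filesystem root rather than one next to the HTML file.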

lucakato commented 1 week ago

@kwheelan does correcting the file path make the generation work? I fixed the path as you suggested on the #download-zip branch. I don't think it's working though?

image

I basically added the following code to modify the img tags:

import os

from bs4 import BeautifulSoup


def modify_img_paths(session_id):
    # Define the path to the HTML file
    html_file_path = os.path.join('app', 'static', 'sessions_data', session_id, 'output', 'output.html')

    # Read the HTML file
    with open(html_file_path, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')

    # Prefix to strip; the trailing slash is included so the result is a
    # relative path (no leading '/')
    prefix = '/static/sessions_data/'

    # Find all <img> tags and make their src attribute relative
    for img_tag in soup.find_all('img'):
        src = img_tag.get('src')
        if src and src.startswith(prefix):
            img_tag['src'] = src[len(prefix):]

    # Save the modified HTML back to the file
    with open(html_file_path, 'w', encoding='utf-8') as file:
        file.write(str(soup))
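
The function above rewrites output.html in place. One possible alternative, sketched here under assumptions rather than taken from the branch, is to build the zip in memory from an already-rewritten HTML string and the session's image folder. The function name `build_download_zip` and the archive layout (`output.html` at the top level, images under `<session_id>/input/img/`) are hypothetical; the layout is chosen so that it matches the relative 'src' values discussed earlier in the thread.

```python
import io
import zipfile
from pathlib import Path


def build_download_zip(html: str, session_dir: Path, session_id: str) -> bytes:
    """Return the bytes of a zip holding output.html plus the session images.

    Assumed layout: images are read from <session_dir>/input/img and stored
    in the archive at <session_id>/input/img/<name>.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The HTML goes at the top level of the archive.
        archive.writestr("output.html", html)

        img_dir = session_dir / "input" / "img"
        for img_path in sorted(img_dir.glob("*")):
            if img_path.is_file():
                # Archive names always use forward slashes.
                arcname = f"{session_id}/input/img/{img_path.name}"
                archive.write(img_path, arcname)
    return buffer.getvalue()
```

Because the archive is assembled in a `BytesIO` buffer, the session folder can be deleted afterwards without affecting the download, and the copy of output.html on disk keeps its original absolute paths.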

lucakato commented 1 week ago

update: think I got it to work! image worked when I tested it

kwheelan commented 2 days ago

Great! I can test it myself by the end of the week and merge the branch.