Our citation code assumes PNGs are based on PDFs

If you upload a PNG (which the new Document Intelligence can OCR fine) and then ask a question about it, you'll see this error:

Traceback (most recent call last):
  File "/workspaces/azure-search-openai-demo/app/backend/app.py", line 180, in format_as_ndjson
    async for event in r:
  File "/workspaces/azure-search-openai-demo/app/backend/approaches/chatapproach.py", line 152, in run_with_streaming
    extra_info, chat_coroutine = await self.run_until_final_call(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/app/backend/approaches/chatreadretrieveread.py", line 168, in run_until_final_call
    sources_content = self.get_sources_content(results, use_semantic_captions, use_image_citation=False)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/app/backend/approaches/approach.py", line 201, in get_sources_content
    return [
           ^
  File "/workspaces/azure-search-openai-demo/app/backend/approaches/approach.py", line 202, in <listcomp>
    (self.get_citation((doc.sourcepage or ""), use_image_citation)) + ": " + nonewlines(doc.content or "")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/app/backend/approaches/approach.py", line 213, in get_citation
    page_number = int(path[page_idx + 1 :])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'pane'

That's due to this code, which treats whatever follows the last "-" in a PNG's name as a page number:

def get_citation(self, sourcepage: str, use_image_citation: bool) -> str:
    if use_image_citation:
        return sourcepage
    else:
        path, ext = os.path.splitext(sourcepage)
        if ext.lower() == ".png":
            page_idx = path.rfind("-")
            page_number = int(path[page_idx + 1 :])
            return f"{path[:page_idx]}.pdf#page={page_number}"

        return sourcepage

That made sense when we only supported PDFs and every PNG was a PNGified page of a PDF, but it breaks for anyone who just wants to upload plain PNGs.
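
Here's a minimal repro of the failure outside the app. It's a standalone copy of the method above with `self` dropped; the filenames `report-3.png` and `window-pane.png` are made up for illustration (all the traceback tells us is that the real name ended in `-pane`).

```python
import os


def get_citation(sourcepage: str, use_image_citation: bool) -> str:
    # Standalone copy of the current method, minus `self`.
    if use_image_citation:
        return sourcepage
    else:
        path, ext = os.path.splitext(sourcepage)
        if ext.lower() == ".png":
            page_idx = path.rfind("-")
            page_number = int(path[page_idx + 1 :])
            return f"{path[:page_idx]}.pdf#page={page_number}"

        return sourcepage


# A PNG generated from page 3 of a PDF (hypothetical name) works as intended.
print(get_citation("report-3.png", False))  # report.pdf#page=3

# A directly uploaded PNG (hypothetical name) has no page suffix, so int() fails.
try:
    get_citation("window-pane.png", False)
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: 'pane'
```

A PNG with no hyphen at all fails the same way: `rfind` returns -1, so the whole stem gets passed to `int()`.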

One solution might be to pass in sourcefile as well, since I think that might still be the PDF in the vision case? Needs some experimentation.
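
A rough sketch of what that could look like, not a tested fix. The name `get_citation_safe` and the optional `sourcefile` parameter are hypothetical, and it assumes `sourcefile` ends in `.pdf` for PDF-derived page images, which is exactly the part that still needs checking.

```python
import os
from typing import Optional


def get_citation_safe(
    sourcepage: str, use_image_citation: bool, sourcefile: Optional[str] = None
) -> str:
    # Hypothetical variant of get_citation; `sourcefile` is an assumed extra argument.
    if use_image_citation:
        return sourcepage

    path, ext = os.path.splitext(sourcepage)
    if ext.lower() != ".png":
        return sourcepage

    # If we know the original upload was not a PDF, cite the PNG itself.
    if sourcefile is not None and not sourcefile.lower().endswith(".pdf"):
        return sourcepage

    # Only rewrite to a PDF page citation when the name really ends in "-<digits>".
    base, sep, suffix = path.rpartition("-")
    if sep and base and suffix.isdecimal():
        return f"{base}.pdf#page={int(suffix)}"

    return sourcepage


print(get_citation_safe("report-3.png", False, "report.pdf"))  # report.pdf#page=3
print(get_citation_safe("window-pane.png", False, "window-pane.png"))  # window-pane.png
print(get_citation_safe("scan-2.png", False, "scan-2.png"))  # scan-2.png
```

The suffix check alone stops the crash, but a directly uploaded `scan-2.png` would still be cited as `scan.pdf#page=2` unless `sourcefile` is passed, which is why the extra argument seems worth having.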
