phfontes closed this issue 10 months ago
Hello, any news?
Sorry, I'm not sure what you mean. Are you talking about the file format for data ingestion, or some other file format? We support PDF, so I'm not sure what you mean by HTML.
Hello, thank you for your feedback. Sorry, what I mean is: what is the best file format for the platform to give more assertive answers? I'm currently using HTML and seeing better results, but I still have doubts about which format gives the most assertive answers.
Are you saying that you are ingesting HTML files? I'm not sure how that's possible, since prepdocs.py is set up only for ingesting PDF files. Did you first convert them to PDF? Please let us know how you've been working with HTML files.
Hello, that's right. I'm creating a structure with HTML and sections, and when there are topics I use ul and li. I believed the platform could accept other formats. I may have gotten confused, because the data_utils file has a file_format_dict that includes html. The structure I'm using is:
<html>
<head></head>
<body>
<section>
<h1>Title of topic</h1>
<ul>
<li>Microsoft</li>
<li>Azure</li>
</ul>
</section>
</body>
</html>
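For anyone who wants to reproduce that layout, here is a minimal sketch of generating one such document per topic with the Python standard library. The function name `build_topic_html` is hypothetical and not part of this repo or of data_utils, and closing the h1 and li tags explicitly is my assumption about the intended markup.

```python
from html import escape


def build_topic_html(title, items):
    """Return an HTML document with a single <section> for one topic.

    Hypothetical helper for illustration; not part of any repo in this thread.
    """
    # Escape text so characters such as & or < do not break the markup.
    list_items = "\n".join(
        f"        <li>{escape(item)}</li>" for item in items
    )
    return (
        "<html>\n"
        "  <head></head>\n"
        "  <body>\n"
        "    <section>\n"
        f"      <h1>{escape(title)}</h1>\n"
        "      <ul>\n"
        f"{list_items}\n"
        "      </ul>\n"
        "    </section>\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(build_topic_html("Title of topic", ["Microsoft", "Azure"]))
```

This only builds the markup; whether a given ingestion pipeline accepts HTML is a separate question.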
Hello, I would like to also know whether HTML files are supported. Any update please?
@phfontes I think you're referring to a different codebase, perhaps https://github.com/microsoft/sample-app-aoai-chatGPT ?
This repo doesn't directly support HTML, so you must convert to PDF first. I have a script that does that here: https://github.com/pamelafox/html-to-pdf-converter/blob/main/main.py
Hello, I would like to ask a question. I'm doing some testing with HTML and I see better performance in some situations. Based on the tests and feedback you've had so far, what's the best file format?