Unstructured-IO / unstructured-api-tools

CSV output formatting fix #176

Closed: rbiseck3 closed this 1 year ago

rbiseck3 commented 1 year ago

If output type is text/csv, wrap in PlainTextResponse to maintain formatting
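To illustrate why the response wrapper matters, here is a minimal standard-library sketch. It is not the code from this PR. It assumes the pipeline returns its CSV as a single string, and `csv_text` is a made-up payload. It compares the body a JSON-serialized response would carry with the body a plain-text response would carry.

```python
import json

# Hypothetical CSV payload, standing in for what a pipeline might return.
csv_text = "index,silly_result\n0,9\n1,some text\n"

# Serialized as application/json, the string is wrapped in quotes and
# its newlines are escaped, so the rows collapse onto a single line.
as_json_body = json.dumps(csv_text)

# Sent as plain text, the body is the string itself, so each row
# stays on its own line.
as_plain_body = csv_text

print(as_json_body)
print(as_plain_body)
```

A client receiving the JSON form has to decode it before handing it to a CSV parser, while the plain-text form can be parsed directly.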

rbiseck3 commented 1 year ago

Not sure what the best response format would be when multiple files need CSV outputs. Right now it still defaults to the application/json response type, which wraps the CSV strings in a JSON array.

rbiseck3 commented 1 year ago

For multi-file outputs, the CSV content of each file is joined with the others using an outer join, so if the files have different columns, all of them are preserved.
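A rough sketch of that behavior, using only the standard library rather than whatever the generated API code does: it reads "outer join" as "keep the union of all columns and leave missing cells empty", which is an assumption. The helper name `outer_join_csv` and the sample inputs are hypothetical.

```python
import csv
import io


def outer_join_csv(csv_texts):
    """Combine CSV strings, keeping every column seen in any input."""
    columns = []  # union of column names, in first-seen order
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)

    out = io.StringIO()
    # restval fills in cells for columns that a given row does not have.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Two hypothetical per-file outputs with partially overlapping columns.
merged = outer_join_csv(["a,b\n1,2\n", "b,c\n3,4\n"])
print(merged)
```

Here the merged output has the header `a,b,c`, and each row leaves empty the columns its source file did not have.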

rbiseck3 commented 1 year ago

Since the code in the prepline_test_project directory is autogenerated, I felt it didn't make sense to require a code coverage percentage there: the generator may emit more code than any single wrapped notebook actually needs. That directory was removed from the make target that runs pytest.

awalker4 commented 1 year ago

I'm getting an error with make tidy-notebooks, which is going to execute all the cells. Is there a missing pandas import somewhere?

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 print(pipeline_api("some text", "text/csv", "isd"))

Cell In[1], line 7, in pipeline_api(text, response_type, response_schema)
      2 def pipeline_api(
      3     text,
      4     response_type="text/csv",
      5     response_schema="isd",
      6 ):
----> 7     data = pd.DataFrame(data={"silly_result": [str(len(text)), text, str(response_type), str(response_schema)]})
      8     if response_type == "text/csv":
      9         return data.to_csv()

NameError: name 'pd' is not defined