Unstructured-IO / unstructured-api

Apache License 2.0
API refactor proposition #417

Open hubert-rutkowski85 opened 1 month ago

hubert-rutkowski85 commented 1 month ago

This PR aims to make the code structure easier to understand, navigate, and modify by:

  1. shortening long files: general.py had 849 lines, now 553.

  2. moving common functionality to separate files: endpoints.py, logging.py, validation.py, memory_protection.py

Both could still be improved (general.py could be shortened by about 300 lines, with functions moved to new files such as pdf_splits.py and pipeline.py), but I ran into problems with monkeypatching that expects functions in certain places, and I didn't want to spend more time on it.

No changes other than moving functions between files were made in this PR.
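
For context on the monkeypatching problem mentioned above, here is a minimal sketch of why moving functions can break patches. It is not taken from this repository: the module names `demo_pipeline` and `demo_general` and the functions `partition` and `handle` are hypothetical stand-ins built in memory. It assumes the tests patch a name on the module where the caller looks it up, which is how `unittest.mock.patch` behaves.

```python
import sys
import types
from unittest import mock

# Hypothetical "new" module that owns the function after a refactor.
demo_pipeline = types.ModuleType("demo_pipeline")
exec(
    "def partition(path):\n"
    "    return f'real:{path}'\n",
    demo_pipeline.__dict__,
)
sys.modules["demo_pipeline"] = demo_pipeline

# Hypothetical "old" module: it imports the name and still calls it locally.
demo_general = types.ModuleType("demo_general")
exec(
    "from demo_pipeline import partition\n"
    "def handle(path):\n"
    "    return partition(path)\n",
    demo_general.__dict__,
)
sys.modules["demo_general"] = demo_general


def call_with_patch(target):
    """Patch `target` with a fake, then call the handler in demo_general."""
    with mock.patch(target, lambda path: f"fake:{path}"):
        return demo_general.handle("a.pdf")


# Patching the name where the caller looks it up replaces the call.
patched_old = call_with_patch("demo_general.partition")
# Patching the defining module does not affect the already-imported name.
patched_new = call_with_patch("demo_pipeline.partition")

print(patched_old)  # fake:a.pdf
print(patched_new)  # real:a.pdf
```

So if a caller moves to a different file, a test that patches the old module path no longer affects it, and the patch target has to move along with the caller.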

hubert-rutkowski85 commented 1 month ago

i'd feel better about a large refactor change like this after we had a multi-thousand doc release candidate test in place, since basically right now, prod is inadvertently serving as our QA env as well. not saying don't do it, but we need to be super cautious.

Yes, I agree: having tests in place when refactoring is crucial. The situation here is simpler, though, since the only changes are moving functions between files, so there are no changes in logic. All existing tests pass.