elemenohpi opened this issue 7 months ago (status: Open)
Faulty input files sometimes need pre-processing before they can be converted. This is a new challenge to think about. Do we need an LLM filter, or would it slow the process down significantly? Perhaps we could use common data science approaches to handle some of the inputs.
One option is a classifier that determines whether the input files are already in a standard format that can be converted quickly to the target format. If the files are standard, it passes the input to the search/replace (deterministic converter) module. Otherwise, it passes the data to the LLM converter.
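
To make the routing idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names `is_standard_format`, `deterministic_convert`, `llm_convert`, and `route` are hypothetical, "standard" is taken to mean comma-delimited text with a consistent column count, and the target format is taken to be tab-separated. A trained classifier could replace the simple rule-based check.

```python
import csv
import io
from typing import Callable, Tuple


def is_standard_format(text: str) -> bool:
    """Hypothetical check: non-empty CSV where every row has the same
    number of columns (at least two). A stand-in for a real classifier."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return False
    width = len(rows[0])
    return width >= 2 and all(len(row) == width for row in rows)


def deterministic_convert(text: str) -> str:
    """Placeholder for the search/replace module: CSV to tab-separated."""
    rows = csv.reader(io.StringIO(text))
    return "\n".join("\t".join(row) for row in rows if row)


def llm_convert(text: str) -> str:
    """Placeholder for the LLM converter; not wired up in this sketch."""
    raise NotImplementedError("LLM converter is not implemented here")


def route(text: str, llm: Callable[[str], str] = llm_convert) -> Tuple[str, str]:
    """Send standard input down the fast path and everything else to the LLM.

    Returns the name of the path taken and the converted text."""
    if is_standard_format(text):
        return "deterministic", deterministic_convert(text)
    return "llm", llm(text)
```

With this layout the LLM is only invoked for inputs that fail the cheap check, so clean files never pay the latency cost. How often the slow path is taken depends on how strict the check is.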