Closed simonbowly closed 2 years ago
@mattmilten we are pretty close to done with refactoring in this PR :) There are some very minor behaviour changes, which we've documented in the changelog. To avoid regressions, the previous code is still in the repo, and `tests/test_regression.py` tests the refactored code against the v1.3.2 code (we should delete both of these before finalising).
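The shape of such a regression test might look like this (an illustrative sketch only: `old_parse`, `new_parse`, and the log path are placeholder stand-ins, not the actual contents of `tests/test_regression.py`):

```python
# Illustrative regression-test shape: run both implementations on the same
# log file and require identical summaries. old_parse/new_parse are
# placeholder stand-ins for the v1.3.2 code and the refactored code.

def old_parse(logfile: str) -> dict:
    # placeholder for the v1.3.2 implementation's summary output
    return {"Status": 2, "Runtime": 1.23}

def new_parse(logfile: str) -> dict:
    # placeholder for the refactored implementation's summary output
    return {"Status": 2, "Runtime": 1.23}

def test_summary_matches():
    # The real test would compare summaries across a suite of sample logs.
    assert new_parse("data/example.log") == old_parse("data/example.log")
```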
When you have a chance to review, could you let us know your thoughts? A few things could probably still be tidied up, but we're quite happy with the structure.
Thanks @maliheha for your great work on this so far, and for putting together the summary below of the new structure:
- `src/grblogtools/parsers/` contains the individual parsers. Each parser has a main method `parse()` that takes a single line as input and returns `True`/`False` depending on whether the line is matched by a pattern associated with the parser. Each parser also provides `get_summary()` and, where applicable, `get_progress()` to return the parsed summary or the detailed progress.
- `SingleLogParser`
(see `src/grblogtools/parsers/single_log.py`) is responsible for parsing one single log run. Its main method `parse()` takes a line as input and returns `True`/`False` depending on whether the line is matched by some pattern. `SingleLogParser` has two internal variables, `current_parser` and `future_parsers`, which it keeps updated as it sees more lines. It initializes `current_parser` to the `header_parser` and `future_parsers` to the list of remaining parsers. If a line is not matched by `current_parser`, `SingleLogParser` checks whether it should pass the line to any of the remaining parsers. As soon as any parser in the list of `future_parsers` returns `True`, `current_parser` and `future_parsers` are updated.
- `src/grblogtools/api.py`
also has a method named `parse()`, which takes strings of log file patterns as individual arguments and returns a `ParseResult` object. The methods `summary()` and `progress()` of the `ParseResult` class return the summary dict and the search progress information as before. Another method of the `ParseResult` class is `parse()`, which takes the path to a single log file (the log file can include multiple runs) and uses `SingleLogParser` objects to parse each log run.
- `get_dataframe(logfiles, timelines=False, prettyparams=False)` is kept in the `api.py` module, i.e. the API is unchanged other than dropping `merged_logs` (these are handled automagically).
- `tests/`
contains separate tests for the individual parsers and the api. `test_refactor_regression.py` includes regression tests to ensure equivalent outputs from the current grblogtools api and the newly designed api. Test data is in the `data` folder, with newly added test data in `tests/assets`.
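The `current_parser`/`future_parsers` hand-off described above can be sketched roughly as follows (a minimal sketch: `RegexParser` and the patterns are illustrative stand-ins, not the actual grblogtools classes):

```python
import re


class RegexParser:
    """Toy stand-in for a section parser (the real ones live in
    src/grblogtools/parsers/); the regex patterns below are illustrative."""

    def __init__(self, pattern: str):
        self.pattern = re.compile(pattern)
        self._summary = {}

    def parse(self, line: str) -> bool:
        # True if the line is matched by a pattern associated with this parser.
        match = self.pattern.match(line)
        if match:
            self._summary.update(match.groupdict())
            return True
        return False

    def get_summary(self) -> dict:
        return self._summary


class SingleLogParser:
    """Sketch of the current_parser/future_parsers hand-off."""

    def __init__(self, parsers):
        # Start with the first (header) parser; the rest are tried later.
        self.current_parser = parsers[0]
        self.future_parsers = parsers[1:]

    def parse(self, line: str) -> bool:
        if self.current_parser.parse(line):
            return True
        # Not matched by the current parser: see if a later parser takes over.
        for i, parser in enumerate(self.future_parsers):
            if parser.parse(line):
                self.current_parser = parser
                self.future_parsers = self.future_parsers[i + 1:]
                return True
        return False


header = RegexParser(r"Gurobi Optimizer version (?P<Version>\S+)")
presolve = RegexParser(r"Presolve removed (?P<RemovedRows>\d+) rows")
log = SingleLogParser([header, presolve])

log.parse("Gurobi Optimizer version 9.5.0 build v9.5.0rc5 (linux64)")
log.parse("Presolve removed 12 rows and 3 columns")
print(log.current_parser is presolve)  # True: the hand-off happened
```

Keeping `future_parsers` ordered means each line is only ever tested against parsers for sections that could still plausibly apply.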
Excellent work! One question: would it make sense to define a base class for the different parsers? They all implement the same function anyway, but the classes do not seem to be connected by a common base class.
Thanks! I don't think abstract base classes really add much here. There's a common API between classes but the abstract class doesn't add any functionality, so I'm inclined to just leave it as is.
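For context, the kind of base class being discussed would look roughly like this (a hypothetical sketch, not code from the PR; the class and method names mirror the common API described above):

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    """Hypothetical common base class for the section parsers; the actual
    refactor deliberately leaves the classes unconnected."""

    @abstractmethod
    def parse(self, line: str) -> bool:
        """Return True if the line is matched by this parser."""

    @abstractmethod
    def get_summary(self) -> dict:
        """Return the parsed summary data."""
```

Since the base class would carry no shared behaviour, its only effect would be enforcing the method signatures at instantiation time, which duck typing already covers informally.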
@mattmilten this is ready to go!
@maliheha and I are working on this on my fork. These updates aim to separate the parsers for different sections of the log into submodules, so things are easier to unit-test and modify as we go.
@mattmilten no need to review for now, we'll let you know when it's in a more complete state.
@ronaldvdv we noticed you're also adding some tests and example data - can we fold this into the refactored code? Just want to make sure we aren't duplicating work and writing tests that may eventually clash with one another. Happy to discuss and coordinate together.