huispaty closed this issue 1 year ago
I wrote the duplicate line check, but such a slowdown does not sound reasonable. Are you sure the issue comes from the duplicate line checking? I tested without the duplicate line validation and it still takes a long time (p.s. I didn't wait the full 3 hrs, though).
Yes, I'm sure the issue stems from that part. Without this check, the file takes ~27 s to load (as it is quite large). With the check, it takes >3.5 hrs. I noticed this mainly because I was working on different branches, one of which had not yet been merged into develop (and thus main) and still had a version without the duplicate line check functionality.
My proposed solution looks like this:
```python
with open(filename) as f:
    raw_lines = f.read().splitlines()

# Pick the matchline parsers based on the file's declared version
version = get_version(raw_lines[0])
from_matchline_methods = FROM_MATCHLINE_METHODSV1
if version < Version(1, 0, 0):
    from_matchline_methods = FROM_MATCHLINE_METHODSV0

# Deduplicate in O(n) via a set instead of comparing each line
# against all previously seen lines
raw_lines = list(set(raw_lines))

parsed_lines = [
    parse_matchline(line, from_matchline_methods, version) for line in raw_lines
]
parsed_lines = [pl for pl in parsed_lines if pl is not None]
mf = MatchFile(lines=parsed_lines)
```
Using this approach, the same file takes ~25 s to load. Currently this is only on my local branch; it's not yet pushed, as I would first like to address some open issues that also relate to match file importing.
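One caveat (my own observation, not part of the proposal above): `list(set(raw_lines))` discards the original line order, which may matter if downstream code expects lines in file order. A minimal sketch of an order-preserving alternative with the same O(n) cost, using `dict.fromkeys` (the example lines are hypothetical):

```python
def dedup_preserving_order(lines):
    """Remove duplicate lines in O(n) while keeping first-occurrence order.

    dict keys are unique and, since Python 3.7, preserve insertion order,
    so this deduplicates without the O(n^2) pairwise comparisons.
    """
    return list(dict.fromkeys(lines))


# Hypothetical match-file lines, with one duplicate:
lines = ["info(a).", "note(1).", "info(a).", "note(2)."]
print(dedup_preserving_order(lines))  # ['info(a).', 'note(1).', 'note(2).']
```

If order turns out not to matter for match files, the plain `set` version is equally fine.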
Parsing match files takes longer than in previous versions due to the duplicate lines check in `importmatch.py`. The attached test file (converted to .txt) takes >3 hrs to load: beethoven_op026_mv3.txt