This PR significantly reduces algorithmic complexity by using sets instead of lists, which improves performance enormously (set membership checks are O(1) on average, versus O(n) for lists). I tried running it on ~1 million files and had to interrupt it with ^C after about 30 minutes. A quick cProfile run made it clear that the poor performance was caused by lists being used instead of sets. With these changes applied, the same run now completes in under 15 seconds. Two tests are failing, but they also fail on the current master.
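To illustrate why the switch matters, here is a minimal sketch of the kind of lookup that gets faster. The function names and the "filter out already-seen names" scenario are hypothetical and not taken from this PR's code. Both versions return the same result. The only difference is the container used for the `in` check.

```python
def new_names_list(candidates, seen):
    # Hypothetical example: each `in` check scans the whole list,
    # so the total cost is O(len(candidates) * len(seen)).
    seen_list = list(seen)
    return [name for name in candidates if name not in seen_list]


def new_names_set(candidates, seen):
    # Same logic, but `in` is a hash lookup (O(1) on average),
    # so the total cost is roughly O(len(candidates) + len(seen)).
    seen_set = set(seen)
    return [name for name in candidates if name not in seen_set]


if __name__ == "__main__":
    files = ["a.txt", "b.txt", "c.txt"]
    already = ["b.txt"]
    print(new_names_list(files, already))
    print(new_names_set(files, already))
```

With ~1 million entries, the quadratic list version is the one that would have to be stopped with ^C, while the set version stays close to linear.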
Edit: Oh, and I also replaced the `for` loop that tracked a `found` flag variable with a more idiomatic `for`/`else` construct.
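For reviewers unfamiliar with the pattern, here is a minimal before/after sketch. The function names and the "collect missing entries" task are hypothetical illustrations, not the PR's actual code. The `else` clause of a `for` loop runs only when the loop finishes without hitting `break`, which is exactly what the `found` flag was tracking.

```python
def missing_flag(required, available):
    # Before: a `found` flag records whether the inner loop hit `break`.
    missing = []
    for req in required:
        found = False
        for item in available:
            if item == req:
                found = True
                break
        if not found:
            missing.append(req)
    return missing


def missing_for_else(required, available):
    # After: the loop's `else` clause replaces the flag.
    missing = []
    for req in required:
        for item in available:
            if item == req:
                break
        else:
            # Runs only if the inner loop completed without `break`.
            missing.append(req)
    return missing
```

Both functions behave identically. In real code the inner loop would itself be a set membership check, but the explicit loop is kept here to show the `for`/`else` shape.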