Open whikloj opened 7 years ago
@whikloj the static cache leaks are likely the cause of a long-standing "problem" I've had with PHPUnit, where one test will interfere with another despite teardown between tests. To get around this, I have added 'X' to the start of some test files, which seemed to work, for some reason. Thanks for the tip on --process-isolation
. I'm going to try to rename those 'X' tests to see if that works.
I'm not sure how else to control the static caches in the code, but I can say that they do speed up things considerably in production.
Bingo. Thank you very much! I owe you a 🍺. This has been plaguing me for a long time.
I'm going to open a PR to rename those test files, unless you think there's an underlying problem with the static caches.
@whikloj we use static caches in fetchers and filegetters to avoid re-parsing or re-retrieving source data (in the case of fetchers) or re-recursing deep directory structures (in the case of filegetters). Avoiding those expensive tasks is why the static caches speed up MIK substantially when using some toolchains.
@mjordan no I think you should have static caches, but if you add a method to clear or force regeneration (maybe I missed it) then in the tests we could call that to ensure each test uses a fresh cache. According to what I read this occurs because the tests share the PHP instance so the static is (in effect) working properly, except that in the phpunit context it causes problems.
On 28 Jun 2017 6:52 p.m., "Mark Jordan" notifications@github.com wrote:
@whikloj https://github.com/whikloj we use static caches in fetchers and filegetters to avoid re-parsing or re-retrieving source data (in the case of fetchers) or re-recursing deep directory structures (in the case of filegetters). Avoiding those expensive tasks is why the static caches speed up MIK substantially when using some toolchains.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MarcusBarnes/mik/issues/417#issuecomment-311823675, or mute the thread https://github.com/notifications/unsubscribe-auth/ACua4XBEdLgiubY5w0bCgJPMUcRGEjUSks5sIudFgaJpZM4OIg3m .
@whikloj we have a configuration setting, [FETCHER] use_cache
, that is documented in the CSV toolchains:
use_cache: Optional. Set to false in automated tests (in other words, you will not need to use this unless you are writing automated tests for this fetcher).
You can see it in use at https://github.com/MarcusBarnes/mik/blob/master/tests/CsvSingleFileToolchainTest.php#L28 for example. The cache this applies to is the cached metadata that fetchers create to reduce the number of times they need to hit CONTENTdm or parse a CSV file, etc.
We could add logic to the various instances of static caches so that if this setting was false
the static caching gets bypassed.
I'd love to resolve this before we look at the other issues and PRs, since like I said above, the bleeding has been chronic for some time and a real PITA. It's awesome that you discovered the cause, and at least one solution.
@mjordan So this use_cache
is currently documented for CSV, but not on the other fetchers? Do you want to add a protected $use_cache
to the Fetcher
base class. We would still have to document to use it in individual implementations like csv and cdm and Excel....
In the mean time I'll add 'use_cache' => false
to my version of the tests and work on adding it to the others to.
@mjordan Okay I have this working with my updated namespaced PHPUnit tests. So I no longer need to use --process-isolation
.
> phpunit
PHPUnit 6.2.2 by Sebastian Bergmann and contributors.
============================================================================================> 100%
.Filtering 20 records through the RandomSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the CsvSingleFileByExtension fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the CsvSingleFileByFilename fetcher manipulator.
...............................................S.... 62 / 62 (100%)
Time: 4.44 seconds, Memory: 18.00MB
OK, but incomplete, skipped, or risky tests!
Tests: 62, Assertions: 97, Skipped: 1.
Generating code coverage report in Clover XML format ... done
Do you want me to recreate my changes in the existing non-namespaced PHPUnit tests? What versions of PHP are you and @MarcusBarnes supporting? Because the current matrix works with namespaces. But if you want to add 5.4, then I think you start running into trouble.
@whikloj sorry, use_cache
also applies (as you've discovered) and is documented for all fetchers, except for the new Filesystem fetcher where we probably never want to cache the contents of a directory. We currently require a minimum PHP version of 5.5 but as we were discussing in #415 we could probably move that up to 5.6 to support PHPUnit 5.7. We wouldn't want to go much higher than that as a minimum version of PHP.
I'm sorry, I'm not really qualified (I think) to answer your question about recreating your changes in the existing non-namespaced PHPUnit tests, I haven't been able to keep up with the firehose of recent activity. You're a machine!
Even if I'm not qualified, I will say yes if it is relatively easy to do so. We really appreciate your work!
Based on your comments in #277 I'm assuming that this can be halted and rolled in as part of that work. In which case I won't recreate that work and instead you'll move to use the namespaced versions of the PHPUnit classes.
But if you or @MarcusBarnes are unsure or would like to sit down and have a Skype chat I am certainly open to that. Sorry if I'm drenching you guys with code review, I'm sure this is a useful tool but you move on to other needs and work and now I am coming in with a use case and so I have time to burn on it. I can slow down (and probably will).
@whikloj yes to halting in favor of #277, totally.
A skype call would be fabulous. I'm off today (Monday) but let's the three of us catch up early next week?
Sounds good, I am also off today. But otherwise I am pretty flexible with my time.
On 3 Jul 2017 11:02 a.m., "Mark Jordan" notifications@github.com wrote:
@whikloj https://github.com/whikloj yes, totally.
A skype call would be fabulous. I'm off today (Monday) but let's the three of us catch up early next week?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MarcusBarnes/mik/issues/417#issuecomment-312683449, or mute the thread https://github.com/notifications/unsubscribe-auth/ACua4Ysew_LzEpn-9v71EC35eSTT5xGJks5sKRCwgaJpZM4OIg3m .
This is more of a testing issue but I have found that due to static caches like (https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Csv.php#L81) the PHPUnit tests are not starting cleanly. It might be useful to have a way to clear the static cache on instantiation or something.
For example:
If I enable process isolation each test uses a separate PHP process and this issue is removed, but that doesn't necessarily mean the code is good. Statics have always been confusing to me and this is a confusing one.
Just something to consider.