MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservations systems).
GNU General Public License v3.0
34 stars 11 forks source link

Static caching bleeding in tests #417

Open whikloj opened 7 years ago

whikloj commented 7 years ago

This is more of a testing issue but I have found that due to static caches like (https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Csv.php#L81) the PHPUnit tests are not starting cleanly. It might be useful to have a way to clear the static cache on instantiation or something.

For example:

> phpunit --verbose                    
PHPUnit 6.2.2 by Sebastian Bergmann and contributors.

Runtime:       PHP 7.1.5 with Xdebug 2.5.1
Configuration: /sw/var/www/DAM/mik/phpunit.xml.dist

====================================================================================================> 100%
.Filtering 20 records through the RandomSet fetcher manipulator.
====================================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
====================================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
====================================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
====================================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
====================================================================================================> 100%
.Filtering 20 records through the CsvSingleFileByExtension fetcher manipulator.
====================================================================================================> 100%
.Filtering 20 records through the CsvSingleFileByFilename fetcher manipulator.
.FFE..............................F...F                 49 / 49 (100%)

Time: 2.05 seconds, Memory: 16.00MB

There was 1 error:

1) mik\fetchers\CsvFetcherTest::testGetItemInfo
Trying to get property of non-object

/sw/var/www/DAM/mik/tests/fetchers/CsvFetcherTest.php:78

--

There were 4 failures:

1) mik\fetchers\CsvFetcherTest::testGetRecords
Failed asserting that actual size 10 matches expected size 20.

/sw/var/www/DAM/mik/tests/fetchers/CsvFetcherTest.php:56

2) mik\fetchers\CsvFetcherTest::testGetNumRecs
Failed asserting that 10 matches expected 20.

/sw/var/www/DAM/mik/tests/fetchers/CsvFetcherTest.php:67

3) mik\tests\toolchain\CsvBookToolchainTest::testWritePackages
OBJ.tif file was not written by CsvBooks toolchain using a configured page sequence separator.
Failed asserting that file "/var/folders/w3/9tw0brpn0xg9gr2q6ct9zty80000gp/T/mik_csv_book_output_dir/B1/4/OBJ.tif" exists.

/sw/var/www/DAM/mik/tests/toolchain/XCsvBookToolchainTest.php:192

4) mik\tests\toolchain\CsvCompoundToolchainTest::testWritePackages
CSV to MODS metadata parser did not work
Failed asserting that '<?xml version="1.0"?>\n
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">\n
  <titleInfo>\n
    <title>Second compound object, part 2</title>\n
  </titleInfo>\n
</mods>\n
' contains "  <titleInfo>
    <title>I am the second compound object's second child</title>
  </titleInfo>".

/sw/var/www/DAM/mik/tests/toolchain/XCsvCompoundToolchainTest.php:209

ERRORS!
Tests: 49, Assertions: 76, Errors: 1, Failures: 4.

If I enable process isolation each test uses a separate PHP process and this issue is removed, but that doesn't necessarily mean the code is good. Statics have always been confusing to me and this is a confusing one.

> phpunit --verbose --process-isolation
PHPUnit 6.2.2 by Sebastian Bergmann and contributors.

Runtime:       PHP 7.1.5 with Xdebug 2.5.1
Configuration: /sw/var/www/DAM/mik/phpunit.xml.dist

======================================================================> 100%
Filtering 20 records through the RandomSet fetcher manipulator.
======================================================================> 100%
Filtering 20 records through the RangeSet fetcher manipulator.
======================================================================> 100%
Filtering 20 records through the RangeSet fetcher manipulator.
======================================================================> 100%
Filtering 20 records through the RangeSet fetcher manipulator.
======================================================================> 100%
Filtering 20 records through the RangeSet fetcher manipulator.
======================================================================> 100%
Filtering 20 records through the CsvSingleFileByExtension fetcher manipulator.
======================================================================> 100%
Filtering 20 records through the CsvSingleFileByFilename fetcher manipulator.
......................................F                 49 / 49 (100%)

Time: 7.46 seconds, Memory: 12.00MB

There was 1 failure:

1) mik\tests\toolchain\CsvCompoundToolchainTest::testWritePackages
CSV to MODS metadata parser did not work
Failed asserting that '<?xml version="1.0"?>\n
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">\n
  <titleInfo>\n
    <title>Second compound object, part 2</title>\n
  </titleInfo>\n
</mods>\n
' contains "  <titleInfo>
    <title>I am the second compound object's second child</title>
  </titleInfo>".

/sw/var/www/DAM/mik/tests/toolchain/XCsvCompoundToolchainTest.php:209

FAILURES!
Tests: 49, Assertions: 77, Failures: 1.

Just something to consider.

mjordan commented 7 years ago

@whikloj the static cache leaks are likely the cause of a long-standing "problem" I've had with PHPUnit, where one test will interfere with another despite teardown between tests. To get around this, I have added 'X' to the start of some test files, which seemed to work, for some reason. Thanks for the tip on --process-isolation. I'm going to try to rename those 'X' tests to see if that works.

I'm not sure how else to control the static caches in the code, but I can say that they do speed up things considerably in production.

mjordan commented 7 years ago

Bingo. Thank you very much! I owe you a 🍺. This has been plaguing me for a long time.

I'm going to open a PR to rename those test files, unless you think there's an underlying problem with the static caches.

mjordan commented 7 years ago

@whikloj we use static caches in fetchers and filegetters to avoid re-parsing or re-retrieving source data (in the case of fetchers) or re-recursing deep directory structures (in the case of filegetters). Avoiding those expensive tasks is why the static caches speed up MIK substantially when using some toolchains.

whikloj commented 7 years ago

@mjordan no I think you should have static caches, but if you add a method to clear or force regeneration (maybe I missed it) then in the tests we could call that to ensure each test uses a fresh cache. According to what I read this occurs because the tests share the PHP instance so the static is (in effect) working properly, except that in the phpunit context it causes problems.

On 28 Jun 2017 6:52 p.m., "Mark Jordan" notifications@github.com wrote:

@whikloj https://github.com/whikloj we use static caches in fetchers and filegetters to avoid re-parsing or re-retrieving source data (in the case of fetchers) or re-recursing deep directory structures (in the case of filegetters). Avoiding those expensive tasks is why the static caches speed up MIK substantially when using some toolchains.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MarcusBarnes/mik/issues/417#issuecomment-311823675, or mute the thread https://github.com/notifications/unsubscribe-auth/ACua4XBEdLgiubY5w0bCgJPMUcRGEjUSks5sIudFgaJpZM4OIg3m .

mjordan commented 7 years ago

@whikloj we have a configuration setting, [FETCHER] use_cache, that is documented in the CSV toolchains:

use_cache: Optional. Set to false in automated tests (in other words, you will not need to use this unless you are writing automated tests for this fetcher).

You can see it in use at https://github.com/MarcusBarnes/mik/blob/master/tests/CsvSingleFileToolchainTest.php#L28 for example. The cache this applies to is the cached metadata that fetchers create to reduce the number of times they need to hit CONTENTdm or parse a CSV file, etc.

We could add logic to the various instances of static caches so that if this setting was false the static caching gets bypassed.

mjordan commented 7 years ago

I'd love to resolve this before we look at the other issues and PRs, since like I said above, the bleeding has been chronic for some time and a real PITA. It's awesome that you discovered the cause, and at least one solution.

whikloj commented 7 years ago

@mjordan So this use_cache is currently documented for CSV, but not on the other fetchers? Do you want to add a protected $use_cache to the Fetcher base class. We would still have to document to use it in individual implementations like csv and cdm and Excel....

whikloj commented 7 years ago

In the mean time I'll add 'use_cache' => false to my version of the tests and work on adding it to the others to.

whikloj commented 7 years ago

@mjordan Okay I have this working with my updated namespaced PHPUnit tests. So I no longer need to use --process-isolation.

> phpunit
PHPUnit 6.2.2 by Sebastian Bergmann and contributors.

============================================================================================> 100%
.Filtering 20 records through the RandomSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the RangeSet fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the CsvSingleFileByExtension fetcher manipulator.
============================================================================================> 100%
.Filtering 20 records through the CsvSingleFileByFilename fetcher manipulator.
...............................................S....    62 / 62 (100%)

Time: 4.44 seconds, Memory: 18.00MB

OK, but incomplete, skipped, or risky tests!
Tests: 62, Assertions: 97, Skipped: 1.

Generating code coverage report in Clover XML format ... done

Do you want me to recreate my changes in the existing non-namespaced PHPUnit tests? What versions of PHP are you and @MarcusBarnes supporting? Because the current matrix works with namespaces. But if you want to add 5.4, then I think you start running into trouble.

mjordan commented 7 years ago

@whikloj sorry, use_cache also applies (as you've discovered) and is documented for all fetchers, except for the new Filesystem fetcher where we probably never want to cache the contents of a directory. We currently require a minimum PHP version of 5.5 but as we were discussing in #415 we could probably move that up to 5.6 to support PHPUnit 5.7. We wouldn't want to go much higher than that as a minimum version of PHP.

I'm sorry, I'm not really qualified (I think) to answer your question about recreating your changes in the existing non-namespaced PHPUnit tests, I haven't been able to keep up with the firehose of recent activity. You're a machine!

mjordan commented 7 years ago

Even if I'm not qualified, I will say yes if it is relatively easy to do so. We really appreciate your work!

whikloj commented 7 years ago

Based on your comments in #277 I'm assuming that this can be halted and rolled in as part of that work. In which case I won't recreate that work and instead you'll move to use the namespaced versions of the PHPUnit classes.

But if you or @MarcusBarnes are unsure or would like to sit down and have a Skype chat I am certainly open to that. Sorry if I'm drenching you guys with code review, I'm sure this is a useful tool but you move on to other needs and work and now I am coming in with a use case and so I have time to burn on it. I can slow down (and probably will).

mjordan commented 7 years ago

@whikloj yes to halting in favor of #277, totally.

A skype call would be fabulous. I'm off today (Monday) but let's the three of us catch up early next week?

whikloj commented 7 years ago

Sounds good, I am also off today. But otherwise I am pretty flexible with my time.

On 3 Jul 2017 11:02 a.m., "Mark Jordan" notifications@github.com wrote:

@whikloj https://github.com/whikloj yes, totally.

A skype call would be fabulous. I'm off today (Monday) but let's the three of us catch up early next week?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MarcusBarnes/mik/issues/417#issuecomment-312683449, or mute the thread https://github.com/notifications/unsubscribe-auth/ACua4Ysew_LzEpn-9v71EC35eSTT5xGJks5sKRCwgaJpZM4OIg3m .