idpaterson commented 8 years ago

As originally discussed in #196, this pull request begins the conversion from unittest2 to py.test. py.test was already used to run the tests, but as an alternative to unittest2 it provides an opportunity for intriguing code-level instrumentation to make tests easier to write and more comprehensive.

The following tests have been ported so far:

[x] TestAlternativeAbbreviations - this is a weird one since the test actually overrides the locale to use different abbreviations, so I may not have understood why this exists and exactly what was being tested vs. the normal abbreviations in the locale
[ ] TestAustralianLocale
[x] TestComplexDateTimes
[x] TestContext - note that pdtContext will be tested on almost every other test since it is included in the assertions and specified in the test data
[x] TestConvertUnitsAsWords - completed as an example of targeting numbers rather than dates or deltas
[x] TestDayStartHour
[x] TestDelta
[x] TestErrors - these will be implemented throughout as test groups with the invalid_ prefix
[ ] TestFrenchLocale
[ ] TestGermanLocale - there are a few tests in this PR, but only to make sure the logic that loads tests from every locale was working
[x] TestInc
[ ] TestLocaleBase - oops this file is actually testing French
[x] TestMultiple
[x] TestNlp
[x] TestPhrases
[x] TestRanges
[ ] TestRussianLocale
[x] TestSimpleDateTimes
[x] TestSimpleOffsets
[x] TestSimpleOffsetsHours
[x] TestSimpleOffsetsNoon
[x] TestStartTimeFromSourceTime
[x] TestUnits

Additional tests needed for the test utilities:

[x] @pdtFixture
[x] datedelta
[x] dateReplacement
[x] nlpTarget
[x] nlpTargetValue
[ ] TestGroup
[x] TestCase
[x] YAML constructors

Scope of this PR

This pull request can be merged once tests have been rewritten to cover all of the current test cases. At that time there will still be a lot of work to do to improve and add tests especially in non-English locales; merging to v3 will allow pull requests for community contributions.

There will be no changes to the functionality of parsedatetime in this pull request other than instrumentation necessary for testing. For example, a "wildcard" flag was added to pdtContext to support tests that do not specify an explicit context.

Additional testing support

Testing against "today" and edge case dates

Most tests in pdt 2 were based on the current date as sourceTime rather than a predetermined time. That can be good and bad – I think it made the tests more confusing to reason about but it also helped to catch edge cases related to leap years, end of year, end of month, etc. I'm looking at you, #155! 99% of the time, "today" is just a normal time on a normal day and there is no guarantee that CI would happen to run on Feb 29. If it does and an error comes up, it's already too late to push out a fix.

For tests where the sourceTime is truly arbitrary (absolute dates and anything that can be expressed with deltas come to mind) it would be interesting to parametrize the function with a few edge case sourceTimes. Run it against Feb 29, Dec 31, Jan 1, today, etc so that those cases are caught intentionally rather than accidentally.
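A minimal sketch of that idea, using only the standard library. The list of edge-case dates, the helper names, and the "in 2 days" expectation are illustrative assumptions, not code from this PR; the delta arithmetic stands in for whatever the parser would be asked to produce.

```python
import datetime

# Hypothetical edge-case source times; the exact list is an assumption.
EDGE_CASE_SOURCE_TIMES = (
    datetime.datetime(2016, 2, 29, 1, 2, 3),      # leap day
    datetime.datetime(2015, 12, 31, 23, 59, 59),  # end of year
    datetime.datetime(2016, 1, 1, 0, 0, 0),       # start of year
)


def source_times(now=None):
    """Return the fixed edge cases plus "today"."""
    if now is None:
        now = datetime.datetime.now()
    return EDGE_CASE_SOURCE_TIMES + (now,)


def expected_in_two_days(source_time):
    """Expected target for a phrase such as "in 2 days"."""
    return source_time + datetime.timedelta(days=2)


# With pytest the same tuple could be passed to
# pytest.mark.parametrize('sourceTime', source_times()).
for source_time in source_times():
    target = expected_in_two_days(source_time)
    assert target - source_time == datetime.timedelta(days=2)
```

Because the target is derived from the source time, the same test data stays valid for every edge case, including the leap day.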

Testing NLP may require a list of targets

Some changes will need to be made to allow multiple target dates in test data.

Notes

This is a collaborative pull request

Any parsedatetime maintainers can commit on this PR and everyone is welcome to the discussion. I am also going to continue working on the test cases.

The old tests have been deleted from this branch

I deleted the old tests to avoid further housekeeping later; anyone who would graciously contribute to this pull request will need to use an alternate copy of the old tests for comparison.

The new test names and file structure do not necessarily match the old tests

Some previously separate test classes will be combined. For example, the cases for TestMultiple and TestDelta are both in deltas.yaml because the "multiples" were just multi-unit deltas. I used comments to separate them into sections within the data file.

Documentation

I will start a new issue for this but since the first bit of code is now public I wanted to express my preference to move away from epydoc. I have always found the HTML documentation very difficult to use... the content is well-written but the format is cumbersome and outdated. Mike, are you open to hosting documentation on ReadTheDocs or as a GitHub page via the gh_pages branch so that updates can go out automatically?

The test modules in this PR are documented for sphinx with Google style docstrings. The ReadTheDocs theme for sphinx makes for (in my opinion) a better presentation and a simpler, less syntactically-verbose code style. Some of the documentation that I have currently written in the Python docstrings would be better managed in separate rst files, but for now they should be helpful to anyone contributing to the tests. The implementation of that will arrive in a different pull request.

codecov-io commented 8 years ago

Codecov Report

Merging #199 into v3.0 will decrease coverage by 50.26%. The diff coverage is 70%.

@@             Coverage Diff             @@
##             v3.0     #199       +/-   ##
===========================================
- Coverage   78.04%   27.79%   -50.26%     
===========================================
  Files          14       14               
  Lines        1567     1576        +9     
  Branches      288      291        +3     
===========================================
- Hits         1223      438      -785     
- Misses        252     1120      +868     
+ Partials       92       18       -74

Impacted Files	Coverage Δ
parsedatetime/context.py	`56% <70%> (-25.82%)`	:x:
parsedatetime/pdt_locales/icu.py	`10.58% <0%> (-72.95%)`	:x:
parsedatetime/__init__.py	`18.26% <0%> (-56.7%)`	:x:

bear commented 8 years ago

Do you want me to review the changes as you make them or wait until you have the majority of them made?

idpaterson commented 8 years ago

That's up to you. I'm going to work on the cases that require additional support in the pytest setup first, then port the remaining normal cases.

idpaterson commented 8 years ago

How does this look for an nlp test?

long_phrases:
    sourceTime: 2013-08-01 21:25:00
    cases:
        - target: !nlpTarget
            - phrase: "At 8PM on August 5th"
              target: 2013-08-05 20:00:00
              context: !pdtContext month | day | hour
            - phrase: "next Friday at 9PM"
              target: 2013-08-09 21:00:00
              context: !pdtContext day | hour
            - phrase: "in 5 minutes"
              target: 2013-08-01 21:30:00
              context: !pdtContext minute
            - phrase: "next week"
              target: 2013-08-08 09:00:00
              context: !pdtContext week
          phrases:
            - >
                I'm so excited!! At 8PM on August 5th i'm going to fly to 
                Florida. Then next Friday at 9PM i'm going to Dog n Bone! 
                And in 5 minutes I'm going to eat some food! Talk to you 
                next week.

@pdtFixture('multiple_dates.yml')
def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
    assert cal.nlp(phrase, sourceTime) == nlpTarget

The target normalization adds the startIndex and endIndex for each phrase automatically based on substrings, so unless startIndex and endIndex are specified in the data file we just have to avoid having a string appear twice in the text.
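The substring lookup described here can be sketched as follows. `locate_phrase` is a hypothetical helper name and the ambiguity check is an assumption about how the constraint might be enforced; the actual normalization code in this PR may differ.

```python
def locate_phrase(text, phrase):
    """Return (startIndex, endIndex) of phrase within text.

    Raises ValueError when the phrase is missing or appears more than
    once, since a substring search cannot tell which match was meant.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError('phrase not found: %r' % phrase)
    if text.find(phrase, start + 1) != -1:
        raise ValueError('phrase appears more than once: %r' % phrase)
    return start, start + len(phrase)


text = "I'm so excited!! At 8PM on August 5th i'm going to fly to Florida."
print(locate_phrase(text, 'At 8PM on August 5th'))
```

For the first phrase of the example this gives 17 and 37, the same indexes that appear in the sample failure output.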

Sample failures

Wrong context

cal = <parsedatetime.Calendar object at 0x109962790>
phrase = "I'm so excited!! At 8PM on August 5th i'm going to fly to  Florida. Then next Friday at 9PM i'm going to Dog n Bone!  And in 5 minutes I'm going to eat some food! Talk to you  next week."
sourceTime = datetime.datetime(2013, 8, 1, 21, 25)
nlpTarget = ((datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY), 17, 37, 'At 8P...in 5 minutes'), (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week'))

    @pdtFixture('multiple_dates.yml')
    def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
>       assert cal.nlp(phrase, sourceTime) == nlpTarget
E       assert ((datetime.da... 'next week')) == ((datetime.dat... 'next week'))
E         At index 0 diff: (datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR), 17, 37, 'At 8PM on August 5th') != (datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY), 17, 37, 'At 8PM on August 5th')
E         Full diff:
E         ((datetime.datetime(2013, 8, 5, 20, 0),
E         -   pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         ?                                                                ----------------------
E         +   pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY),
E         17,
E         37,
E         'At 8PM on August 5th'),
E         Detailed information truncated (15 more lines), use "-vv" to show

Missing date in test target

cal = <parsedatetime.Calendar object at 0x1109b3790>
phrase = "I'm so excited!! At 8PM on August 5th i'm going to fly to  Florida. Then next Friday at 9PM i'm going to Dog n Bone!  And in 5 minutes I'm going to eat some food! Talk to you  next week."
sourceTime = datetime.datetime(2013, 8, 1, 21, 25)
nlpTarget = ((datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU...riday at 9PM'), (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week'))

    @pdtFixture('multiple_dates.yml')
    def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
>       assert cal.nlp(phrase, sourceTime) == nlpTarget
E       assert ((datetime.da... 'next week')) == ((datetime.dat... 'next week'))
E         At index 2 diff: (datetime.datetime(2013, 8, 1, 21, 30), pdtContext(accuracy=pdtContext.ACU_MIN), 122, 134, 'in 5 minutes') != (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week')
E         Left contains more items, first extra item: (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week')
E         Full diff:
E         ((datetime.datetime(2013, 8, 5, 20, 0),
E         pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         17,
E         37,
E         'At 8PM on August 5th'),
E         (datetime.datetime(2013, 8, 9, 21, 0),
E         pdtContext(accuracy=pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         73,
E         91,
E         'next Friday at 9PM'),
E         -  (datetime.datetime(2013, 8, 1, 21, 30),
E         -   pdtContext(accuracy=pdtContext.ACU_MIN),
E         -   122,
E         -   134,
E         -   'in 5 minutes'),
E         (datetime.datetime(2013, 8, 8, 9, 0),
E         pdtContext(accuracy=pdtContext.ACU_WEEK),
E         176,
E         185,
E         'next week'))

bear commented 8 years ago

I like it - it reads as easily as the others and doesn't require any repeated items. The "avoid having a string appear twice" constraint seems reasonable for a test suite.

idpaterson commented 8 years ago

Yeah and it's only a constraint of convenience. There will need to be a test that includes multiples of the same string because that is an important thing to test, but that test will just need to specify startIndex and endIndex in the YAML.

idpaterson commented 8 years ago

nlp tests are now implemented. Some of the existing nlp tests were specific to nlp (e.g. multiple phrases in a string) and some were generic date and times.

A new tests.data.nlpTarget class wraps data from the test and supports equality testing with nlp responses. It provides a consistent representation for tests with both normal and nlp targets (i.e. target: 2016-01-01 00:00:00 vs target: !nlpTarget ...) and gives the Python test code some flexibility to modify the test data. For example, the following test updates the sourcePhrase after wrapping the original phrase in quotes:

@pytest.mark.parametrize('prefix,suffix', (('"', '"'), ("'", "'"), ('(', ')')))
@pdtFixture('simple_datetimes.yml', ['times', 'invalid_times', 'dates',
                                     'invalid_dates'])
def test_simple_datetimes_wrapped(cal, phrase, sourceTime, nlpTarget, prefix,
                                  suffix):
    sourcePhrase = u'%s%s%s' % (prefix, phrase, suffix)
    nlpTarget.sourcePhrase = sourcePhrase
    assert cal.nlp(sourcePhrase, sourceTime) == nlpTarget

This allows the nlpTarget to calculate the proper start and end index for the phrase.
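A rough sketch of how such a wrapper might behave. `NlpTargetSketch` is a hypothetical stand-in, not the tests.data.nlpTarget class from this PR; it handles a single phrase and leaves out the pdtContext element of each nlp result to stay short.

```python
import datetime


class NlpTargetSketch(object):
    """Hypothetical single-phrase stand-in for an nlp target wrapper."""

    def __init__(self, phrase, target, startIndex=None):
        self.phrase = phrase
        self.target = target
        self.sourcePhrase = phrase    # tests may overwrite this later
        self.startIndex = startIndex  # None means "calculate it"

    def as_tuple(self):
        # The index is looked up at comparison time, so a test can
        # change sourcePhrase after the data file has been loaded.
        start = self.startIndex
        if start is None:
            start = self.sourcePhrase.find(self.phrase)
        return (self.target, start, start + len(self.phrase), self.phrase)

    def __eq__(self, other):
        try:
            return tuple(other) == (self.as_tuple(),)
        except TypeError:
            return NotImplemented


when = datetime.datetime(2013, 8, 1, 9, 0)
response = ((when, 1, 6, 'today'),)  # shape assumed for the wrapped phrase

calculated = NlpTargetSketch('today', when)
calculated.sourcePhrase = '"today"'
assert response == calculated

pinned = NlpTargetSketch('today', when, startIndex=0)
pinned.sourcePhrase = '"today"'
assert response != pinned
```

The second assertion shows the trade-off: an index fixed in the data no longer matches once the test wraps the phrase.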

The test data can also specify startIndex but the documentation warns that this should only be done when required due to the string appearing more than once:

- target: !nlpTarget
    - phrase: "today"
      context: !pdtContext day
      startIndex: 5
      target: 2013-08-01 09:00:00
    - phrase: "today"
      context: !pdtContext day
      startIndex: 26
      target: 2013-08-01 09:00:00
  phrases: 
    - "Yep, today was as good as today could be"

If everything specified a startIndex, it would be more of a pain to write a phrase-modifying test like test_simple_datetimes_wrapped above, since the startIndex would not be calculated and therefore would not match when the phrase is modified.

I ran into one case that failed (not in the original nlp tests; I pulled in some parse test cases). Since this pull request is strictly not going to fix anything in the code, I noted the failure with a FIXME:

def test_deltas(cal, phrase, sourceTime, nlpTarget):
    # FIXME: these tests fail
    if phrase in ('1855336.424 minutes ago',):
        return
    assert cal.nlp(phrase, sourceTime) == nlpTarget

All of these test utilities will need unit tests as well so I added them to the list at the top of this PR.

I toyed around with a !replace constructor that would call datetime.replace() on the source time but couldn't find a compelling enough reason to keep it, preferring absolute datetimes for clarity. It made the test data too abstract and I wasn't able to use it together with a delta. For example, "4pm 2 days from now" is a combination of a replacement (current time to 16:00:00) and a delta (plus 2 days).

idpaterson commented 8 years ago

I need to apologize for the lack of updates recently. A small paid project has taken away the time that I was using to work on parsedatetime. The following is not yet complete enough to commit.

Most recently I have reorganized the test classes to make it easier to derive times relative to source times and to test anything that does not require an explicit start date against multiple edge case dates. In the last comment I mentioned an attempt to represent a replacement of date components; I ended up with a succinct syntax for that:

day_suffixes:
  cases:
    - target: !replace 2008-08-22 xx:xx:xx
      phrases:
        - "August 22nd, 2008"

This test group has no sourceTime, which means that it will be parametrized against each of the edge case dates (Feb 29 leap year, Feb 28 non leap year, end of year pre-1970). I like how, with this syntax, the replacement makes it clear that the time component will match that of the source time. I was trying to use obvious source times like 01:02:03 in the original tests to show that (because you could easily miss it if all the times are something like 12:00:00), but this is clearer.
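One way such a replacement pattern could be resolved against a source time, as a sketch. `apply_replacement` and its regular expression are assumptions made for illustration; the constructor in the branch is not shown in this thread and may work differently.

```python
import datetime
import re

# Each component is either digits or a run of "x" placeholders.
_PATTERN = re.compile(
    r'^(\d{4}|x{4})-(\d{2}|x{2})-(\d{2}|x{2}) '
    r'(\d{2}|x{2}):(\d{2}|x{2}):(\d{2}|x{2})$')
_FIELDS = ('year', 'month', 'day', 'hour', 'minute', 'second')


def apply_replacement(pattern, source_time):
    """Replace only the components that the pattern spells out."""
    match = _PATTERN.match(pattern)
    if match is None:
        raise ValueError('unrecognized replacement pattern: %r' % pattern)
    overrides = dict(
        (field, int(value))
        for field, value in zip(_FIELDS, match.groups())
        if not value.startswith('x'))
    # datetime.replace raises ValueError for impossible results, e.g.
    # pinning the month to February when the source day is the 31st.
    return source_time.replace(**overrides)


source = datetime.datetime(2016, 2, 29, 1, 2, 3)
print(apply_replacement('2008-08-22 xx:xx:xx', source))
```

The placeholders leave the time of day untouched, so the result keeps the 01:02:03 of the source time while the date is pinned to 2008-08-22.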

There are some slightly weird consequences of testing against multiple dates when you're dealing with partial dates, for example:

dates:
  sourceTime: !replace xxxx-01-xx xx:xx:xx
  cases:
    - target: !replace xxxx-08-25 xx:xx:xx
      context: !pdtContext month | day
      phrases:
        - "8/25"

In this case if the edge case source times were to fall after August 25 it would map to the next year, so the replacement is used to pin the source time to January. The alternate behavior could then be tested with a source time in October where the date will map to the following year:

dates_next_year:
  sourceTime: !replace xxxx-10-xx xx:xx:xx
  cases:
    - target: !datedelta
        sourceTime: !replace xxxx-08-25 xx:xx:xx
        years: 1
      context: !pdtContext month | day
      phrases:
        - "8/25"
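The expectation encoded by these two groups can be modelled in a few lines. This is a sketch of the expected values only, not parsedatetime's implementation; `next_occurrence` is a hypothetical name, and a source time that falls exactly on the target date is left in the current year because the thread does not say how that case should behave.

```python
import datetime


def next_occurrence(source_time, month, day):
    """Expected target for a bare month/day phrase such as "8/25"."""
    candidate = source_time.replace(month=month, day=day)
    if candidate < source_time:
        # The date has already passed this year, so roll to next year.
        candidate = candidate.replace(year=candidate.year + 1)
    return candidate


january = datetime.datetime(2016, 1, 29, 1, 2, 3)
october = datetime.datetime(2016, 10, 29, 1, 2, 3)
print(next_occurrence(january, 8, 25))
print(next_occurrence(october, 8, 25))
```

A January source time stays in the same year, while an October source time rolls over, which is the one-year datedelta expressed in dates_next_year.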

Still plugging along albeit quite slowly now.

bear commented 8 years ago

@idpaterson firstly no one working on an OSS project ever needs to apologize for doing paid work first - that's all part of what we do :)

Second, even if it wasn't paid, never worry about taking time to do something - we all work on this code because we love it and it solves an itch, but real life sometimes intrudes.

thanks for the update and for keeping the project moving forward!