ShammyLevva / FTAnalyzer

Family Tree Analyzer - Finds hidden details in your family tree. Install at
http://www.ftanalyzer.com/install
Apache License 2.0

Date Phrase accepted by Family Historian but rejected by FTA #172

Closed: SeekerFTA closed this issue 4 years ago

SeekerFTA commented 4 years ago

Version number: the problem appears in 7.8.2.0

Describe the bug: the following line is rejected by FTA:

2 DATE (1881 Census)

Error message was: "Problem with date format for: 1881 CENSUS", system said: "Unrecognised date format for: 1881 CENSUS". This is a date phrase created by FH.

Also:

2 SOUR @S254@
3 PAGE 1/12/1928 : London Gazette : 30/11/1928

Error message was: "Problem with date format for: - 1/12/1928", system said: "Unrecognised date format for: - 1/12/1928". The problem is not recognised by Family Historian; File/Validate shows no errors. This field defines where within the source the information was found, so I'm not sure why FTA is looking for a date in it.

To reproduce: load the GEDCOM file.

Expected behavior: the lines should be accepted.

ShammyLevva commented 4 years ago

Looks like two different issues. The first issue is that you are putting a NON date in a date field. Family Historian is allowing you to enter a date phrase, but it should be a VALID date phrase such as 1881 or APR 1881, or for non-census dates something like BET 1835 AND 1846.
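For illustration, a minimal Python sketch of the kind of values described above as valid: a year, a month and year, a full date, or a BET x AND y range. It covers only that small subset of the GEDCOM 5.5 date grammar (no ABT/BEF/AFT/FROM/TO, no other calendars), and the function name `is_simple_gedcom_date` is hypothetical, not FTAnalyzer's own parser.

```python
import re

# Gregorian month keywords used in GEDCOM dates.
MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"

# DAY MONTH YEAR, MONTH YEAR or YEAR, e.g. "3 APR 1881", "APR 1881", "1881".
DATE = rf"(?:(?:\d{{1,2}} )?(?:{MONTHS}) )?\d{{3,4}}"

# A range such as "BET 1835 AND 1846".
RANGE = rf"BET {DATE} AND {DATE}"


def is_simple_gedcom_date(value: str) -> bool:
    """Hypothetical check: True for a plain date or a BET ... AND ... range."""
    text = " ".join(value.upper().split())  # normalise case and spacing
    return re.fullmatch(rf"{DATE}|{RANGE}", text) is not None
```

With this sketch, `is_simple_gedcom_date("APR 1881")` and `is_simple_gedcom_date("BET 1835 AND 1846")` return True, while `"1881 Census"` returns False because the extra word doesn't fit any of the accepted patterns.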

Most family history programs, FH included, allow users to be fairly general in how they specify dates. This is fine in most cases, as you aren't doing any analysis. The issue arises when you are doing analysis: then the possible options need to be trimmed down. I've done a massive amount of work on the date processing but have never yet come across someone putting the word census in a date field. It's not part of the GEDCOM standard and isn't a common wording I've catered for, so it throws an error. I can add something that will strip out the word census so as to make it a valid date, but the question is why add the word there when it isn't a valid date word.

The second issue I'm not sure about. I'd agree it shouldn't be looking for a date, so it definitely seems to be a bug, but it may be part of the census reference checking. Are you able to export that single individual to a GEDCOM file so I can import it and see the error and why it's looking for a date in that field? Thanks.

SeekerFTA commented 4 years ago

Hi

It turns out it’s just one issue – the first one with date phrases.

After I created the single record GEDCOM I saw that there was another place in the same record with a dodgy date phrase. I corrected that and the new file loaded without problems.

FH has several ways of expressing dates: as a date, a period, a range, a quarter date or a date phrase. The FTA definition of a valid date phrase would fit into one or more of the other date methods; the date phrase could instead be reserved for things meaningful to people but not to computers: “1881 Census” has more information and context than “3 April 1881”.

The new file has

DATE TO 1 DEC 1928

The old one had

DATE (?? To 1/12/1928)

I wonder whether an option to ignore date phrases would be worthwhile? Not for me, as I only have 3 records.

Kind Regards

ShammyLevva commented 4 years ago

I'd need to see how FH is storing that in the GEDCOM. The GEDCOM standard has an extremely detailed list of possible formats. These standard formats have existed for 20+ years, so there is simply no need for non-standard dates.

You are already adding the date to a CENSus fact, which gives far, far more context than corrupting a date field, so why would you need the word Census in an inappropriate place in the date field? Sorry, but that just doesn't make sense. A date field is specifically for storing a date. Context text goes in the description field.

You may not be aware, but there are standard date phrases, e.g. INFANT and UNKNOWN, the two most common.

SeekerFTA commented 4 years ago

Hi

Please find attached a GEDCOM file with all the different ways in which Family Historian stores dates.

The date phrase option can be extended to give an interpreted date. This is accepted by FTA even though it contains a date phrase – as identified by brackets.

The GEDCOM standard appears to me to include date phrases.

Page 43:

DATE_PHRASE:= {Size=1:35}
(<TEXT>)

Any statement offered as a date when the year is not recognizable to a date parser, but which gives information about when an event occurred. The date phrase is enclosed in matching parentheses.
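A minimal Python sketch of checking a value against the quoted definition: one pair of matching parentheses around 1 to 35 characters of text. The function name is hypothetical, and it assumes the Size=1:35 limit applies to the text inside the parentheses, which the quoted definition doesn't settle.

```python
def is_date_phrase(value: str, max_len: int = 35) -> bool:
    """Hypothetical check: does value look like a GEDCOM 5.5 DATE_PHRASE?

    Assumes the Size=1:35 limit applies to the text between the brackets.
    """
    text = value.strip()
    if len(text) < 2 or text[0] != "(" or text[-1] != ")":
        return False
    inner = text[1:-1]
    # The opening "(" must be matched by the final ")", so the inner text
    # has to be balanced on its own; this rejects values like "(a) (b)".
    depth = 0
    for ch in inner:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and 1 <= len(inner) <= max_len
```

Under these assumptions `(1881 Census)` passes, while `1881 Census` (no brackets) and `()` (empty text) do not.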

ShammyLevva commented 4 years ago

Sorry, you replied by email rather than via the GitHub site, so GitHub stripped the attachment and I've not seen it.

It would be interesting to see the GEDCOM, as your earlier post seemed to suggest that FH was writing out a date phrase using a DATE tag and not a DATE_PHRASE tag. I'd agree that a date phrase written as a DATE_PHRASE tag is part of the standard; I'd hope you'd agree that using the wrong tag is the problem.

I do try to cope with the many and varied ways that different programs/websites ignore the standards (Ancestry are the worst, FH amongst the best at using raw GEDCOM); however, on this occasion the normally excellent FH seems to have erred by using the wrong tag in the output?

SeekerFTA commented 4 years ago

I wasn't aware that GitHub stripped off attachments. Here is the GED file renamed as a txt file! You can see that FTA does accept a date phrase when it is preceded by a valid date. Attachment: DateTest.ged.txt

ShammyLevva commented 4 years ago

Thanks, that makes it clearer. Yes, FH is erroneously writing out a date phrase as a DATE tag and putting brackets around the text as their interpretation of a phrase, instead of using the correct DATE_PHRASE tag.

FTAnalyzer expects that text to be a valid date, since it's in a DATE tag and not a DATE_PHRASE. However, since a date phrase is by definition text that cannot be interpreted by a date parser ("Any statement offered as a date when the year is not recognizable to a date parser"), it fails to parse and throws an error.

By definition a DATE_PHRASE tag would be ignored by a parser as text that cannot be converted to a date, whereas text in a DATE tag should be in a format that can be parsed. Because this text contains words not meant to be in a DATE tag, it cannot be parsed and an error is thrown.

There are a couple of issues here:
1) FH is erroneously writing out a date phrase as a DATE.
2) The date phrase, if used, would have been ignored by the parser in FTAnalyzer anyway, as by definition using a date phrase means don't bother trying to parse this.

Possible fixes:

1) Add more code to the parser to cope with more words that shouldn't be in a date.
2) Get FH to fix the bug that is writing out the wrong tag.
3) Ask the user to put valid date information in the date field and reserve the date phrase for its correct use, i.e. only for the most unusual dates where no other date can be determined.
4) All of the above.

ShammyLevva commented 4 years ago

Oops, clicked the close button by mistake. I'll happily deal with 1) and add more checking code to strip out words like census that have no business being in a DATE tag. I can't deal with 2) and 3), though.

Have you other suggestions for words I can strip out that may be causing issues?
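A rough Python sketch of fix 1): stripping context words out of a DATE value before it reaches the date parser. The word list and function name are hypothetical (census is the only word raised in this thread), and removing the enclosing brackets first is an assumption made so that (1881 Census) reduces to 1881.

```python
import re

# Hypothetical list of context words that are not date keywords.
# "census" is the only one raised so far; more would be added as reported.
NOISE_WORDS = ("census",)


def strip_noise_words(value: str) -> str:
    """Remove known non-date words (case-insensitive) and tidy the spacing."""
    text = value.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]  # assumption: treat "(1881 Census)" like "1881 Census"
    for word in NOISE_WORDS:
        # \b word boundaries stop this from eating part of a longer word
        text = re.sub(rf"\b{re.escape(word)}\b", " ", text, flags=re.IGNORECASE)
    return " ".join(text.split())
```

For example, `strip_noise_words("(1881 Census)")` gives `"1881"`, while an ordinary date such as `"APR 1881"` passes through unchanged. The cost of this approach is that the list only ever grows as new wordings turn up.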

SeekerFTA commented 4 years ago

The GEDCOM standard specifies that the date phrase should always be surrounded by parentheses. Could FTA just ignore any text between parentheses? This is what it appears to do for a date phrase with an interpretation, such as the Blessing in the example file:

1 BLES
2 DATE INT 17 APR 1917 (Easter Sunday 1917)
2 NOTE Date phrase interpreted into a date
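A quick Python sketch of the "ignore any text between parentheses" idea; the function name is hypothetical and it only handles non-nested parentheses. It also shows one ramification of the change: a value that is nothing but a phrase, such as (1881 Census), leaves an empty string, and the INT keyword still has to be understood by the date parser.

```python
import re


def drop_parenthesised(value: str) -> str:
    """Hypothetical helper: remove every (...) group, then tidy the spacing."""
    without_phrases = re.sub(r"\([^()]*\)", " ", value)
    return " ".join(without_phrases.split())
```

So `drop_parenthesised("INT 17 APR 1917 (Easter Sunday 1917)")` gives `"INT 17 APR 1917"`, but `drop_parenthesised("(1881 Census)")` gives `""`, which the caller would need to treat as "no date" rather than as a parse error.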

ShammyLevva commented 4 years ago

Yes that makes sense.

I’ll need to have a look at the possible ramifications of that change. In 12 years it’s the first time I’ve seen anyone use that sort of format.

Re-reading the details of the standard, I’m now thinking I’ve misunderstood the original point and that FH isn’t making a mistake after all. I only looked at the tag itself, not at where the phrase was used as a sub tag. As a sub tag it’s valid in that context, but it isn’t currently catered for by FTAnalyzer.

I’ll add some test cases to see what needs to be done to recognise this. Thanks for raising it; if you have more examples I can add as test cases, that would give me more to test.

SeekerFTA commented 4 years ago

FH have responded that their reading of the 5.5 standard is that both are supported; please see

http://homepages.rootsweb.com/~pmcbride/gedcom/55gcch2.htm#DATE_VALUE


and I think you are now agreeing with this. One thing to do when a date phrase occurs without an interpretation would be to give an information message along the lines of: "A date phrase has been used. FTA may be able to provide more analysis if an interpretation is provided such as before, after, about, between, etc." As a date phrase can be any text, I think it would be an endless task trying to interpret it.

ShammyLevva commented 4 years ago

I suppose what it comes down to is how should these be interpreted:

1) DATE (Easter Sunday 1917) - ignore as just text? Currently throws a bad date error.
2) DATE (3 Apr) - ignore as just text? Currently throws a bad date error.
3) DATE INT 17 APR 1917 (Easter Sunday 1917) - currently uses 17th April 1917.
4) DATE (1881 Census) - ignore as just text? Currently resolves to 1881 in v7.8.3.
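One possible policy for these cases, sketched in Python: use the interpreted date for INT dates, and for a bare phrase keep the text, skip parsing and return the information message suggested earlier in the thread. This is an illustration only; the names are hypothetical, the input is the value after the DATE tag, and it does not reproduce what v7.8.3 or v8.0 actually do (for example, v7.8.3 resolves (1881 Census) to 1881, whereas this sketch treats it as text).

```python
import re
from typing import NamedTuple, Optional


class DateResult(NamedTuple):
    date_text: Optional[str]  # text to hand to the real date parser, if any
    phrase: Optional[str]     # the free-text date phrase, if any
    message: Optional[str]    # information message for the user, if any


INFO = ("A date phrase has been used. FTA may be able to provide more analysis "
        "if an interpretation is provided such as before, after, about, between, etc.")

# INT <DATE> (<DATE_PHRASE>), e.g. "INT 17 APR 1917 (Easter Sunday 1917)"
INT_RE = re.compile(r"INT\s+([^()]+?)\s*\(([^()]*)\)")
# A bare (<DATE_PHRASE>), e.g. "(3 Apr)"
PHRASE_RE = re.compile(r"\(([^()]*)\)")


def classify_date_value(value: str) -> DateResult:
    """Hypothetical classifier for the value following a DATE tag."""
    text = value.strip()
    m = INT_RE.fullmatch(text)
    if m:  # interpreted date: parse the date part, keep the phrase as context
        return DateResult(m.group(1), m.group(2), None)
    m = PHRASE_RE.fullmatch(text)
    if m:  # bare phrase: don't parse, just tell the user
        return DateResult(None, m.group(1), INFO)
    return DateResult(text, None, None)  # ordinary date: parse as usual
```

Cases 1, 2 and 4 would each come back with no date and the information message, case 3 with `"17 APR 1917"` for the parser, and a normal value such as `TO 1 DEC 1928` passes straight through. A phrase-only value could instead be run through a word-stripping step before giving up, which is closer to the v7.8.3 behaviour for case 4.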

SeekerFTA commented 4 years ago

For lines 1, 2 and 4, I would suggest that the most practical solution would be an information message telling the user that FTA might be able to do a better analysis if some sort of interpreted value were used. One date phrase I have is “died young”: I happen to know that this means less than 2 years old because another girl was given the same name, but some might consider that 60 is a bit young to die!

ShammyLevva commented 4 years ago

Sounds like a plan. For “died young”, if I recall correctly (I'm at work so can't refer to the standard just now), there's a standard value of INFANT for the date.

ShammyLevva commented 4 years ago

v8.0 deals with all of these bar DATE (Easter Sunday 1917), where it cannot accurately determine what that means.