ShammyLevva / FTAnalyzer

Family Tree Analyzer - Finds hidden details in your family tree. Install at
http://www.ftanalyzer.com/install
Apache License 2.0

Time intervals are not evaluated correctly if they don't have exact endpoints #272

Closed RonaldHalmen closed 1 year ago

RonaldHalmen commented 1 year ago

Is your feature request related to a problem? Please describe.
False data errors like "Birth after mother's death" are reported if the corresponding parent has a death date that was entered as an interval without exact endpoints. For example, an individual was born in 1839, while the mother died after 1858 but before 1871. If that interval was entered as "(1858,1871]" (as in the attached screenshot) or "(1858,1871)" instead of "[1858,1871]", then FTA reports it as an error with the following description: "Mother died BET AFT 1858 AND BEF 1871 which is before individual was born".

Describe the solution you'd like
The resolution of a time interval must transform "BET AFT x AND BEF y" into "BET x AND y" before deciding whether it is an error or not. If x or y is an incomplete date, e.g. only a YEAR or a MONTH+YEAR combination, it must be translated into the earliest possible date for the lower endpoint and into the latest possible date for the upper endpoint, e.g.:

BET AFT 1858 AND BEF 1871 --> BET 1 JAN 1858 AND 31 DEC 1871
BET AFT MAR 1858 AND BEF FEB 1871 --> BET 1 MAR 1858 AND 28 FEB 1871
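The expansion described above could be sketched roughly as follows in Python. This is only an illustration of the requested behaviour, not FTA's code: the names `expand_partial` and `resolve_between` are made up for this example, and only Gregorian dates with the English GEDCOM month abbreviations are handled.

```python
import calendar
from datetime import date
from typing import Tuple

MONTHS = {name: i for i, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def expand_partial(text: str) -> Tuple[date, date]:
    """Return the (earliest, latest) day covered by 'YEAR', 'MON YEAR' or 'DAY MON YEAR'."""
    parts = text.upper().split()
    if len(parts) == 1:                      # YEAR only
        year = int(parts[0])
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:                      # MONTH + YEAR
        month, year = MONTHS[parts[0]], int(parts[1])
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:                      # exact date
        exact = date(int(parts[2]), MONTHS[parts[1]], int(parts[0]))
        return exact, exact
    raise ValueError(f"unsupported date: {text!r}")


def resolve_between(lower: str, upper: str) -> Tuple[date, date]:
    """Earliest day of the lower endpoint and latest day of the upper endpoint."""
    return expand_partial(lower)[0], expand_partial(upper)[1]
```

With this sketch, `resolve_between("MAR 1858", "FEB 1871")` yields the interval from 1 MAR 1858 to 28 FEB 1871, matching the second example.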

Describe alternatives you've considered
Ignore these errors if they have been inappropriately reported.

Additional context
[Screenshot: FTB BETWEEN NON-EXACTLY]

ShammyLevva commented 1 year ago

As far as I can tell, there is a problem with your GEDCOM formatting: I don't believe those are valid GEDCOM dates. That is, BET AFT 1858 AND BEF 1871 is not a valid GEDCOM date; the correct form would be BET 1858 AND 1871. The AFT and BEF are superfluous. So I suspect the issue lies in FTA not giving an error for the invalid GEDCOM date, rather than in the date calculation. I've never seen two modifiers used consecutively before, and my reading of the v5.5.1 spec is that this isn't allowed (the two-modifier pairs here being BET AFT and AND BEF).

When parsed, those dates should transform internally to a start date of 01/01/1858 and an end date of 31/12/1871, as all dates are held internally as a start and end date for date calculations. I suspect what is happening is that a date with this invalid GEDCOM formatting is being wrongly parsed.

Can you provide a sample GEDCOM snippet with an individual that has some of these date formats? I can then add them to the test suite to see exactly how they are handled, fix the problem, and retain the test cases to check that it keeps working regardless of any future changes.

Ideally I want to see what GEDCOM is being generated by the program you are using and how that is being presented.

ShammyLevva commented 1 year ago

I've added what I think may be test examples. It was accepting the date format as valid, but when it parsed the ABT 1871, where it was expecting just a date part and not a second modifier, it came up with MINDATE (i.e. 01/01/0001) for the start date. That explains why it was giving the "mother died before child was born" error: her death start date was effectively MINDATE, i.e. definitely before any other date.
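To illustrate the effect, here is a rough Python sketch. It is not FTA's code: `DateRange` and `died_before_birth` are hypothetical names, and the comparison on start dates is an assumption made only to show why a start date that falls back to MINDATE triggers the error.

```python
from dataclasses import dataclass
from datetime import date

MINDATE = date.min  # 0001-01-01, standing in for the fallback start date


@dataclass(frozen=True)
class DateRange:
    start: date
    end: date


def died_before_birth(parent_death: DateRange, child_birth: DateRange) -> bool:
    # Assumed check: flag an error when the earliest possible death
    # precedes the earliest possible birth.
    return parent_death.start < child_birth.start


birth = DateRange(date(1839, 1, 1), date(1839, 12, 31))
parsed_ok = DateRange(date(1858, 1, 1), date(1871, 12, 31))
parsed_wrong = DateRange(MINDATE, date(1871, 12, 31))  # start fell back to MINDATE
```

Under this assumption the correctly parsed interval passes the check, while the interval whose start fell back to MINDATE is reported as an error.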

I've added some code to strip out double modifiers like these, but there may be other invalid combos that I need to strip out, so some examples of what the program produces would help. I believe these are invalid GEDCOM formats, so I believe the code that produces these dates to be erroneous. I can get FTA to cope, as it does with many date formats, by understanding the issue and correcting for it.
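The general idea of stripping a second modifier can be sketched in Python as below. This is a simplified illustration rather than the actual fix: the function name `strip_double_modifiers` is made up, and both the set of leading keywords (BET, AND, FROM, TO) and the set of stripped modifiers (AFT, BEF, ABT) are assumptions.

```python
import re

# A range/period keyword immediately followed by a second modifier.
_DOUBLE_MODIFIER = re.compile(r"\b(BET|AND|FROM|TO)\s+(?:AFT|BEF|ABT)\b", re.IGNORECASE)


def strip_double_modifiers(text: str) -> str:
    """Drop the second modifier, e.g. 'BET AFT 1858 AND BEF 1871' -> 'BET 1858 AND 1871'."""
    collapsed = " ".join(text.split())  # normalise whitespace first
    return _DOUBLE_MODIFIER.sub(lambda m: m.group(1).upper(), collapsed)
```

Dates that already use a single modifier, such as AFT 1858 or BET 1858 AND 1871, pass through unchanged.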

The initial version of this fix is coded and I can make it available in a beta if you want to test it.

RonaldHalmen commented 1 year ago

In the meantime I did some more research by looking again at both the GEDCOM 5.5.1 spec (https://gedcom.io/specifications/ged551.pdf) and at how this is handled by some genealogy software vendors. My conclusion is the same as yours: an event can have a DATE_VALUE, and this DATE_VALUE can be either a DATE_PERIOD (FROM/TO) or a DATE_RANGE (BEF/AFT/BET), but not a combination of them. In fact, Family Tree Builder is the only one that supports the usage of this enhanced DATE_PERIOD definition:
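As a rough illustration of that distinction, the following Python sketch classifies a date string as a DATE_PERIOD or a DATE_RANGE and rejects mixed forms. It is a deliberately simplified reading of the 5.5.1 grammar (Gregorian dates only, no ABT/CAL/EST, no date phrases), and the name `classify_date_value` is made up for this example.

```python
import re
from typing import Optional

# Simplified date: optional day and month, then a year.
_DATE = (r"(?:(?:\d{1,2}\s+)?"
         r"(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s+)?\d{1,4}")

_PERIOD = re.compile(rf"FROM\s+{_DATE}(?:\s+TO\s+{_DATE})?|TO\s+{_DATE}")
_RANGE = re.compile(rf"(?:BEF|AFT)\s+{_DATE}|BET\s+{_DATE}\s+AND\s+{_DATE}")


def classify_date_value(text: str) -> Optional[str]:
    """Return 'DATE_PERIOD', 'DATE_RANGE', or None for anything else."""
    value = " ".join(text.upper().split())
    if _PERIOD.fullmatch(value):
        return "DATE_PERIOD"
    if _RANGE.fullmatch(value):
        return "DATE_RANGE"
    return None
```

A value that nests a range modifier inside a period or range, such as BET AFT 1858 AND BEF 1871, matches neither pattern and comes back as `None`.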

1 OCCU Engineer
2 DATE FROM AFT 1945 TO 1965
2 PLAC Paris

Semantically, this representation makes sense to me when it's known that, for example, somebody served as an engineer after WWII until his retirement in 1965, but we don't know exactly when he started his job. It may have been 1945, 1946 or even later. Unfortunately, the unary operators BEF and AFT were defined in a different way 25 years ago, so you are not supposed to know about this private extension made by one vendor.

RonaldHalmen commented 1 year ago

The initial version of this fix is coded and I can make it available in a beta if you want to test it.

Sure, I can have a look at the beta version if you share a download link with me.

ShammyLevva commented 1 year ago

2 DATE FROM AFT 1945 TO 1965

I can handle these easily enough; it's just a case of getting all the possible combos and dealing with them. So far we appear to have...

any others?

RonaldHalmen commented 1 year ago

This is the full list of cases from my perspective:

Any of these x or y values could be either DATE_VALUE or AFT DATE_VALUE or BEF DATE_VALUE.

RonaldHalmen commented 1 year ago

In this context, I have also observed that for Error Types like "Facts dated before birth" the Description field is empty in the case of an EDUC event. Here is a GEDCOM snippet you could use for checking:

1 EDUC Study of Theology
2 DATE TO BEF 1934
2 PLAC Vienna
2 NOTE Dr.
1 OCCU Priest
2 DATE BET 1934 AND 1938
2 PLAC Budapest

However, it is populated with meaningful error messages in case of other events like Burial, Christening or Residence, e.g.: Residence fact recorded: BET 20 NOV 1944 AND AFT 1948 before individual was born

ShammyLevva commented 1 year ago

It would be useful to compare the GEDCOM of a successful one vs the GEDCOM above. I’m not sure why that would be though.

RonaldHalmen commented 1 year ago

If there is no issue with the date format (AFT/BEF) there is nothing to be reported. However, if we have a case like this and if the date is used in the context of EDUC, something different happens. Just copy the 7 lines from above in a text editor into a GEDCOM test file that contains only one person. In case you want me to generate such a file for you, please let me know.

ShammyLevva commented 1 year ago

I can easily check what happens with the above GEDCOM. I just felt it would be instructive to compare it with how it works with some other format of data and see what is different.

RonaldHalmen commented 1 year ago

Based on my observations I have generated the attached GEDCOM file. Please check the Data Errors overview to see what I mean by an empty Description. I'm not sure what exactly you want to see in terms of a different format. By the way, the file can be used to illustrate a couple of what I would consider minor bugs. Should I describe them here or in a separate issue?
Attachment: GEDCOM_Test.ged.txt

ShammyLevva commented 1 year ago

This appears to be all ok now in v10.0.0.0-beta2