Closed andy-z closed 4 years ago
https://www.gedcom.org/samples.html has a few samples, but they are marked as GEDCOM 5.5.5 and they are relatively small.
http://heiner-eichmann.de/gedcom/gedcom.htm may have one or two samples; they look like they are from 1998.
http://www.geditcom.com/gedcom.html has a few "torture" files.
https://gedcomlibrary.com/gedcoms.html has a bunch of large files uploaded by the public.
https://chronoplexsoftware.com/myfamilytree/samples/ has a couple of interesting samples.
One thing that I realized is that I do not want trees of real people to appear in a public git repo. I need to think of a way to structure things so that I can have a non-published (or private) collection of files that can be tested separately.
Hi Andy! Using the demo files of the genealogy software to test gedcompy is a good idea.
A list of the most widely used family tree software could be compiled, though a lot of these programs may be obsolete and impossible to obtain (here is a list from the French Wikipedia page: https://fr.wikipedia.org/wiki/Logiciel_de_g%C3%A9n%C3%A9alogie).
In particular, this makes it possible to test the parser with the different file encodings that each program offers when it exports data to GEDCOM. If you wish to carry out such tests but do not wish to have those GEDCOM files in your repo, there are several solutions:
After some thinking I decided that I want a separate private repo for the data files and the tests that run on those files. One of the main reasons is that I do not want to fill the ged4py repo with large data files that are not critical for its function.
The new repo is called ged4py_testdata, but it is my private repo and I don't plan to give anyone access to it; I will use it as a development tool for testing ged4py. I'll collect a reasonable set of files there and add tests for basic functionality, and maybe some specific features, as I go along.
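For anyone who wants a similar split, here is a rough sketch of how tests can pick up a private data collection without the files living in the public repo. It is only an illustration, not what ged4py_testdata actually does: the environment variable name GED4PY_TESTDATA and the helper find_gedcom_files are made up for this example, and the size check is a placeholder for a real parser call.

```python
import os
import unittest
from pathlib import Path

# Hypothetical variable pointing at a local checkout of the private data repo.
DATA_ENV_VAR = "GED4PY_TESTDATA"


def find_gedcom_files(root):
    """Return all *.ged files below `root`, sorted; [] if root is unset or missing."""
    if not root:
        return []
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        p for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() == ".ged"
    )


class PrivateDataTest(unittest.TestCase):
    def test_files_are_not_empty(self):
        files = find_gedcom_files(os.environ.get(DATA_ENV_VAR))
        if not files:
            # Keeps the public test run green when the private data is absent.
            self.skipTest(f"{DATA_ENV_VAR} is not set or has no .ged files")
        for path in files:
            with self.subTest(path=str(path)):
                # Placeholder check; a real test would run the parser here.
                self.assertGreater(path.stat().st_size, 0)
```

With this layout the tests are skipped unless the variable points at a directory that contains GEDCOM files, so nothing about real people has to be committed to the public repo.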
Closing this issue. I will continue adding stuff to my private ged4py_testdata repo as I go, which will probably trigger more tickets here.
The current set of unit tests is limited to data that is encoded as strings in the Python code. It would be useful to add a bunch of sample GEDCOM files and a bunch of tests that read and parse those files.
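To make the idea concrete, here is a minimal sketch of a file-based test. It uses a tiny standard-library line splitter as a stand-in for the real parser, so parse_line and read_lines are hypothetical helpers and not part of the ged4py API, and the sample record is invented for the example.

```python
import tempfile
from pathlib import Path

# Invented sample content; real tests would read files from a data directory.
SAMPLE = (
    "0 HEAD\n"
    "1 CHAR UTF-8\n"
    "0 @I1@ INDI\n"
    "1 NAME John /Smith/\n"
    "0 TRLR\n"
)


def parse_line(line):
    """Split one GEDCOM line into (level, xref_id, tag, value).

    Simplified: assumes a well-formed line with at least a level and a tag.
    """
    tokens = line.rstrip("\r\n").split(" ")
    level = int(tokens[0])
    xref_id = None
    idx = 1
    # An optional cross-reference id such as @I1@ may precede the tag.
    if tokens[idx].startswith("@") and tokens[idx].endswith("@"):
        xref_id = tokens[idx]
        idx += 1
    tag = tokens[idx]
    value = " ".join(tokens[idx + 1:]) or None
    return level, xref_id, tag, value


def read_lines(path, encoding="utf-8-sig"):
    """Read a file and parse every non-blank line; utf-8-sig tolerates a BOM."""
    with open(path, encoding=encoding) as f:
        return [parse_line(line) for line in f if line.strip()]


def test_sample_file():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.ged"
        path.write_text(SAMPLE, encoding="utf-8")
        records = read_lines(path)
    assert records[0] == (0, None, "HEAD", None)
    assert records[2] == (0, "@I1@", "INDI", None)
    assert records[3] == (1, None, "NAME", "John /Smith/")
```

The same test shape works when the file comes from a sample directory instead of a temporary file; only the path changes.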