eljeffeg / SmartCopy

Chrome extension for copying genealogical data into Geni.com.

Support for filae.com #28

Open. raphink opened this issue 7 years ago

raphink commented 7 years ago

In France, Filae.com has a lot of information and trees. Would you be willing to accept a PR to support it?

eljeffeg commented 7 years ago

Let me investigate how to query data from the site and how difficult it would be to create a parser for it. Do you have a link to a public profile or tree I could use for testing? One challenge will be that I don't speak French and I don't see an option to set the site language to English.

I also have a request in for geneanet.org, so I'll be looking at that as well.

raphink commented 7 years ago

I'm afraid you need a premium account to see profiles. They do give you a one-month trial, though (which I'm currently using). Geneanet would be interesting, too.

raphink commented 7 years ago

I could try to code it already. Do you need a Geni Pro account to use this?

eljeffeg commented 7 years ago

You need a Geni Pro account to be able to copy over family members (API restriction on Geni).

I often use http://www.nosorigines.qc.ca for free French / Canadian trees, so I've considered adding that. What I've tried to do thus far is focus on either the most popular or free sites.

raphink commented 7 years ago

So getting a MyHeritage Data subscription only makes sense with Geni Pro, then?

eljeffeg commented 7 years ago

I think the MyHeritage Data subscription has a lot of value for research. SmartCopy just makes it easier to copy that data from MyHeritage to Geni. But SmartCopy works on many websites, including trees and records on FamilySearch.

raphink commented 7 years ago

Starting implementation in https://github.com/raphink/SmartCopy/tree/filae

eljeffeg commented 7 years ago

I have to say, geneanet.org is horribly constructed. It's so unstructured: no classes, no IDs. Ugh. It's one of the toughest pages I've seen to parse.

raphink commented 7 years ago

Yes indeed. It's a mess of a DOM...

eljeffeg commented 7 years ago

In some cases, it has been easier to grab the relationships from the initial page and then actually download and parse the pages of the parents, siblings and children. The profile's own data structure is usually more complete (it contains the gender, for example) and easier to parse than the relationship info on the focus page.
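A minimal Python sketch of that two-step approach: collect the relatives' links from the focus page, then download each relative's own page and read the details from there. The markup it matches (a class="rel" links and a span class="gender" element), the name relative_profiles and the fetch callable are all hypothetical stand-ins; Geneanet's real pages have no such classes, so the regexes would need to be adapted.

```python
import re
from typing import Callable, Dict, List

# Hypothetical markup: relatives linked from the focus page as <a class="rel" href="...">.
REL_LINK = re.compile(r'<a class="rel" href="([^"]+)"')
# Hypothetical markup on a relative's own profile page: <span class="gender">M</span>.
GENDER = re.compile(r'<span class="gender">([^<]*)</span>')


def relative_profiles(focus_html: str, fetch: Callable[[str], str]) -> List[Dict[str, str]]:
    """Collect relatives by downloading and parsing each linked profile page.

    Rather than reading gender etc. out of the focus page's relationship text,
    fetch each relative's page, whose profile block is assumed to be more
    complete and easier to parse.
    """
    profiles = []
    # dict.fromkeys removes duplicate links while keeping their original order.
    for url in dict.fromkeys(REL_LINK.findall(focus_html)):
        page = fetch(url)  # e.g. an HTTP GET in real code; injected here for testing
        match = GENDER.search(page)
        profiles.append({"url": url, "gender": match.group(1) if match else ""})
    return profiles
```

Injecting fetch keeps the parsing logic testable offline, with a dict of saved pages standing in for the network.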

Here is one page I was using as a test. The death date is causing problems on several profiles. http://gw.geneanet.org/genevtabouis?lang=en&pz=sosa+fictif&nz=legoutiere&ocz=0&p=augustine+madeleine+augustine&n=riviere

eljeffeg commented 7 years ago

We'll also need to redirect the getcode if the URL contains type=tree or type=medias.
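A minimal Python sketch of the URL check, assuming "redirect" means fetching the profile view instead, and assuming Geneanet shows the profile view when the type parameter is absent. The names profile_url and NON_PROFILE_TYPES are hypothetical and not part of SmartCopy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of page types that are not the profile view itself.
NON_PROFILE_TYPES = {"tree", "medias"}


def profile_url(url: str) -> str:
    """Return a URL for the profile view by dropping type=tree / type=medias.

    Assumption: Geneanet serves the profile page when `type` is absent.
    URLs without one of those types are returned unchanged.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "type" and value in NON_PROFILE_TYPES for key, value in params):
        return url  # already a profile view: leave it untouched
    kept = [(key, value) for key, value in params if key != "type"]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parsing the query string instead of doing a plain substring search avoids false matches such as a person's name that happens to contain "type=tree".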

eljeffeg commented 7 years ago

Is filae.com still something you're working on? I hadn't seen any changes to that branch.

raphink commented 7 years ago

No, I haven't really worked on it. The main reason is that I hardly use the genealogical trees I find there, as there are many more trees on Geneanet. So the main value of Filae is its records, but they're not really parseable most of the time. I might work on this again later, but for now it's not worth the effort.

Tuisto59 commented 6 years ago

I'm a French Python coder and I wrote a small parser for Geneanet. It runs the search, iterates over the results pages, goes through each tree, and builds the ancestor (ascendant) family tree table using Geneanet's own option for it. I have a premium account, but it works with a normal account too. It uses requests, re and pandas, plus the Python standard library. It's simple. The goal is to build a consensus tree. I will also develop a search starting from the most recently found ancestor, repeating the search again and again until nothing new turns up.

The difficult part is the logic of the algorithm: implementing all the reasoning of a family tree researcher, comparing the data from the different trees against each other, choosing the right value the way a researcher would, and searching again to reach new ancestors. With a premium account it's also possible to search for the same individual in other trees that carry new or more complete information, such as parents, parse those, and keep finding ancestors to go as far back as possible.
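One simple way to sketch the "compare the data and choose the good one" step is a per-field majority vote across the trees that contain the same individual. This is a swapped-in illustration, not Tuisto59's actual algorithm: the function name consensus and the record layout (one dict of field to value per tree) are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus(records: List[Dict[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Merge several trees' versions of one person by per-field majority vote.

    Empty values (None or "") are ignored. On a tie, Counter.most_common
    returns the value encountered first, i.e. the one from the earliest tree.
    A field that no tree fills in ends up as None.
    """
    # Collect every field name seen in any record, keeping first-seen order.
    fields = dict.fromkeys(key for record in records for key in record)
    result: Dict[str, Optional[str]] = {}
    for field in fields:
        values = [record.get(field) for record in records if record.get(field)]
        result[field] = Counter(values).most_common(1)[0][0] if values else None
    return result
```

For example, three trees giving birth years 1820, 1820 and 1821 yield 1820. A real researcher's logic would also weigh sources and date plausibility, which a plain vote does not do.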

I was also looking into parsing the data in the digitized microfilms on f i l a e . com to get all the indexed records. I also have access to it through my local family history association.

F i l a e stole every digitized image from every French département and took advantage of a legal loophole to index them. This company obtained billions of images, and all of them were sent to other companies to process the images and index them (like FamilySearch's image indexing).

Like f i l a e, I have also written parsers for Belgian images (I can download them in HD), and the same for the departmental archives of départements 62, 80, 59 and 02. All in Python.

If you are interested, contact me at: yoan [dot] bouzin {at} gmail [dot] com

raphink commented 6 years ago

@tuisto59 We already have a parser for Geneanet. The problem with Filae is parsing its DOM, which is pretty bad.