genealogysystems / fs-traversal

A JavaScript traversal framework for the FamilySearch Family Tree using the visitor pattern
http://genealogysystems.github.io/fs-traversal/
MIT License
11 stars 2 forks source link

Decouple fetching and visiting #16

Closed justincy closed 9 years ago

justincy commented 10 years ago

Problem

People are fetched and visited in the same order. In other words, we determine who should be visited next then go and fetch them. This method is more simple but has a weakness: callbacks involving a person's relationships are not called until the other people involved in those relationships are also visited.

If your purpose in using fs-traversal is to analyze relationships (count number of men that are younger than their wife, find a list of twins, find a list of parents that had children die before they did) it is much more difficult to define (and understand) the order in which the relationships will be visited.

Also, when using algorithms such as direct ancestors or direct descendants, some relationship callbacks will never fire.


Solution

This problem can be overcome by further decoupling fetching from visiting. We fetch a person as soon as we find a relationship that points to them but don't visit them until it's their turn to be visited. This allows us to call all relevant relationship callbacks for a person when they are visited.

Callbacks that have person as the first parameter will be called when that person is visited. Callbacks that don't have person as the first parameter (child, marriage, parent) will be called when that relationship is first seen.

Are there any consequences or downsides that we haven't seen?

justincy commented 10 years ago

Another bonus from this change is that the filter function could have access to the person objects in addition to the relationships.

storelord commented 10 years ago

Wouldn't this make it so that fetches could not be filtered in the cases where it is known that certain people don't need to be fetched? Maybe there needs to be two different filters.

On Tue, Jun 24, 2014 at 3:31 PM, Justin notifications@github.com wrote:

Another bonus from this change is that the filter function could have access to the person objects in addition to the relationships.

— Reply to this email directly or view it on GitHub https://github.com/genealogysystems/fs-traversal/issues/16#issuecomment-47019490 .

justincy commented 10 years ago

@cumom Sorry I missed your comment; for some reason I didn't get a notification about it.

Could you expound further on this? Perhaps an example. I'm stuck in thinking that fs-traversal is only used in conjunction with fs-check so it's hard for me to think outside of it. Perhaps a case where you know you only want to visit direct ancestors and don't want any of the collateral relationships callbacks?

There's not too much overhead to fetch a person you won't be visiting. With one call we can get a person, all of their relationships, and all of the people in those relationships. So it just adds a little time and data to a call we already have to make.

storelord commented 10 years ago

I often have people with trees of over 50,000 people. That takes a lot longer if spouses are included, which I guess they should be able to choose to do if they want. I also have a situation where the internet connection is very limited and any extra traffic is a huge deal. People are sitting there waiting for the results and there is line of others waiting to use the computer to get their results.

On Wed, Jul 2, 2014 at 1:54 PM, Justin notifications@github.com wrote:

@cumom https://github.com/cumom Sorry I missed your comment; for some reason I didn't get a notification about it.

Could you expound further on this? Perhaps an example. I'm stuck in thinking that fs-traversal is only used in conjunction with fs-check so it's hard for me to think outside of it. Perhaps a case where you know you only want to visit direct ancestors and don't want any of the collateral relationships callbacks?

There's not too much overhead to fetch a person you won't be visiting. With one call we can get a person, all of their relationships, and all of the people in those relationships. So it just adds a little time and data to a call we already have to make.

— Reply to this email directly or view it on GitHub https://github.com/genealogysystems/fs-traversal/issues/16#issuecomment-47819586 .

justincy commented 10 years ago

@cumom Could you explain to me your specific usecase? It would help me to know exactly what your doing. It's possible that there's some other way to satisfy your usecase in fs-traversal.

Is this the cousin lookup program that you discussed recently at the Family History Technology Workshop? I don't remember much about it.

justincy commented 10 years ago

Adding the persons parameter to the Person with Relationships API call will make traversal at least twice as slow, if not closer to 400% or 500% slower. This is because we're fetching many more people and because we'll end up getting them at least twice, if not more. For example, your siblings will be fetched once for your mother, once for your father, and once when we visit them.

I need to rethink this.

justincy commented 10 years ago

I have come to terms with the fact that we can't use the persons parameter on the Person with Relationships call. It's a shame because, from an implementation standpoint, it would've been a very easily solution.


We'll continue using the Person with Relationships call as we are now, without the persons parameter. We will need a queue for fetching and a queue for visiting.

Tracking which relationships we've already visited will change from waiting until all people are visited to waiting until all people are fetched. Remember, relationships are now being assigned the priority of the first person we find in the relationship as opposed to the last person.

It would be ideal if we could find an easy way to make this behavior a traversal option. The last statement of the previous paragraph about the priority of relationships helped me realize you might not always want that.

justincy commented 10 years ago

Current behavior:

  1. Fetch a person and their relationships
  2. Visit the person
  3. Visit all relationships where all people have been visited
  4. Create new fetch objects for people in the relationships that don't have one already and add them to the queue

Proposed new behavior:

Fetch:

  1. Fetch a person and their relationships
  2. Calculate their weight and add them to the visit queue
  3. Visit all relationships for this person where all people have been fetched

Visit:

  1. Visit the person
  2. For all people in their relationships, add them to the fetch queue

The fetch queue will be a normal fifo queue while the visit queue will be a priority queue.

justincy commented 10 years ago

This enhancement might not be necessary at all.

In the case where someone does want this behavior, they can achieve it with the current version of fs-traversal. Instead of listening for all relationship callbacks, just listen for the person callback. Then inside the callback, call persons-with-relationships yourself with the additional persons parameter to get all neighboring people and relationships.

It may seem strange to request a person right after they were just requested, but we would end up fetching persons twice too if we baked this in to the core of fs-traversal. So instead of making everyone incur the extra overhead, it will only be incurred by those who want it, all without increasing the complexity of fs-traversal.

We'll probably have to document exactly how this is done (even now I'm not sure on all the specifics) and maybe even ship a basic utility/helper function that does this (because it probably won't be that trivial). Something like FSTraversal.fooBarCodezNinja. Seriously I have no idea what to call it.

Or maybe a separate js lib that wraps around fs-traversal but provides the same interface so you can still get the relationships callbacks.

I'll have to think about this more.

justincy commented 9 years ago

I'm going to table this. I've convinced myself that it's not necessary.