eriqande / snppit

Program for large scale parent pair inference (in salmon, etc.)

Add known crosses feature #1

Open eriqande opened 9 years ago

eriqande commented 9 years ago

Hi Yniv,

What is currently implemented is the designation of "spawner groups", which is not quite what you have in mind, since knowing the actual crosses is finer grained than knowing who might have been spawned with whom.

In an earlier program I had implemented that, but spawner groups are what we needed for salmon.

It would be possible to add it. Here are my thoughts on what it would take:

  1. We would have to have an option to add a file in a simple format that specifies the mated pairs. The program would have to read that file and associate the names in it with the indices of the parents in the data set, along with some error checking in case they aren't in the genotype data.
  2. Then we would have to write a short function to tell us whether a proposed pair of parents was listed as mated together.
  3. Then we would have to call that function inside the function that determines pairwise compatibility of pairs of parents. I suspect it would be best to add that call around this line:

https://github.com/eriqande/snppit/blob/5c5a39f407da752d4cafb0dedc635514bc507f3f/src/pbt_geno_compare.c#L345
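Steps 2 and 3 could look roughly like the sketch below. It assumes the names in the mated-pairs file have already been resolved to parent indices. The `MatedPair` struct and both functions are hypothetical names, not existing snppit code. The pair is treated as unordered, and an empty list means "no mated-pairs file given", so every pair stays allowed and current behaviour is unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one listed cross, stored as indices into the parent array
 * after the names in the mated-pairs file have been looked up. */
typedef struct {
    int sire;
    int dam;
} MatedPair;

/* Step 2: true if parents a and b appear together in the list of matings.
 * Checked in both orientations, so (a, b) and (b, a) match the same cross. */
bool is_listed_mating(const MatedPair *pairs, size_t n_pairs, int a, int b)
{
    for (size_t i = 0; i < n_pairs; i++) {
        if ((pairs[i].sire == a && pairs[i].dam == b) ||
            (pairs[i].sire == b && pairs[i].dam == a)) {
            return true;
        }
    }
    return false;
}

/* Step 3: a gate that could run before the genetic compatibility test for a
 * candidate parent pair. With no crosses loaded (n_pairs == 0) every pair is
 * allowed, which leaves the existing behaviour untouched. */
bool pair_allowed_by_crosses(const MatedPair *pairs, size_t n_pairs, int a, int b)
{
    if (n_pairs == 0) {
        return true;
    }
    return is_listed_mating(pairs, n_pairs, a, b);
}
```

The linear scan costs O(number of crosses) per candidate pair. Because the compatibility check runs for many pairs, a precomputed lookup (for example a per-parent list of mates) might be worth it for large hatchery data sets.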

This seems like a feature I should be able to add.

What would really help me would be if you could provide me an example data file and a file of the matings (just two columns, one with the sire and one with the dam, for example) that I could use for testing and development.
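For a two-column matings file (sire in the first column, dam in the second, separated by whitespace), a reader might look something like this sketch. `NamedCross`, `read_mated_pairs`, and the 128-character name limit are assumptions for illustration. Blank lines are skipped, and a line with only one name is treated as an error.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME 128  /* assumed maximum length of an individual's ID */

typedef struct {
    char sire[MAX_NAME];
    char dam[MAX_NAME];
} NamedCross;

/* Read "sire dam" pairs, one per line, from fp. On success returns the
 * number of pairs and stores a malloc'd array in *out (caller frees).
 * Returns -1 on a malformed line or an allocation failure. */
long read_mated_pairs(FILE *fp, NamedCross **out)
{
    size_t cap = 16, n = 0;
    NamedCross *pairs = malloc(cap * sizeof *pairs);
    char line[2 * MAX_NAME + 16];

    if (pairs == NULL) return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        char s[MAX_NAME], d[MAX_NAME];
        /* %127s keeps each name within MAX_NAME including the '\0' */
        int got = sscanf(line, "%127s %127s", s, d);
        if (got == EOF) continue;                    /* blank line */
        if (got != 2) { free(pairs); return -1; }    /* only one name */
        if (n == cap) {                              /* grow the array */
            NamedCross *tmp = realloc(pairs, 2 * cap * sizeof *pairs);
            if (tmp == NULL) { free(pairs); return -1; }
            pairs = tmp;
            cap *= 2;
        }
        strcpy(pairs[n].sire, s);
        strcpy(pairs[n].dam, d);
        n++;
    }
    *out = pairs;
    return (long)n;
}
```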

I can't get to it till after the end of the month probably, but please keep pestering me. (I mean that!)

Cheers,

eric

tshori commented 9 years ago

Hey Eric,

I was trying to take a stab at this: do you think writing a function that reads the crosses and calling it from AssignMatchingParentPairs would do the trick?

T.

eriqande commented 9 years ago

Hi Tshori,

I think that should do the trick. The hard part might be using uthash to get the indices of the animals referred to in the mating file. Once that is done, it is easy to test that they are a possible mating pair.
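The name-to-index step, with the error check for animals missing from the genotype data, is sketched below. The comment above suggests uthash for this. To keep the sketch dependent only on the C standard library, it swaps in a sorted array searched with `qsort`/`bsearch` instead, so treat it as an illustration of the lookup rather than the uthash version. `NameIdx`, `build_name_table`, and `lookup_parent_index` are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a name -> index table (stand-in for a uthash table). */
typedef struct {
    const char *name;
    int idx;   /* index of the individual in the genotype data */
} NameIdx;

static int cmp_name_idx(const void *a, const void *b)
{
    return strcmp(((const NameIdx *)a)->name, ((const NameIdx *)b)->name);
}

/* Build a sorted lookup table from parent names given in genotype order.
 * Returns a malloc'd table (caller frees) or NULL on allocation failure. */
NameIdx *build_name_table(const char *const *names, size_t n)
{
    NameIdx *tab = malloc(n * sizeof *tab);
    if (tab == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        tab[i].name = names[i];
        tab[i].idx = (int)i;
    }
    qsort(tab, n, sizeof *tab, cmp_name_idx);
    return tab;
}

/* Return the genotype index of `name`, or -1 (with a message on stderr)
 * if the mated-pairs file names an animal absent from the genotype data. */
int lookup_parent_index(const NameIdx *tab, size_t n, const char *name)
{
    NameIdx key = { name, -1 };
    const NameIdx *hit = bsearch(&key, tab, n, sizeof *tab, cmp_name_idx);
    if (hit == NULL) {
        fprintf(stderr, "mated-pairs file: \"%s\" not found in genotype data\n", name);
        return -1;
    }
    return hit->idx;
}
```

Once both names in a cross resolve to non-negative indices, the pair can be stored and tested during the compatibility check.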

I think the function that reads the crosses should be called near the beginning of program execution, and the file should be provided as an argument on the command line. Like:

--mated-pairs MyMatedPairsFile.txt

for example.
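As a rough illustration of picking up that option early in `main`, a plain scan over `argv` is shown below. snppit's own option-handling code is not shown in this thread, so `find_mated_pairs_option` is a hypothetical stand-alone helper rather than how the flag would actually be registered.

```c
#include <stdio.h>
#include <string.h>

/* Scan argv for "--mated-pairs FILE" and return FILE, or NULL if the option
 * is absent. Sets *err to 1 when the option is given without a file name. */
const char *find_mated_pairs_option(int argc, char *argv[], int *err)
{
    *err = 0;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--mated-pairs") == 0) {
            if (i + 1 >= argc) {
                fprintf(stderr, "--mated-pairs requires a file name\n");
                *err = 1;
                return NULL;
            }
            return argv[i + 1];
        }
    }
    return NULL;
}
```

A NULL return with `*err == 0` would mean "no crosses supplied", in which case the program could skip reading a file and leave every parent pair eligible.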

I've been hammered with a report that is due tomorrow. I might have time in the next few months to get on this a little bit.