AlphaGenes / AlphaPeel

AlphaPeel: calling, phasing, and imputing genotype and sequence data in pedigrees
MIT License
2 stars 11 forks source link

Stage_2 #122

Closed AprilYUZhang closed 9 months ago

AprilYUZhang commented 10 months ago

I found a recombination rate table including human, pig, rat, and so on. Aimed at different species, we need different recombination rates, It may be useful Dumont BL, Payseur BA. Evolution of the genomic rate of recombination in mammals. Evolution. 62:276, 2008.

AprilYUZhang commented 10 months ago

In commit # 0632f47, I copied the get_input_options() from tinyhouse, but I just changed the input name when it parsed parameters. The reason why I didn't change the name in the parse_dictionary, pedigree, genotypes, seqfile, bfile are involved many times in the program calls of tinyhouse.

gregorgorjanc commented 10 months ago

I found a recombination rate table including human, pig, rat, and so on. Aimed at different species, we need different recombination rates, It may be useful Dumont BL, Payseur BA. Evolution of the genomic rate of recombination in mammals. Evolution. 62:276, 2008.

Thanks for sharing. There are now even newer recombination rates for some species.

gregorgorjanc commented 10 months ago

In commit # 0632f47, I copied the get_input_options() from tinyhouse, but I just changed the input name when it parsed parameters. The reason why I didn't change the name in the parse_dictionary, pedigree, genotypes, seqfile, bfile are involved many times in the program calls of tinyhouse.

@AprilYUZhang @XingerTang I see you are challenged quite a bit by trying to keep tinyhouse intact, while changing AlphaPeel. I spoke to Hannes today (@bryo-han) and he was of the view that you are effectively doing double work - now on AlphaPeel side and then in the future we will have to fix tinyhouse again. What about actually breaking tinyhouse for other programs? All other programs bring in tinyhouse dependency that is linked to a particular commit via a git submodule. So, if we push development of tinyhouse further, we will not immediately break other programs. Once we do the changes we need in AlphaPeel and tinyhouse we can always go an fix any other AlphaX program so that it uses the new tinyhouse version. This should simplify things a lot!? Thoughts?

XingerTang commented 10 months ago

In commit # 0632f47, I copied the get_input_options() from tinyhouse, but I just changed the input name when it parsed parameters. The reason why I didn't change the name in the parse_dictionary, pedigree, genotypes, seqfile, bfile are involved many times in the program calls of tinyhouse.

@AprilYUZhang @XingerTang I see you are challenged quite a bit by trying to keep tinyhouse intact, while changing AlphaPeel. I spoke to Hannes today (@bryo-han) and he was of the view that you are effectively doing double work - now on AlphaPeel side and then in the future we will have to fix tinyhouse again. What about actually breaking tinyhouse for other programs? All other programs bring in tinyhouse dependency that is linked to a particular commit via a git submodule. So, if we push development of tinyhouse further, we will not immediately break other programs. Once we do the changes we need in AlphaPeel and tinyhouse we can always go an fix any other AlphaX program so that it uses the new tinyhouse version. This should simplify things a lot!? Thoughts?

@gregorgorjanc I understand that we may need to do these changes in tinyhouse later as well, but @AprilYUZhang has already finished all the modifications related to the tinyhouse. And the update of the option names of tinyhouse is more difficult than that of alphapeel as the option names are used everywhere among the 10 files of tinyhouse. It would be more likely for us to miss some of them while modifying the code.

Meanwhile, tinyhouse may still have some bugs unfixed, using different versions of it means we need to introduce changes to different versions of it every time we want to make an update.

I think it would be better if we just stick with this "double work" for a while and then when we are ready to make some big changes to the tinyhouse, we can do everything at once.

gregorgorjanc commented 10 months ago

the update of the option names of tinyhouse is more difficult than that of alphapeel as the option names are used everywhere among the 10 files of tinyhouse. It would be more likely for us to miss some of them while modifying the code.

@XingerTang why would this be hard? It's mostly an input and maybe some variable name change, which is less than doing extra logic, particularly in https://github.com/AlphaGenes/AlphaPeel/pull/122/commits/5821d432e533277590a2b620dfbb4d8feb76708e. The part that I am worried is that while these current changes are not large, they are adding extra logic to AlphaPeel and when we decide to polish tinyhouse we will forget that some of this extra logic should be removed from AlphaPeel.

XingerTang commented 10 months ago

the update of the option names of tinyhouse is more difficult than that of alphapeel as the option names are used everywhere among the 10 files of tinyhouse. It would be more likely for us to miss some of them while modifying the code.

@XingerTang why would this be hard? It's mostly an input and maybe some variable name change, which is less than doing extra logic, particularly in 5821d43. The part that I am worried is that while these current changes are not large, they are adding extra logic to AlphaPeel and when we decide to polish tinyhouse we will forget that some of this extra logic should be removed from AlphaPeel.

@gregorgorjanc It would be hard doesn't because it's hard to write anything, but the code of AlphaPeel and tinyhouse are messy, and it is hard to track where the option names go. For those option names that are adding/modifying functionality, it is not enough to do "search and replace" as we need to find everywhere it is used and understand how it is used. In the above, @AprilYUZhang found where the errors are updated and modified the line of the condition for estimating genotype error and sequence error before that line, from

        if args.esterrors:
            PeelingUpdates.updatePenetrance(pedigree, peelingInfo)

to

        if args.est_geno_error_prob or args.est_seq_error_prob:
            PeelingUpdates.updatePenetrance(pedigree, peelingInfo)

but that is not enough, we need to look up quite closely to see which line corresponds to the update of genotype error and which line corresponds to the update of sequence error inside PeelingUpdates.updatePenetrance(pedigree, peelingInfo) and add additional conditions for them, even though there is no args.esterrors in it. We need to be quite familiar with the code to do all the changes without missing anything. And, when we are facing tinyhouse, there are 10 files. It is easy to lose track of where and how some values are passed from one place to another, but we need to understand the uses of all of them before making changes. That is the time-consuming part. Also, we don't want code inside tinyhouse inconsistent, so we cannot modify only the AlphaPeel related part of tinyhouse.

When we polish tinyhouse, we also have to check every place of AlphaPeel that is related to tinyhouse to avoid inconsistency, so we won't miss any extra logic.

gregorgorjanc commented 10 months ago

@XingerTang @AprilYUZhang I think I understand now. Ok, let's continue and be mindful of the steps beyond the current step. I appreciate your efforts!

AprilYUZhang commented 10 months ago

I find ".called." is not fixed in writeBinaryCalledGenotypes. Although it hasn’t been realized, do we need to change it? @gregorgorjanc @XingerTang

XingerTang commented 10 months ago

I find ".called." is not fixed in writeBinaryCalledGenotypes. Although it hasn’t been realized, do we need to change it? @gregorgorjanc @XingerTang

@AprilYUZhang It's not important for now. I prefer to leave that to the future.

gregorgorjanc commented 10 months ago

I find ".called." is not fixed in writeBinaryCalledGenotypes. Although it hasn’t been realized, do we need to change it? @gregorgorjanc @XingerTang

@AprilYUZhang It's not important for now. I prefer to leave that to the future.

@AprilYUZhang open an issue and we get back to it in future

AprilYUZhang commented 10 months ago

I updated the summary of changes #113

AprilYUZhang commented 10 months ago

the problem of these two options are similar toIn commit #0632f47. I just changed the part parsing argument rather than all. When we consider to change tinyhouse, it will be fixed in the whole software.

In commit # 0632f47, I copied the get_input_options() from tinyhouse, but I just changed the input name when it parsed parameters. The reason why I didn't change the name in the parse_dictionary, pedigree, genotypes, seqfile, bfile are involved many times in the program calls of tinyhouse.

gregorgorjanc commented 10 months ago

the problem of these two options are similar toIn commit #0632f47. I just changed the part parsing argument rather than all. When we consider to change tinyhouse, it will be fixed in the whole software.

In commit # 0632f47, I copied the get_input_options() from tinyhouse, but I just changed the input name when it parsed parameters. The reason why I didn't change the name in the parse_dictionary, pedigree, genotypes, seqfile, bfile are involved many times in the program calls of tinyhouse.

@AprilYUZhang I don't know to which question/comment you are referring to with this answer. Can you clarity? Is this related to my comment in this thread https://github.com/AlphaGenes/AlphaPeel/pull/122#discussion_r1399369654?

AprilYUZhang commented 10 months ago

Sorry, I expressed it ambiguously. It is indeed easy to be confused with rec_prob because I don't mention which option I'm talking about. I mean out_id_order, out_id_only is like pedigree, phasefile. These options involved the tinyhouse.

gregorgorjanc commented 10 months ago

@AprilYUZhang I think this looks very good all in all! Are there any more planned changes - it seems you have done all of the points in https://github.com/AlphaGenes/AlphaPeel/issues/113!?

@XingerTang I would appreciate quick code review from your end before we merge this in.

XingerTang commented 10 months ago

@AprilYUZhang @gregorgorjanc There are still some changes that need to be made before we merge. Please see the above comments.

XingerTang commented 10 months ago

@AprilYUZhang @gregorgorjanc I just noticed that we still have an s in n_cycles which is the updated name of ncycles. Should we keep it or change it to n_cycle?

(also the n_threads and n_io_threads)

gregorgorjanc commented 10 months ago

@AprilYUZhang @gregorgorjanc I just noticed that we still have an s in n_cycles which is the updated name of ncycles. Should we keep it or change it to n_cycle?

@XingerTang well spotted! @AprilYUZhang please check all the options and let us know what you suggest as a unified approach regarding the plural and the associated “s”.

AprilYUZhang commented 10 months ago

Solved the above problems.

XingerTang commented 10 months ago

@gregorgorjanc @AprilYUZhang Sorry I forgot the commit here is not up to date and sent the wrong comment, I already deleted that. @AprilYUZhang Could you pull the recent changes that just merged into the branch, and let's see how the test go

XingerTang commented 10 months ago

@AprilYUZhang Could you pull the recent merged commits again? Let's see how the tests run :)

XingerTang commented 9 months ago

@gregorgorjanc Could you please merge this pull request?

gregorgorjanc commented 9 months ago

@gregorgorjanc Could you please merge this pull request?

@XingerTang i lost track of which of the above discussions and requested changes have been addressed l please go over them and resolve each one (maybe the changes have all been achieved?) and then we merge this in,

XingerTang commented 9 months ago

@gregorgorjanc Yes, all the changes are achieved and every problem is addressed. I think we can merge this in.