Given the prism.data.build_cache.ProjectCommitData of two commits, assign the definitions of one commit to the definitions of the other such that the assignment preserves the semantic identity of each definition. Definitions that have been removed or added between the two commits are not included in the assignment.
Since a project is likely to contain hundreds of definitions, calculating the assignment between the unconstrained sets is prohibitively expensive. Instead, one may use the diff between the commits to limit the sets for which a nontrivial assignment must be calculated to just those definitions that intersect the lines in the diff. This only yields a performance gain under the assumption that the diff is much smaller than the overall project. Definitions that do not intersect the diff may be trivially assigned by an identity map over their corresponding indices.
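As a rough illustration of the split described above, here is a minimal sketch that treats each definition as a plain string and takes the indices of the diff-touched definitions as given. The names `align_with_diff` and `greedy_align` are hypothetical and not part of prism; the greedy similarity matching is only a stand-in for whatever nontrivial assignment is eventually chosen.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

Assignment = List[Tuple[int, int]]
Aligner = Callable[[List[str], List[str]], Assignment]


def greedy_align(a: List[str], b: List[str], threshold: float = 0.5) -> Assignment:
    """Hypothetical stand-in aligner: greedily pair the most similar texts."""
    scored = sorted(
        ((SequenceMatcher(None, x, y).ratio(), i, j)
         for i, x in enumerate(a) for j, y in enumerate(b)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # whatever is left is treated as added or removed
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)


def align_with_diff(
    defs_a: Sequence[str],
    defs_b: Sequence[str],
    changed_a: Sequence[int],
    changed_b: Sequence[int],
    align: Aligner = greedy_align,
) -> Assignment:
    """Identity-map untouched definitions; run `align` only on touched ones."""
    ca, cb = sorted(set(changed_a)), sorted(set(changed_b))
    in_a, in_b = set(ca), set(cb)
    same_a = [i for i in range(len(defs_a)) if i not in in_a]
    same_b = [j for j in range(len(defs_b)) if j not in in_b]
    if len(same_a) != len(same_b):
        raise ValueError("untouched definitions must correspond one-to-one")
    # Trivial part: the k-th untouched definition maps to the k-th untouched one.
    pairs = list(zip(same_a, same_b))
    # Nontrivial part: align the touched subsets, then map back to global indices.
    local = align([defs_a[i] for i in ca], [defs_b[j] for j in cb])
    pairs.extend((ca[i], cb[j]) for i, j in local)
    return sorted(pairs)
```

The cost of the nontrivial step grows with the product of the two touched subsets rather than the product of the full definition lists, which is where the assumed performance gain comes from.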
The following ingredients are expected to be critical to enabling a successful implementation:
One may assume that the files in the ProjectCommitData.command_data field are listed in dependency order by construction. This allows one to more easily assign across file boundaries.
Each definition has an attached location. As shown in VernacSentence.sort_sentences, one may use this to sort the definitions across files in ascending order.
Use C implementations provided by numpy, scipy, or other libraries where possible to avoid slow Python iterators.
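The first two ingredients can be combined into a single ordering over all definitions. The sketch below assumes each location is reduced to a hypothetical `(filename, line)` pair and that `file_order` lists the files in dependency order; it is not the actual `VernacSentence.sort_sentences` implementation.

```python
from typing import List, Sequence, Tuple


def sort_across_files(
    locations: Sequence[Tuple[str, int]],
    file_order: Sequence[str],
) -> List[int]:
    """Return indices ordering locations by file dependency rank, then line."""
    # Rank of each file in the (assumed) dependency order.
    rank = {name: r for r, name in enumerate(file_order)}
    return sorted(
        range(len(locations)),
        key=lambda i: (rank[locations[i][0]], locations[i][1]),
    )
```

If the ranks and line numbers are already held in arrays, `numpy.lexsort` could likely produce the same ordering without a Python-level key function.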
The following subtasks are requested (feel free to slightly adapt them if it simplifies the implementation or improves quality of life for a user/maintainer):
[x] Create the subpackage prism.data.repair with submodule prism.data.repair.align,
[x] In prism.data.repair.align, create a function locations_in_diff : List[SexpInfo.Loc] -> GitDiff -> bool -> List[int] that returns the indices of a given list of locations that intersect a GitDiff := str diff between two commits, where the bool argument indicates whether the locations come from the former or latter commit in the diff,
[x] In prism.data.repair.align, create a function align_commits that takes two ProjectCommitData and a precomputed Git diff between their respective commits and then aligns their definitions according to a provided function List[VernacSentence] -> List[VernacSentence] -> List[Tuple[int, int]].
[x] Supply a default alignment function that satisfies the callable signature expected by align_commits
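For reference, one possible shape of locations_in_diff is sketched below. It assumes a unified diff and uses a hypothetical `Loc` dataclass (file name plus a 1-based inclusive line span) in place of SexpInfo.Loc, whose actual fields may differ. It also treats every line covered by a hunk as changed, context lines included, so the result is a conservative superset of the definitions that truly changed.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Loc:
    """Hypothetical stand-in for SexpInfo.Loc (1-based, inclusive lines)."""
    filename: str
    lineno: int
    lineno_last: int


_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def _path(header: str) -> str:
    """Strip an optional timestamp and the a/ or b/ prefix from a file header."""
    path = header.split("\t")[0].strip()
    return path[2:] if path[:2] in ("a/", "b/") else path


def hunk_ranges(diff: str, is_before: bool) -> Dict[str, List[Tuple[int, int]]]:
    """Collect the line ranges covered by hunks on one side of a unified diff."""
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    filename: Optional[str] = None
    old_left = new_left = 0  # hunk body lines still to be consumed
    for line in diff.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: count lines so that removed text starting
            # with "-- " is not mistaken for a file header.
            if line.startswith("\\"):
                continue  # "\ No newline at end of file"
            if not line.startswith("+"):
                old_left -= 1
            if not line.startswith("-"):
                new_left -= 1
            continue
        if line.startswith("--- "):
            if is_before:
                filename = _path(line[4:])
        elif line.startswith("+++ "):
            if not is_before:
                filename = _path(line[4:])
        else:
            m = _HUNK.match(line)
            if m and filename is not None:
                old_left = int(m.group(2) or 1)  # omitted count means 1
                new_left = int(m.group(4) or 1)
                start = int(m.group(1 if is_before else 3))
                count = old_left if is_before else new_left
                if count > 0:
                    ranges.setdefault(filename, []).append((start, start + count - 1))
    return ranges


def locations_in_diff(locs: Sequence[Loc], diff: str, is_before: bool) -> List[int]:
    """Return indices of locations that overlap a hunk on the chosen side."""
    ranges = hunk_ranges(diff, is_before)
    return [
        i for i, loc in enumerate(locs)
        if any(loc.lineno <= hi and lo <= loc.lineno_last
               for lo, hi in ranges.get(loc.filename, ()))
    ]
```

Generating the diff with zero lines of context would tighten the ranges, at the cost of hunks with a count of zero on one side, which this sketch simply skips.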