Open yiufung opened 3 years ago
Another feature I'd like to add in this branch is referring to citations by abbreviation. For example, when the user types "Lev. 1", it can be translated into "Leviticus 1" for further processing.
I think the implementation can be an association list that maps each abbreviation (key) to the full name (value). I plan to follow the common abbreviations in Logos and Nashotah.edu as a start. Users may add more definitions themselves. As a result, `dtk-citation-regexp` will incorporate these abbreviations as well. I think this will be useful for functions such as `dtk-follow`.
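To make the association-list idea concrete, here is a minimal sketch. All names (`my-dtk-book-abbreviations`, `my-dtk-expand-book`, `my-dtk-book-regexp`) are hypothetical and not part of dtk, and the four entries are only placeholders for a real abbreviation table.

```elisp
;; Hypothetical names throughout; none of these are part of dtk itself.
(defvar my-dtk-book-abbreviations
  '(("Gen" . "Genesis")
    ("Lev" . "Leviticus")
    ("Matt" . "Matthew")
    ("Jn" . "John"))
  "Alist mapping abbreviated book names (no trailing period) to full names.
Users could extend it with `add-to-list'.")

(defun my-dtk-expand-book (name)
  "Return the full book name for NAME, or NAME itself if it is not known.
A single trailing period is ignored during lookup, so \"Lev.\" and
\"Lev\" are treated alike."
  (let ((key (replace-regexp-in-string "\\.\\'" "" name)))
    (or (cdr (assoc key my-dtk-book-abbreviations))
        name)))

(defun my-dtk-book-regexp ()
  "Return a regexp matching any known abbreviation or full book name."
  (regexp-opt (append (mapcar #'car my-dtk-book-abbreviations)
                      (mapcar #'cdr my-dtk-book-abbreviations))))
```

With this in place, `(my-dtk-expand-book "Lev.")` returns `"Leviticus"`, and an unknown name such as `"Luke"` is returned unchanged. Building the book regexp from the same alist means user additions would be picked up by matching as well.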
The current implementation definitely has some weaknesses. I'm entirely supportive of moving toward handling both variants that you identify on full citations (abbreviated book names and verse sets/ranges). I see at least two reasons to do so:
`dtk-follow` in any buffer which contains a citation, in some form)
With respect to the situation where a citation crosses a line break, we can cross that bridge when we come to it. At this point, I don't believe I've encountered that as a need. Let's put it on the back burner.
I wonder whether it would be best to hold off on merging this branch until we clearly establish:
`("I John" 3 (16 18 20))`? `("I John" 3 (16 17 18 19 20))`? `("I John" 3 16 20)`?
Thanks for investing time in this! Please let me know your thoughts regarding the above.
If a substantial amount of time is invested, it might be desirable to put together some testing on this. I haven't given much thought to any sort of testing for this project. Maybe go with `ert`? https://gist.github.com/thomp/e68d81d4319426bc3015d87dbaf5a442
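For reference, an `ert` test can be quite small. The sketch below tests a hypothetical helper, `my-dtk-expand-verse-range`, which is defined only so the test has something to exercise; it is not an existing dtk function.

```elisp
(require 'ert)

;; Hypothetical helper, defined here only so the test is self-contained.
(defun my-dtk-expand-verse-range (start end)
  "Return the list of verse numbers from START to END, inclusive."
  (number-sequence start end))

(ert-deftest my-dtk-expand-verse-range-test ()
  "A range should expand to every verse it covers."
  (should (equal (my-dtk-expand-verse-range 16 20) '(16 17 18 19 20)))
  (should (equal (my-dtk-expand-verse-range 16 16) '(16))))
```

Assuming the code is saved as `my-tests.el`, the tests can be run non-interactively with `emacs -batch -l ert -l my-tests.el -f ert-run-tests-batch-and-exit`.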
Since our parsing is handed over to `diatheke` to process, I did a quick test.
% diatheke -b KJV -o fmnx -k Jn 3:16,17,18,19,20,21 > sets
% diatheke -b KJV -o fmnx -k Jn 3:16-21 > ranges
% diff sets ranges # no output, meaning no difference to diatheke between sets and ranges
% diatheke -b KJV -o fmnx -k Jn 3:16-21,23-24 > additionals
% diff ranges additionals # means we can mix notations together
6a7,8
> John 3:23: <milestone marker="¶" type="x-p"/><w lemma="strong:G3588" morph="robinson:T-GSM" savlm="strong:G3588 lemma.TR:του" src="9"/><w lemma="strong:G1161" morph="robinson:CONJ" savlm="strong:G1161 lemma.TR:δε" src="2">And</w> <w lemma="strong:G2491" morph="robinson:N-NSM" savlm="strong:G2491 lemma.TR:ιωαννης" src="4">John</w> <w lemma="strong:G2532" morph="robinson:CONJ" savlm="strong:G2532 lemma.TR:και" src="3">also</w> <w lemma="strong:G2258" morph="robinson:V-IXI-3S" savlm="strong:G2258 lemma.TR:ην" src="1">was</w> <w lemma="strong:G907" morph="robinson:V-PAP-NSM" savlm="strong:G907 lemma.TR:βαπτιζων" src="5">baptizing</w> <w lemma="strong:G1722" morph="robinson:PREP" savlm="strong:G1722 lemma.TR:εν" src="6">in</w> <w lemma="strong:G137" morph="robinson:N-PRI" savlm="strong:G137 lemma.TR:αινων" src="7">Ænon</w> <w lemma="strong:G1451" morph="robinson:ADV" savlm="strong:G1451 lemma.TR:εγγυς" src="8">near</w> <w lemma="strong:G4530" morph="robinson:N-PRI" savlm="strong:G4530 lemma.TR:σαλειμ" src="10">to Salim</w>, <w lemma="strong:G3754" morph="robinson:CONJ" savlm="strong:G3754 lemma.TR:οτι" src="11">because</w> <w lemma="strong:G2258" morph="robinson:V-IXI-3S" savlm="strong:G2258 lemma.TR:ην" src="14">there was</w> <w lemma="strong:G4183" morph="robinson:A-NPN" savlm="strong:G4183 lemma.TR:πολλα" src="13">much</w> <w lemma="strong:G5204" morph="robinson:N-NPN" savlm="strong:G5204 lemma.TR:υδατα" src="12">water</w> <w lemma="strong:G1563" morph="robinson:ADV" savlm="strong:G1563 lemma.TR:εκει" src="15">there</w>: <w lemma="strong:G2532" morph="robinson:CONJ" savlm="strong:G2532 lemma.TR:και" src="16">and</w> <w lemma="strong:G3854" morph="robinson:V-IDI-3P" savlm="strong:G3854 lemma.TR:παρεγινοντο" src="17">they came</w>, <w lemma="strong:G2532" morph="robinson:CONJ" savlm="strong:G2532 lemma.TR:και" src="18">and</w> <w lemma="strong:G907" morph="robinson:V-IPI-3P" savlm="strong:G907 lemma.TR:εβαπτιζοντο" src="19">were baptized</w>.
> John 3:24: <w lemma="strong:G1063" morph="robinson:CONJ" savlm="strong:G1063 lemma.TR:γαρ" src="2">For</w> <w lemma="strong:G3588 strong:G2491" morph="robinson:T-NSM robinson:N-NSM" savlm="strong:G3588 strong:G2491 lemma.TR:ο lemma.TR:ιωαννης" src="8 9">John</w> <w lemma="strong:G2258" morph="robinson:V-IXI-3S" savlm="strong:G2258 lemma.TR:ην" src="3">was</w> <w lemma="strong:G3768" morph="robinson:ADV" savlm="strong:G3768 lemma.TR:ουπω" src="1">not yet</w> <w lemma="strong:G906" morph="robinson:V-RPP-NSM" savlm="strong:G906 lemma.TR:βεβλημενος" src="4">cast</w> <w lemma="strong:G1519" morph="robinson:PREP" savlm="strong:G1519 lemma.TR:εις" src="5">into</w> <w lemma="strong:G3588 strong:G5438" morph="robinson:T-ASF robinson:N-ASF" savlm="strong:G3588 strong:G5438 lemma.TR:την lemma.TR:φυλακην" src="6 7">prison</w>.
So the structure would look like `(Book, Chapter, Verse)`, where the `Verse` part may be absent (`Luke 3`). When present, the `Verse` part can be:
- a single verse (`Luke 3:1`);
- a set (`Luke 3:1,2,3`, separated by comma `,`);
- a range (`Luke 3:10-12`, separated by dash `-`);
- a mix of the above, joined by `,`.
I think this is properly handled by `diatheke` already. We only need to improve the regular expression to identify the `Verse` part properly. I will test and push more commits in the coming days.
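One possible shape for such a regular expression is sketched below. The names `my-dtk-verse-regexp`, `my-dtk-citation-regexp`, and `my-dtk-parse-citation` are hypothetical, and the book pattern is a deliberate simplification (a single capitalized word with an optional numeral prefix), so multi-word book names are not covered. The verse part is returned as a raw string on the assumption that it is passed to `diatheke` unchanged.

```elisp
;; Hypothetical sketch; these names are not dtk's own definitions.
(defconst my-dtk-verse-regexp
  "[0-9]+\\(?:-[0-9]+\\)?\\(?:,[0-9]+\\(?:-[0-9]+\\)?\\)*"
  "Regexp for a verse part: \"1\", \"1,2,3\", \"10-12\" or \"16-21,23-24\".")

(defconst my-dtk-citation-regexp
  (concat "\\(\\(?:[1-3I]+ \\)?[A-Z][a-z]+\\.?\\)"   ; group 1: book
          " \\([0-9]+\\)"                             ; group 2: chapter
          "\\(?::\\(" my-dtk-verse-regexp "\\)\\)?")  ; group 3: verse part
  "Regexp for a citation such as \"Luke 3\" or \"Jn 3:16-21,23-24\".")

(defun my-dtk-parse-citation (string)
  "Return (BOOK CHAPTER VERSE) for the first citation in STRING, or nil.
CHAPTER is a number.  VERSE is the raw verse string, or nil when the
citation names a whole chapter."
  (let ((case-fold-search nil))         ; book names must be capitalized
    (when (string-match my-dtk-citation-regexp string)
      (list (match-string 1 string)
            (string-to-number (match-string 2 string))
            (match-string 3 string)))))
```

For example, `(my-dtk-parse-citation "Luke 3")` returns `("Luke" 3 nil)`, and `(my-dtk-parse-citation "Jn 3:16-21,23-24")` returns `("Jn" 3 "16-21,23-24")`.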
Adding a test suite would definitely help. Also, I think adding an elisp formatter/linter would help us maintain the code too. I will keep an eye on these topics and update later.
It sounds like there isn't much to deal with regarding the representation of sets and ranges in `Verse` if the only consumer is diatheke. As you note, that just leaves developing a regex to handle the different cases.
Agreed that elisp formatting needs consistency. Is it time to untabify everything? Maybe formatting/linting is better as a separate issue.
Looking forward to upcoming commits. Thanks again.
Hi @thomp, how's it going? It's been a while and I hope all is well with you.
These days I'm thinking about improving citation matching. I believe this may involve changes to your code, so I'd like to discuss it first.
The idea is to come up with functions that help us identify formats such as "Matthew 1:10", "Luke 1:10-20", "Genesis 3", etc. To do this I studied regular expressions in Emacs a bit; the main result is a regexp called `dtk-citation-regexp` that helps detect book/chapter/verse/verse-range. You may test with:
Based on it, I rewrote `dtk-parse-citation-at-point`. While there are still some edge cases, it works pretty well over my test buffer:
The main benefit is that the user can put point anywhere within the citation and it should work as expected. There are still restrictions though: a citation cannot span two lines. But I suppose that's not a major case.
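As an illustration of the "point anywhere within the citation" behaviour, here is a minimal sketch. It is not the actual `dtk-parse-citation-at-point`; the function name `my-dtk-match-at-point` is hypothetical, and it takes the regexp as an argument rather than assuming the contents of `dtk-citation-regexp`. It scans only the current line, which also shows where the single-line restriction comes from.

```elisp
;; Hypothetical sketch, not dtk's implementation.
(defun my-dtk-match-at-point (regexp)
  "Return the text on the current line that matches REGEXP and contains point.
Return nil if no such match exists.  Only the current line is searched,
so a match spanning a line break is never found.  REGEXP is assumed
never to match the empty string."
  (let ((pos (point))
        (eol (line-end-position))
        (case-fold-search nil)
        found)
    (save-excursion
      (beginning-of-line)
      ;; Walk through successive matches until one covers POS.
      (while (and (not found)
                  (re-search-forward regexp eol t))
        (when (<= (match-beginning 0) pos (match-end 0))
          (setq found (match-string-no-properties 0)))))
    found))
```

A quick check in a temporary buffer, using a simplified citation regexp chosen only for this example:

```elisp
(with-temp-buffer
  (insert "See Luke 1:10-20 and Genesis 3 for details.")
  (goto-char (point-min))
  (search-forward "1:1")                ; point is now inside "Luke 1:10-20"
  (my-dtk-match-at-point
   "[A-Z][a-z]+ [0-9]+\\(?::[0-9]+\\(?:-[0-9]+\\)?\\)?"))
;; returns "Luke 1:10-20"
```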
Your input will be appreciated.