HowardCohl closed this issue 4 years ago.
Yes. Let's discuss this here. I have been working on the program, and it has worked very well so far.
- splitting multiple formulas in {equationgroup} commands such as http://dlmf.nist.gov/10.6.E1;
Abdou had nice templates for that. I think this is done. Here are the first lines of the 10.6 file:
\BesselC{\nu-1}@{z}+\BesselC{\nu+1}@{z}=(2\nu/z)\BesselC{\nu}@{z}, \url{https://dlmf.nist.gov/10.6#Ex1}
\BesselC{\nu-1}@{z}-\BesselC{\nu+1}@{z}=2\BesselC{\nu}'@{z}. \url{https://dlmf.nist.gov/10.6#Ex2}
\BesselC{\nu}'@{z}=\BesselC{\nu-1}@{z}-(\nu/z)\BesselC{\nu}@{z}, \url{https://dlmf.nist.gov/10.6#Ex3}
\BesselC{\nu}'@{z}=-\BesselC{\nu+1}@{z}+(\nu/z)\BesselC{\nu}@{z}. \url{https://dlmf.nist.gov/10.6#Ex4}
\displaystyle\BesselJ{0}'@{z}=-\BesselJ{1}@{z}, \url{https://dlmf.nist.gov/10.6#E3X} \comments{Warning: Part 1 of multicontent tex element;}
\displaystyle\BesselY{0}'@{z}=-\BesselY{1}@{z}, \url{https://dlmf.nist.gov/10.6#E3X} \comments{Warning: Part 2 of multicontent tex element;}
\displaystyle\HankelH{1}{0}'@{z}=-\HankelH{1}{1}@{z}, \url{https://dlmf.nist.gov/10.6#E3Xa} \comments{Warning: Part 1 of multicontent tex element;}
\displaystyle\HankelH{2}{0}'@{z}=-\HankelH{2}{1}@{z}. \url{https://dlmf.nist.gov/10.6#E3Xa} \comments{Warning: Part 2 of multicontent tex element;}
Overall, there are 30 multicontent situations in the dataset. That means there are 30 formulae for which one cannot generate an unambiguous deeplink to the DLMF.
- splitting multiple formulas in {equationmix} commands such as http://dlmf.nist.gov/10.9.E18;
This is handled as well:
\HankelH{1}{\nu}@{z}=\frac{1}{\pi i}\int_{-\infty}^{\infty+\pi i}e^{z\sinh@@{t}-\nu t}\diff{t}, \url{https://dlmf.nist.gov/10.9#Ex7}
\HankelH{2}{\nu}@{z}=-\frac{1}{\pi i}\int_{-\infty}^{\infty-\pi i}e^{z\sinh@@{t}-\nu t}\diff{t}. \url{https://dlmf.nist.gov/10.9#Ex8}
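A rough way to count such multicontent cases is to group the exported formula lines by their \url{...} deeplink and report every link that more than one formula shares. The Python sketch below assumes the one-formula-per-line layout shown above (formula, then \url{...}, then an optional \comments{...}); the function name is hypothetical, and this is not the dataset generator itself.

```python
import re
from collections import defaultdict

# Matches the first \url{...} on a line, e.g. \url{https://dlmf.nist.gov/10.6#E3X}
URL_PATTERN = re.compile(r"\\url\{([^}]*)\}")


def ambiguous_deeplinks(lines):
    """Return {url: [formula, ...]} for every deeplink shared by more than one formula."""
    by_url = defaultdict(list)
    for line in lines:
        match = URL_PATTERN.search(line)
        if match is None:
            continue  # no deeplink on this line
        formula = line[: match.start()].strip()  # everything before \url{...}
        by_url[match.group(1)].append(formula)
    return {url: formulas for url, formulas in by_url.items() if len(formulas) > 1}


if __name__ == "__main__":
    sample = [
        r"\displaystyle\BesselJ{0}'@{z}=-\BesselJ{1}@{z}, \url{https://dlmf.nist.gov/10.6#E3X}",
        r"\displaystyle\BesselY{0}'@{z}=-\BesselY{1}@{z}, \url{https://dlmf.nist.gov/10.6#E3X}",
        r"\BesselC{\nu-1}@{z}-\BesselC{\nu+1}@{z}=2\BesselC{\nu}'@{z}. \url{https://dlmf.nist.gov/10.6#Ex2}",
    ]
    for url, formulas in ambiguous_deeplinks(sample).items():
        print(url, len(formulas))
```

Running this over the whole export should give the number of shared links; each of them stands for several formulae without a unique deeplink.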
I think these aspects should be handled within Andre's program:
- removing or ignoring commas, etc. at the end of equations;
This should be done for all input; it is nothing specific to the DLMF.
- globally replacing e->\expe, i->\iunit, \pi->\cpi (perhaps this information is already available in the XML; also there might be other replacements which are correctly handled);
The XML does this where the DLMF editors decided to do so. We did not apply the DRMF heuristics. Again, I would see this more as a feature of Andre's program.
- separating \pm and \mp formulas into two separate formulas, such as http://dlmf.nist.gov/10.15.E1;
This should certainly be done within Andre's program. That way it is a feature of the program instead of a prerequisite. The implementation effort is the same either way.
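For the first point, here is a minimal sketch of what "removing commas, etc. at the end of equations" can look like, assuming plain LaTeX input strings: strip trailing `.`, `,` or `;`, but leave LaTeX spacing commands such as `\,` intact. This is an illustration, not the code in Andre's program; the name `strip_trailing_punctuation` is hypothetical.

```python
import re

# One or more of . , ; at the very end (optionally followed by whitespace).
# The lookbehind rejects a match that starts right after a backslash,
# so a trailing thin space \, survives.
TRAILING_PUNCT = re.compile(r"(?<!\\)[.,;]+\s*$")


def strip_trailing_punctuation(tex: str) -> str:
    """Remove sentence punctuation that ends a displayed formula."""
    return TRAILING_PUNCT.sub("", tex.rstrip()).rstrip()


if __name__ == "__main__":
    print(strip_trailing_punctuation(r"\BesselJ{0}'@{z}=-\BesselJ{1}@{z},"))
```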
To update everybody:
> I think these aspects should be handled within Andre's program:
> - removing or ignoring commas, etc. at the end of equations;
> This should be done for all input; it is nothing specific to the DLMF.

Yes, it is already implemented in the program and works well.

> - globally replacing e->\expe, i->\iunit, \pi->\cpi (perhaps this information is already available in the XML; also there might be other replacements which are correctly handled);
> The XML does this where the DLMF editors decided to do so. We did not apply the DRMF heuristics. Again, I would see this more as a feature of Andre's program.

I disagree: replacing e by \expe is very context-dependent and only holds in the DLMF dataset. This is clearly a flaw in the DLMF data, especially when you see that some i are already given as \iunit, but not all. I will not update the translator to handle that; instead, I replace these three cases manually in the test dataset.
> - separating \pm and \mp formulas into two separate formulas, such as http://dlmf.nist.gov/10.15.E1;
> This should certainly be done within Andre's program. That way it is a feature of the program instead of a prerequisite. The implementation effort is the same either way.

I agree, and it is already included in the engine. However, the translator itself cannot handle \pm, and I think it shouldn't. There is currently no case-by-case translation, and there should be only one translation per input.
However, for the test set, I implemented a step that splits such test cases into sub-cases.
Besides that, I don't see a reason why the DLMF links need to be unambiguous. I think it's fine if multiple tests refer to the same DLMF link.
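The sub-case splitting for the test set can be illustrated with a small Python sketch. It is hypothetical (not the engine's code) and assumes that all upper signs of \pm and \mp in a formula belong together and all lower signs belong together; formulas where that does not hold would need manual treatment.

```python
from __future__ import annotations

import re

# \pm / \mp as whole control words, so that e.g. \pmod is not touched
PM = re.compile(r"\\pm(?![A-Za-z])")
MP = re.compile(r"\\mp(?![A-Za-z])")


def split_sign_cases(tex: str) -> list[str]:
    """Split a formula with \\pm/\\mp into its upper-sign and lower-sign test cases."""
    if not PM.search(tex) and not MP.search(tex):
        return [tex]  # nothing to split
    upper = MP.sub("-", PM.sub("+", tex))  # upper signs: \pm -> +, \mp -> -
    lower = MP.sub("+", PM.sub("-", tex))  # lower signs: \pm -> -, \mp -> +
    return [upper, lower]


if __name__ == "__main__":
    for case in split_sign_cases(r"\BesselJ{\pm\nu}@{z}"):
        print(case)
```

Each returned string can then be stored as its own test case that points to the same DLMF link.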
@HowardCohl @physikerwelt
Other problems are constraints and substitutions. I fixed some common substitutions manually (mainly for \zeta, which is often used for substitution in the DLMF).
Also, what about the constraints? As I mentioned in an e-mail to Howard, in some cases constraints are given only in the infobox; e.g., k is an integer here, but this is not explicitly given as a constraint.
Is this information included in the new data or not? Just one example: https://dlmf.nist.gov/4.21#E34
Here, n is an integer, but there is no constraint in the dataset for this case. The data only contains:
\cos@{nz}+\iunit\sin@{nz}=(\cos@@{z}+\iunit\sin@@{z})^{n}. \url{http://dlmf.nist.gov/4.21.E34}
> I disagree, replacing e by \expe is very context-dependent and is only true in the DLMF dataset. This is clearly a flaw in the DLMF data, especially when you see that some i are already given as \iunit but not all. I will not update the translator to handle that, instead, I replace these three cases manually in the test dataset.

This is cheating. We cannot publish a paper where we manually tune the dataset as we want. We could also skip the evaluation and just invent some numbers.
I don't think so. In the DLMF, e is always \expe, just as i is \iunit and \pi is \cpi. However, in other scenarios, outside of the DLMF, this is not true. The only reason why there are plain e, i and \pi in the DLMF is that the authors didn't use the semantic macros.
So what we do is not cheating; it is adding the missing information.
> Is this information included in the new data or not? Just one example: https://dlmf.nist.gov/4.21#E34

No. Not yet. However, we can add the symbols list.
Feel free to suggest changes to the PR https://github.com/abdouyoussef/MLP/pull/5
> So what we do is not cheating, it is adding the missing information.

Which is cheating.
We need to include this assumption in the program as an option and be open about it.
@HowardCohl I discussed the remaining issue with André on the phone. Could you please clarify why one doesn't replace e->\expe, i->\iunit, \pi->\cpi in the DLMF source?
Bruce does do the replacement. There is a command at the top of the source which specifies this for certain chapters. I can tell you which chapters if you want. In fact, I definitely have to do this. I will do this later.
These are the chapters where e->\expe, etc. replacements are specified in the DLMF source:
AI.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
AI.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
AI.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
AI.tex:\lxDeclare[replace=$\EulerConstant$]{$\gamma$}% Euler's constant
AL.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
AL.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
AL.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
AS.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
AS.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
AS.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
BP.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
BP.tex:\lxDeclare[replace=$\expe$]{$e$}%
BP.tex:\lxDeclare[replace=$\iunit$]{$i$}%
BS.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
BS.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
BS.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
CH.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
CH.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
CH.tex:\lxDeclare[replace=$\EulerConstant$]{$\gamma$}%
CW.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
CW.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
EF.tex:\lxDeclare[replace=$\expe$]{$e$}%
EF.tex:\lxDeclare[replace=$\iunit$]{$i$}%
EF.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
EL.tex:\lxDeclare[replace=$\iunit$]{$i$}%
EL.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
ER.tex:\lxDeclare[replace=$\expe$]{$e$}%
ER.tex:\lxDeclare[replace=$\iunit$]{$i$}%
ER.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
EX.tex:\lxDeclare[replace=$\expe$]{$e$}%
EX.tex:\lxDeclare[replace=$\iunit$]{$i$}%
EX.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
FM.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
GA.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
GA.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
GA.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
GH.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
HE.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
HE.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
HY.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
HY.tex:\lxDeclare[replace=$\expe$]{$e$}%
IC.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
IC.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
IG.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
IG.tex:\lxDeclare[replace=$\iunit$]{$i$}%
IG.tex:\lxDeclare[replace=$\expe$]{$e$}%
JA.tex:\lxDeclare[replace=$\expe$]{$e$}%
JA.tex:\lxDeclare[replace=$\iunit$]{$i$}%
JA.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
JA.tex:\lxDeclare[replace=$\compellintKk@@{k}$]{$K$}%
JA.tex:\lxDeclare[replace=$\ccompellintKk@@{k}$]{$K'$}%
LA.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
LE.tex:\lxDeclare[replace=$\expe$]{$e$}%
LE.tex:\lxDeclare[replace=$\iunit$]{$i$}%
LE.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
MA.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
MA.tex:\lxDeclare[replace=$\expe$]{$e$}%
MT.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
MT.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
MT.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
NM.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
NM.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
NT.tex:\lxDeclare[replace=$\expe$]{$e$}%
OP.tex:\lxDeclare[replace=$\expe$]{$e$}%
OP.tex:\lxDeclare[replace=$\iunit$]{$i$}%
OP.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
PC.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
PC.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
PC.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e
PT.tex:\lxDeclare[replace=$\expe$]{$e$}%
PT.tex:\lxDeclare[replace=$\iunit$]{$i$}%
PT.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
QH.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
QH.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
ST.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
ST.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
ST.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
SW.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
SW.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
SW.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
TH.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
TH.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
TH.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
TJ.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
WE.tex:\lxDeclare[replace=$\expe$]{$e$}%
WE.tex:\lxDeclare[replace=$\iunit$]{$i$}%
WE.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
ZE.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
ZE.tex:\lxDeclare[replace=$\iunit$]{$i$}%
ZE.tex:\lxDeclare[replace=$\expe$]{$e$}%
Note that there are two files where \gamma is replaced by Euler's constant (see above):
AI.tex:\lxDeclare[replace=$\EulerConstant$]{$\gamma$}% Euler's constant
CH.tex:\lxDeclare[replace=$\EulerConstant$]{$\gamma$}%
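Since every declaration above has the same shape, the list can be turned into a per-file lookup table. A minimal Python sketch, assuming exactly the grep output format shown here (`FILE.tex:\lxDeclare[replace=$MACRO$]{$SYMBOL$}% comment`); `parse_declarations` is a hypothetical helper, not part of LaTeXML or the DLMF tooling.

```python
import re
from collections import defaultdict

# One grep hit per line: FILE.tex:\lxDeclare[replace=$MACRO$]{$SYMBOL$}% optional comment
DECLARE = re.compile(
    r"^(?P<file>[^:]+)\.tex:\\lxDeclare\[replace=\$(?P<macro>.+?)\$\]\{\$(?P<symbol>.+?)\$\}"
)


def parse_declarations(lines):
    """Map each source file to its {plain symbol: semantic macro} replacements."""
    table = defaultdict(dict)
    for line in lines:
        match = DECLARE.match(line.strip())
        if match:  # lines that are not declarations are ignored
            table[match.group("file")][match.group("symbol")] = match.group("macro")
    return dict(table)


if __name__ == "__main__":
    hits = [
        r"BS.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi",
        r"BS.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i",
        r"JA.tex:\lxDeclare[replace=$\ccompellintKk@@{k}$]{$K'$}%",
    ]
    print(parse_declarations(hits))
```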
@physikerwelt Why am I assigned to this task? I have no ability at the moment to change Andre's program.
@HowardCohl thank you. The assignment indicates that we rely on you to make progress with this task. This indicates that the problem is not in the DLMF source but somewhere downstream, either in LaTeXML or in my addition to Abdou's program.
@physikerwelt @AndreG-P @abdouyoussef
> @HowardCohl thank you. The assignment indicates that we rely on you to make progress with this task. This indicates that the problem is not in the DLMF source but somewhere downstream, either in LaTeXML or in my addition to Abdou's program.

There is no problem. In fact, from the metadata (and because of those \lxDeclare's), it should be clear that e is \expe, etc. The replacements should either be clear or have already been carried out.
@HowardCohl I think there is no problem in the DLMF. But there is a problem in picking up this information during the generation of the dataset file which is used by @AndreG-P's program. Thus the problem should be fixed in my addition to Abdou's program, or in @abdouyoussef's program (cf. https://github.com/abdouyoussef/MLP/issues/6)
I figured out that the program of @abdouyoussef works differently. It extracts the symbols used via the same mechanism a human would use when clicking on the ibox on the DLMF website. This seems legitimate. @HowardCohl please check https://dlmf.nist.gov/4.2#E7. The formula has an i, but it is not referenced in the ibox. Could you explain why this information is missing?
> @HowardCohl please check https://dlmf.nist.gov/4.2#E7 The formula has an i but it is not referenced in the ibox. Could you explain to me, why this information is missing?

I honestly don't know. I just sent an email to @brucemiller about this. I will let you know when he responds.
Thank you. But do you think the i *should* be linked?
> Thank you. But do you think the i *should* be linked?

Seems like it should.
@physikerwelt is right: my code (at this point) extracts exactly the defined and used symbols that are included in the info box, no more and no less. At a later stage, when I complete the code (with machine learning and NLP stuff), I will be able to detect the missing definitions and "uses" and make them available in the dataset.
Note, BTW, I observed that there are a lot of equations whose info boxes do not include/link everything in the equations, either by design (?) or by omission. Hopefully I will succeed in writing ML/NLP code that will rectify that.
@abdouyoussef
> Note, BTW, I observed that there are a lot of equations where their info boxes do not include/link everything in the equations, either by design (?) or by omission.

Can you be more precise?
Well, looking at Section 10.7, you see that i is not mentioned in the info box in any of the equations there, e.g., 10.7.2, 10.7.6, 10.7.7.
In Section 10.13, the info boxes of equations 10.13.1-8 make no mention of lambda, by design I bet, because lambda is defined inline in the first line of that section.
Speaking more broadly, many constrained equations do not have all their constraints formally put inside the constraints part, but instead are mixed in with the text before or after the equation. Thus, as things stand, a dataset generated from the DLMF without the extra analysis that hunts for "missing" definitions and "missing" constraints is not a "complete" dataset. By "missing" I mean the entity is not specified formally in the info box or in the constraints portion (if the entity is a constraint).
A translator like the one @AndreG-P is developing needs a complete dataset, in the sense that all the entities and constraints of an equation have to be fully specified within the formal bounds of the equation, rather than being partially distributed across the text, even if that text is nearby.
> Well, looking at Section 10.7, you see that i is not mentioned in the info box in any of the equations there, e.g., 10.7.2, 10.7.6, 10.7.7.

Yes, we already discussed i or \iunit. That is not in dispute. It seems to be missing.
> In Section 10.13: equations 10.13.1-8 info boxes have no mention of lambda, by design I bet, because lambda is defined in-line in the first line of that section.

True. However, the text does say that \lambda is a real or complex constant such that \lambda\ne 0.
It does need to get into the metadata as well.
> Speaking more broadly, many constrained equations do not have all their constraints formally put inside the constraints part, but instead are mixed in with the text before or after the equation.

This is true. In fact, I started an issue about this in 2015: https://github.com/usnistgov/dlmf/issues/4
> Thus, as things stand, a dataset generated from the DLMF without the extra analysis that hunts for "missing" definitions and "missing" constraints is not a "complete" dataset. By "missing" I mean the entity is not specified formally in the info box or in the constraints portion (if the entity is a constraint).

However, it is in the text (with perhaps some exceptions, which represent errata).
> A translator like what @AndreG-P is developing needs a complete dataset, in the sense that all the entities and constraints of an equation have to be fully specified within the formal bounds of the equation, rather than be partially distributed across text, even if that text is nearby.

Good luck! :)
@abdouyoussef
Are you able to output the missing symbol data somehow, e.g., which symbols are currently missing from the metadata?
This would be extremely useful for moving this project forward.
I vaguely remember that we could tune the symbols using the DLMF software. Unfortunately, I cannot look into the details due to https://github.com/usnistgov/dlmf/issues/99
@HowardCohl At some point I will be able to output the missing symbol data. It is on my agenda, but I can't promise it will happen any time soon, because I have a bunch of other commitments elsewhere and some holiday travel coming up. But I'll keep it on my list of priorities.
As far as I can see, the problem is that all the information is there but often in different places (infobox, constraints and the text surrounding the formula). Especially the latter (surrounding text) is hard to capture.
@abdouyoussef I appreciate any progress, but is it realistic to plan with these improvements for the JCDL paper?
@physikerwelt @HowardCohl
So the quintessence here is that the DLMF fully specifies whether e is \expe (as well as other replacements). Since this information is missing in my data, we should not build assumptions like "e is \expe" into the translator, and should instead update the test data. Does everybody agree? (@physikerwelt I didn't forget our custom replacement feature, see #110)
If so, @physikerwelt, can you update the test data and send me an updated version where e, i, \pi and \gamma are replaced in all cases that are defined by lxDeclare[replace...?
Besides that, just for clarification: the test data contains all information that is in the infobox or in the constraint (surrounding text is not captured), is that correct? As we discussed earlier, all this information should be given in \constraint{ . } in the test data.
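Whether this is done when regenerating the test data or as an explicit, documented option of the program, the replacement has to work on LaTeX tokens rather than raw characters: the e in \sec and the i in \sin must stay, e_1 must stay (the "except with subscript!!!" remark in the declarations), and \pi must not match a longer macro name. A minimal Python sketch under these assumptions; the table mirrors the three BS.tex (chapter 10) declarations, the names are hypothetical, and content inside \text{...} is not treated specially.

```python
import re

# Hypothetical per-chapter table, e.g. built from the three BS.tex (chapter 10) declarations.
CHAPTER_10 = {"e": r"\expe", "i": r"\iunit", r"\pi": r"\cpi"}

# Control words (\sin, \iunit, \pi), control symbols (\,), whitespace runs, or any single character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\s+|.", re.DOTALL)


def apply_declared_replacements(tex, replacements, enabled=True):
    """Replace stand-alone tokens (e, i, \\pi, ...) by their declared semantic macros.

    `enabled` keeps the assumption switchable, so results can be reported with and without it.
    """
    if not enabled:
        return tex
    tokens = TOKEN.findall(tex)
    out = []
    for pos, token in enumerate(tokens):
        macro = replacements.get(token)
        if macro is not None and token == "e":
            # "except with subscript": keep e_{1} etc. untouched
            following = next((t for t in tokens[pos + 1:] if not t.isspace()), "")
            if following == "_":
                macro = None
        if macro is None:
            out.append(token)
            continue
        out.append(macro)
        # Keep "\expe z" from collapsing into the unknown control word "\expez".
        if pos + 1 < len(tokens) and tokens[pos + 1].isalpha():
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    print(apply_declared_replacements(r"e^{i\pi z}", CHAPTER_10))
```

Because control words are kept as single tokens, a plain product such as xi is still recognized as x times i.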
@abdouyoussef @physikerwelt
Also, to bring up the substitution problem again (see https://dlmf.nist.gov/9.6#E2): \zeta is linked in the infobox to https://dlmf.nist.gov/9.6#E1. So it should somehow be possible to capture this substitution in the dataset as well, right?
@HowardCohl
I remember you somehow replaced k', for example in https://dlmf.nist.gov/22.2#p2. Was this information also given in \lxDeclare, and can we perform this replacement now as well? I also noticed that the prime now comes after the arguments in this section. I'm not sure, but I think in our old data from two years ago the position of the primes was different. For example, it seems K(k)' is not defined now...
@AndreG-P I am not sure if I can differentiate all the different issues you discuss in this ticket, but I will try my best to address all aspects. Let me know if I missed something.
1) We should update the test dataset, but we can't do that at the moment, since I have problems running the DLMF software. We need to figure out where the information is not propagated to the iboxes.
2) regarding 9.6.2. We have the following information https://github.com/abdouyoussef/MLP/blob/147ab2d54e98567690ec52e86374d0e29acfeaab/MathNLP/ReferenceData/Datasets/dlmf/dlmf-chapters-OneTextBlockPerEquation/9/9.6.txt#L25-L54 which can be exposed to the other data format. Would any of these lines help with your problem?
@physikerwelt
> @AndreG-P I am not sure if I can differentiate all the different issues you discuss in this ticket, but I will try my best, to address all aspects. Let me know if I missed something.
> - we should update the test dataset, but we can't do that at the moment. Since I have problems running the DLMF software. We need to figure out where the information is not propagated to the iboxes..

Yes. The new version should consider replacements, such as e to \expe, as well as constraints from the iboxes and the actual constraint tags. (See also: https://github.com/abdouyoussef/MLP/pull/5#issuecomment-558538905)
The problem is to distinguish between domain definitions and other things in the infoboxes. For example, k: integer is important, but k: modulus is not helpful and could even cause the test case to fail. Can we somehow at least include the domain specifications, such as z: complex, as constraints? Also, these must be given as mathematical expressions and not in text form (not z: complex but z \in \Complex).
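One possible way to keep only the domain information: compare the ibox meaning with a short allow-list of domain phrases and turn each hit into a set-membership constraint, so that k: modulus is dropped while n: integer becomes n \in \Integer. A hypothetical Python sketch; the phrase list is an assumption, and apart from \Complex, which appears in this thread, the set macros \Real and \Integer are assumed names.

```python
import re

# Hypothetical allow-list: only these ibox meanings become constraints.
# \Complex is the macro used in this thread; \Real and \Integer are assumed names.
DOMAIN_PHRASES = {
    "complex variable": r"\Complex",
    "complex parameter": r"\Complex",
    "real variable": r"\Real",
    "integer": r"\Integer",
}

# "$$z$$: complex variable" or "k: integer" -> symbol and meaning
ENTRY = re.compile(r"^\$*(?P<symbol>[^$:]+?)\$*\s*:\s*(?P<meaning>.+?)\s*$")


def domain_constraints(entries):
    """Turn ibox entries such as 'z: complex variable' into constraints such as 'z \\in \\Complex'."""
    constraints = []
    for entry in entries:
        match = ENTRY.match(entry.strip())
        if match is None:
            continue
        target = DOMAIN_PHRASES.get(match.group("meaning").lower())
        if target is not None:  # e.g. "k: modulus" is skipped
            constraints.append(f"{match.group('symbol')} \\in {target}")
    return constraints


if __name__ == "__main__":
    print(domain_constraints(["$$z$$: complex variable", "k: modulus"]))
```

An exact-phrase allow-list is deliberately conservative: a meaning such as "real or complex constant" produces no constraint instead of a wrong one.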
> - regarding 9.6.2. We have the following information https://github.com/abdouyoussef/MLP/blob/147ab2d54e98567690ec52e86374d0e29acfeaab/MathNLP/ReferenceData/Datasets/dlmf/dlmf-chapters-OneTextBlockPerEquation/9/9.6.txt#L25-L54 which can be exposed to the other data format. Would any of these lines help with your problem?

@abdouyoussef @physikerwelt It looks like there is no link to 9.6.1 anymore in this data. This makes it very difficult to substitute correctly. The problem is the other scenarios where "change of variable" appears.
Consider: https://dlmf.nist.gov/9.8#SS1.p3
Here, \xi is marked as a change of variable, but the actual change is not given in our test dataset.
Consider also: https://dlmf.nist.gov/22.11#E1
Here, \zeta is again a change of variable, but it is even linked to another subsection.
Maybe I will just consider "change of variable (locally)". It covers only a few cases, but it's better than nothing.
@AndreG-P As discussed, here are the provisional files:
```
9.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
9.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
9.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
9.tex:\lxDeclare[replace=$\EulerConstant$]{$\gamma$}% Euler's constant
1.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
1.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
1.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
2.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
2.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
2.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
24.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
24.tex:\lxDeclare[replace=$\expe$]{$e$}%
24.tex:\lxDeclare[replace=$\iunit$]{$i$}%
10.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
10.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
10.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
13.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
13.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
13.tex:\lxDeclare[replace=$\EulerConstant$]{$\gamma$}%
33.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
33.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
4.tex:\lxDeclare[replace=$\expe$]{$e$}%
4.tex:\lxDeclare[replace=$\iunit$]{$i$}%
4.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
19.tex:\lxDeclare[replace=$\iunit$]{$i$}%
19.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
7.tex:\lxDeclare[replace=$\expe$]{$e$}%
7.tex:\lxDeclare[replace=$\iunit$]{$i$}%
7.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
6.tex:\lxDeclare[replace=$\expe$]{$e$}%
6.tex:\lxDeclare[replace=$\iunit$]{$i$}%
6.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
35.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
5.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
5.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
5.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
16.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
31.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
31.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
15.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
15.tex:\lxDeclare[replace=$\expe$]{$e$}%
36.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
36.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
8.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
8.tex:\lxDeclare[replace=$\iunit$]{$i$}%
8.tex:\lxDeclare[replace=$\expe$]{$e$}%
22.tex:\lxDeclare[replace=$\expe$]{$e$}%
22.tex:\lxDeclare[replace=$\iunit$]{$i$}%
22.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
22.tex:\lxDeclare[replace=$\compellintKk@@{k}$]{$K$}%
22.tex:\lxDeclare[replace=$\ccompellintKk@@{k}$]{$K'$}%
29.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
14.tex:\lxDeclare[replace=$\expe$]{$e$}%
14.tex:\lxDeclare[replace=$\iunit$]{$i$}%
14.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
28.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
28.tex:\lxDeclare[replace=$\expe$]{$e$}%
21.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
21.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
21.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
3.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
3.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
27.tex:\lxDeclare[replace=$\expe$]{$e$}%
18.tex:\lxDeclare[replace=$\expe$]{$e$}%
18.tex:\lxDeclare[replace=$\iunit$]{$i$}%
18.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
12.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
12.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
12.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e
32.tex:\lxDeclare[replace=$\expe$]{$e$}%
32.tex:\lxDeclare[replace=$\iunit$]{$i$}%
32.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
17.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
17.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
11.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
11.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
11.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
30.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
30.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
30.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
20.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
20.tex:\lxDeclare[replace=$\iunit$]{$i$}% Imaginary i
20.tex:\lxDeclare[replace=$\expe$]{$e$}% Exponential e, except with subscript!!!
34.tex:\lxDeclare[replace=$\cpi$]{$\pi$}% Circular pi
23.tex:\lxDeclare[replace=$\expe$]{$e$}%
23.tex:\lxDeclare[replace=$\iunit$]{$i$}%
23.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
25.tex:\lxDeclare[replace=$\cpi$]{$\pi$}%
25.tex:\lxDeclare[replace=$\iunit$]{$i$}%
25.tex:\lxDeclare[replace=$\expe$]{$e$}%
```
Sublime replacement command:
```
[
  {
    "caption": "Reg Replace: DLMF code2Num",
    "command": "reg_replace",
    "args": {
      "replacements": [
        "replaceAL", "replaceAS", "replaceNM", "replaceEF", "replaceGA", "replaceEX",
        "replaceER", "replaceIG", "replaceAI", "replaceBS", "replaceST", "replacePC",
        "replaceCH", "replaceLE", "replaceHY", "replaceGH", "replaceQH", "replaceOP",
        "replaceEL", "replaceTH", "replaceMT", "replaceJA", "replaceWE", "replaceBP",
        "replaceZE", "replaceCM", "replaceNT", "replaceMA", "replaceLA", "replaceSW",
        "replaceHE", "replacePT", "replaceCW", "replaceTJ", "replaceFM", "replaceIC"
      ]
    }
  }
]

{
  "replacements": {
    "replaceAL": {"find":"AL", "replace":"1"},
    "replaceAS": {"find":"AS", "replace":"2"},
    "replaceNM": {"find":"NM", "replace":"3"},
    "replaceEF": {"find":"EF", "replace":"4"},
    "replaceGA": {"find":"GA", "replace":"5"},
    "replaceEX": {"find":"EX", "replace":"6"},
    "replaceER": {"find":"ER", "replace":"7"},
    "replaceIG": {"find":"IG", "replace":"8"},
    "replaceAI": {"find":"AI", "replace":"9"},
    "replaceBS": {"find":"BS", "replace":"10"},
    "replaceST": {"find":"ST", "replace":"11"},
    "replacePC": {"find":"PC", "replace":"12"},
    "replaceCH": {"find":"CH", "replace":"13"},
    "replaceLE": {"find":"LE", "replace":"14"},
    "replaceHY": {"find":"HY", "replace":"15"},
    "replaceGH": {"find":"GH", "replace":"16"},
    "replaceQH": {"find":"QH", "replace":"17"},
    "replaceOP": {"find":"OP", "replace":"18"},
    "replaceEL": {"find":"EL", "replace":"19"},
    "replaceTH": {"find":"TH", "replace":"20"},
    "replaceMT": {"find":"MT", "replace":"21"},
    "replaceJA": {"find":"JA", "replace":"22"},
    "replaceWE": {"find":"WE", "replace":"23"},
    "replaceBP": {"find":"BP", "replace":"24"},
    "replaceZE": {"find":"ZE", "replace":"25"},
    "replaceCM": {"find":"CM", "replace":"26"},
    "replaceNT": {"find":"NT", "replace":"27"},
    "replaceMA": {"find":"MA", "replace":"28"},
    "replaceLA": {"find":"LA", "replace":"29"},
    "replaceSW": {"find":"SW", "replace":"30"},
    "replaceHE": {"find":"HE", "replace":"31"},
    "replacePT": {"find":"PT", "replace":"32"},
    "replaceCW": {"find":"CW", "replace":"33"},
    "replaceTJ": {"find":"TJ", "replace":"34"},
    "replaceFM": {"find":"FM", "replace":"35"},
    "replaceIC": {"find":"IC", "replace":"36"}
  }
}
```
Inspiration: https://css-tricks.com/run-multiple-find-replace-commands-sublime-text/
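The RegReplace rules search for the bare two-letter codes anywhere in the text, so in principle they could also hit an uppercase pair elsewhere on a line. A slightly more defensive alternative, sketched in Python (not the setup used here; the names are hypothetical), rewrites only the leading file prefix and uses the same code-to-number order as the settings above.

```python
import re

# Chapter codes in the order of the RegReplace settings above (AL -> 1, ..., IC -> 36).
CHAPTER_CODES = [
    "AL", "AS", "NM", "EF", "GA", "EX", "ER", "IG", "AI", "BS", "ST", "PC",
    "CH", "LE", "HY", "GH", "QH", "OP", "EL", "TH", "MT", "JA", "WE", "BP",
    "ZE", "CM", "NT", "MA", "LA", "SW", "HE", "PT", "CW", "TJ", "FM", "IC",
]
CODE_TO_NUMBER = {code: str(number) for number, code in enumerate(CHAPTER_CODES, start=1)}

# Only the leading "XX" of "XX.tex:" is rewritten, never text later in the line.
FILE_PREFIX = re.compile(r"^([A-Z]{2})(?=\.tex:)")


def code_to_number(line):
    """Rename the 'XX.tex:' prefix of a grep hit to its chapter number; unknown codes stay."""
    return FILE_PREFIX.sub(lambda m: CODE_TO_NUMBER.get(m.group(1), m.group(1)), line)


if __name__ == "__main__":
    print(code_to_number(r"JA.tex:\lxDeclare[replace=$\compellintKk@@{k}$]{$K$}%"))
```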
@AndreG-P To answer your question whether it is realistic to plan with these improvements for the JCDL paper, I'd say most likely not, for two reasons: (1) I will be away on other commitments for several weeks, and (2) the problem is still open-ended, with no guarantee of 100% success yet.
@physikerwelt thank you.
@abdouyoussef I agree. This simply implies that we need to find the best way to work with the current data.
@physikerwelt Thus I would say we consider only the cases where it states "change of variable (locally)", since these seem to be the only cases where a variable is clearly defined for the following expressions.
I think symbols-used: contains the information we are looking for. It contains $$z$$: complex variable and $$\zeta(z)$$: change of variable (locally). So could you maybe add a \symbolUsed{...} to the test cases? I can ignore most of them, but if I find something like $$z$$: complex variable, I add it as a constraint (in this case z \in \Complex), and if I find change of variable (locally), I know I have to consider a replacement in the following test cases of the same section.
Sure, will do.
I modified the code so that:
Below is a sample. Let me know if you would like me to upload this modified dataset (overriding the one currently posted).
Here is a sample of an equation-block in the dataset:
Equation:
equation-number: 10.7.4
permalink: http://dlmf.nist.gov/10.7.E4
xml-id: C10.S7.E4
tex: $$Y_{\nu}\left(z\right)\sim-(1/\pi)\Gamma\left(\nu\right)(\tfrac{1}{2}z)^{-\nu},$$
content-tex: $$\BesselY{\nu}@{z}\asympeq-(1/\pi)\EulerGamma@{\nu}(\tfrac{1}{2}z)^{-\nu},$$
constraints:
tex: $$\Re\nu>0$$ or $$\nu=-\tfrac{1}{2},-\tfrac{3}{2},-\tfrac{5}{2},\ldots$$,
content-tex: $$\realpart@@{\nu}>0$$ or $$\nu=-\tfrac{1}{2},-\tfrac{3}{2},-\tfrac{5}{2},\ldots$$,
symbols-used:
symbol:
tex: $$Y_{\NVar{\nu}}\left(\NVar{z}\right)$$
content-tex: $$\BesselY{\NVar{\nu}}@{\NVar{z}}$$
idref: C10.S2.E3
meaning: Bessel function of the second kind
symbol:
tex: $$\Gamma\left(\NVar{z}\right)$$
content-tex: $$\EulerGamma@{\NVar{z}}$$
idref: C5.S2.E1
meaning: gamma function
symbol:
tex: $$\sim$$
content-tex: $$\asympeq$$
idref: C2.S1.E1
meaning: asymptotic equality
symbol:
tex: $$\pi$$
content-tex: $$\cpi$$
idref: C3.S12.E1
meaning: the ratio of the circumference of a circle to its diameter
symbol:
tex: $$\Re$$
content-tex: $$\realpart@@$$
idref: C1.S9.E2
meaning: real part
symbol:
tex: $$z$$
idref: C10.S1.p2.t1.r4
meaning: complex variable
symbol:
tex: $$\nu$$
idref: C10.S1.p2.t1.r5
meaning: complex parameter
context:
sentence-xmlid: C10.S7.SS1.p1.s1
sentence-num-in-section: 1
sentence-num-in-chapter: 109
sentence-num-in-corpus: 6508
para-xmlid: C10.S7.SS1.p1
para-num-of-sentences: 3
subsection-xmlid: C10.S7.SS1
subsection-title:
section-xmlid: C10.S7
section-title: Limiting Forms
chapter-xmlid: C10
chapter-title: Bessel Functions
End-equation
Note that in the previous sample of an equation-block, the indentation was lost, but in the dataset, there is proper indentation.
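The following is a rough sketch of how the "symbols-used" part of such a block could be read back into records. It assumes only the field labels visible in the sample ("symbols-used:", "symbol:", "tex:", "content-tex:", "idref:", "meaning:", "context:"). It also assumes that "meaning:" is the last field of each symbol mini-block. Whitespace between fields may be spaces or newlines, so the sketch handles both the flattened and the indented form. `parse_symbols` is a hypothetical helper, not part of the released code.

```python
import re


def parse_symbols(block):
    """Extract the symbol mini-blocks from one equation block (flattened or indented)."""
    # Keep only the text between "symbols-used:" and "context:".
    section = re.search(r"symbols-used:(.*?)(?:\bcontext:|$)", block, re.S)
    if section is None:
        return []
    symbols = []
    for chunk in re.split(r"\bsymbol:", section.group(1))[1:]:
        # "tex:" must not be the tail of "content-tex:".
        tex = re.search(r"(?<![\w-])tex:\s*(\$\$.*?\$\$)", chunk, re.S)
        ctex = re.search(r"content-tex:\s*(\$\$.*?\$\$)", chunk, re.S)
        idref = re.search(r"idref:\s*(\S+)", chunk)
        meaning = re.search(r"meaning:\s*(.*)", chunk, re.S)
        symbols.append({
            "tex": tex.group(1) if tex else None,
            # content-tex is absent for plain symbols such as z or \nu.
            "content-tex": ctex.group(1) if ctex else None,
            "idref": idref.group(1) if idref else None,
            # Collapse line breaks/indentation inside the meaning text.
            "meaning": " ".join(meaning.group(1).split()) if meaning else None,
        })
    return symbols
```

Because each symbol keeps its `idref`, the records can be joined later with the defining formulas.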
@AndreG-P
@HowardCohl I remember you somehow replaced k', for example in https://dlmf.nist.gov/22.2#p2. Was this information also given in \lxDeclare, and can we perform this replacement now as well? I also noticed that the prime now comes after the arguments in this section. I'm not sure, but I think the positions of the primes were different in our old data from two years ago. For example, it seems K(k)' is not defined now...
Good memory! Ok. This is what is going on:
In JA.tex, there are the following global replacements:
JA.tex:\lxDeclare[replace=$\compellintKk@@{k}$]{$K$}%
JA.tex:\lxDeclare[replace=$\ccompellintKk@@{k}$]{$K'$}%
Also, in JA.tex, MA.tex, LA.tex, and EL.tex (when you are in math mode):
- {k'}^2 represents 1-k^2
- {k'}^{2m} represents (1-k^2)^m
- {k'}^{2m+2} represents (1-k^2)^{m+1}
- k' represents \sqrt{1-k^2}
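The following is a small sketch of those math-mode rules applied as string rewrites. The rule table is a hypothetical encoding of the list, not Howard's actual code. Order matters: k' also occurs inside {k'}^2, {k'}^{2m}, and {k'}^{2m+2}, so the bare k' rule has to run last. The sketch also adds parentheses around 1-k^2 as a design choice, so that operator precedence is preserved. Finally, it leaves k'_1 and k'_2 alone, because they mean something else in §22.7 and §22.17.

```python
import re

# Hypothetical rule table mirroring the math-mode conventions listed above.
# The longer {k'}^... patterns are applied before the bare k' rule.
KPRIME_RULES = [
    (r"{k'}^{2m+2}", r"(1-k^2)^{m+1}"),
    (r"{k'}^{2m}", r"(1-k^2)^m"),
    (r"{k'}^2", r"(1-k^2)"),
]


def expand_kprime(tex):
    """Rewrite k' expressions in a math-mode string as explicit functions of k."""
    for literal, replacement in KPRIME_RULES:
        tex = tex.replace(literal, replacement)
    # Bare k': skip it when glued to a macro/word or followed by a subscript,
    # since k'_1 and k'_2 are defined differently in sections 22.7 and 22.17.
    return re.sub(r"(?<![A-Za-z\\])k'(?!_)", lambda m: r"\sqrt{1-k^2}", tex)
```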
In JA.tex, in §22.7(i) Descending Landen Transformation (http://dlmf.nist.gov/22.7.i):
- k_1 represents \frac{1-k'}{1+k'}
In JA.tex, in §22.7(ii) Ascending Landen Transformation (http://dlmf.nist.gov/22.7.ii):
- k_2 represents \frac{2\sqrt{k}}{1+k}
- k'_2 represents \frac{1-k}{1+k}
In JA.tex, in §22.17(i) Real or Purely Imaginary Moduli:
- k_1 represents \frac{k}{\sqrt{1+k^2}}
- k_1k'_1 represents \frac{k}{1+k^2}
- k'_1 represents \frac{1}{\sqrt{1+k^2}}
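A quick numerical sanity check of these representations can catch transcription slips before they end up in replacement rules. The sketch below takes k'_1 = 1/\sqrt{1+k^2}, which is what the k_1 k'_1 = k/(1+k^2) entry requires. It checks that each complementary pair satisfies k^2 + k'^2 = 1, and that the descending Landen modulus is smaller than k. The function name is illustrative only.

```python
import math


def check_moduli(k):
    """Numerically check the modulus expressions listed above for 0 < k < 1."""
    kp = math.sqrt(1 - k**2)          # k' = sqrt(1 - k^2)
    k1_desc = (1 - kp) / (1 + kp)     # Sec. 22.7(i), descending Landen
    k2 = 2 * math.sqrt(k) / (1 + k)   # Sec. 22.7(ii), ascending Landen
    k2p = (1 - k) / (1 + k)
    k1 = k / math.sqrt(1 + k**2)      # Sec. 22.17(i)
    k1p = 1 / math.sqrt(1 + k**2)
    return (
        0 < k1_desc < k                               # descending Landen shrinks the modulus
        and math.isclose(k2**2 + k2p**2, 1.0)         # k_2 and k'_2 are complementary
        and math.isclose(k1**2 + k1p**2, 1.0)         # k_1 and k'_1 are complementary
        and math.isclose(k1 * k1p, k / (1 + k**2))    # matches the k_1 k'_1 entry
    )
```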
I am actually working right now on improving the linking for variables of this type. Maybe when I am done you can get the data from me?
I think everything else of this type is mostly encapsulated by the metadata in the i-boxes.
2. Regarding 9.6.2: we have the following information, https://github.com/abdouyoussef/MLP/blob/147ab2d54e98567690ec52e86374d0e29acfeaab/MathNLP/ReferenceData/Datasets/dlmf/dlmf-chapters-OneTextBlockPerEquation/9/9.6.txt#L25-L54, which can be exposed in the other data format. Would any of these lines help with your problem?
@physikerwelt What happened to the semantic LaTeX in that link?
@AndreG-P To answer your question of whether it is realistic to plan on these improvements for the JCDL paper, I'd say most likely not, for two reasons: (1) I will be away on other commitments for several weeks, and (2) the problem is still open-ended, with no guarantee of 100% success yet.
Perhaps we should have stuck with my implementation.
@HowardCohl Yes, perhaps... But it looked like we didn't have a chance to update your extractions, and that it would be better to move to Abdou's data. That was Moritz's initial motivation.
@physikerwelt What do you think? If we can update Howard's program, might that be better?
@HowardCohl
I am actually working right now on improving the linking for variables of this type. Maybe when I am done you can get the data from me?
I think there is still a bug in your code. In the formulas-3.txt you sent me, there are weird artifacts in the data. For example, line 3723 contains:
\compellint\CompEllIntKk@@{k}k@{k}
I think this is related to your k' replacements.
Besides that, I would be open to using your data. What does your schedule look like? When do you plan to have a good version of the dataset?
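The artifact on line 3723 looks like what happens when a bare K rule is applied with plain substring replacement inside a macro name such as \compellintKk. This is an assumption about the cause, not a confirmed diagnosis. The sketch below contrasts that naive approach with a boundary-aware regex that skips matches glued to letters or a backslash. The rules are modeled on the two \lxDeclare lines from JA.tex quoted earlier. The function names are hypothetical.

```python
import re

# Rules modeled on the two \lxDeclare lines from JA.tex.
# K' must be handled before K, otherwise the K inside K' is replaced first.
RULES = [
    ("K'", r"\ccompellintKk@@{k}"),
    ("K", r"\compellintKk@@{k}"),
]


def naive_replace(tex):
    """Plain substring replacement: also rewrites K inside macro names."""
    for symbol, replacement in RULES:
        tex = tex.replace(symbol, replacement)
    return tex


def safe_replace(tex):
    """Replace a symbol only when it is not glued to letters or a backslash,
    i.e. when it is not part of a control sequence such as \\compellintKk."""
    for symbol, replacement in RULES:
        pattern = r"(?<![A-Za-z\\])" + re.escape(symbol) + r"(?![A-Za-z])"
        # A function replacement stops re.sub from interpreting backslashes.
        tex = re.sub(pattern, lambda m, r=replacement: r, tex)
    return tex
```

With this input, `naive_replace(r"\compellintKk@{k}")` produces an artifact of the same shape as line 3723, while `safe_replace` leaves the macro untouched.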
Of course I can update it. That is easy. In fact, I already do all the replacements. Clearly Abdou's program is better in the long run, but in the short run, I don't know. But I can easily help and am ready to help.
I can easily look at this tomorrow. I just thought that since we were using Abdou's program, there was no point. Just let me know. Perhaps it wouldn't be too bad to have two alternative datasets, each with pluses and minuses. Clearly mine has some minuses. :)
@HowardCohl @abdouyoussef @physikerwelt I had quite a long discussion with Moritz, and we think the best option is a hybrid approach that combines both datasets. We will use Abdou's dataset but manually define some replacement rules in extra config files. These replacement rules can be grouped into three categories:
- \lxDeclare replacements in entire sections (e.g., e => \expe) defined in the list that Howard posted above;
- \zeta in DLMF 9.6 and related sections;
- k, k', k_1, and so on, as discussed above.
Hence, we are still using the better long-term approach and relying on Abdou's data, while quickly fixing some of the most prominent problems by hand. We believe this is the best way to finish everything for the JCDL paper.
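Here is a sketch of what such a config file and its application could look like. The JSON format, the field names (`scope`, `pattern`, `replacement`), and the prefix-based scope matching are all assumptions made for illustration. The two example rules come from this thread: k' in §22.2, and \zeta = \tfrac{2}{3}z^{3/2} in §9.6. The \zeta replacement is wrapped in parentheses so that a following superscript stays valid LaTeX.

```python
import json
import re

# Hypothetical config file contents (JSON); field names and scope syntax are assumptions.
# "scope" is "*" for all sections, or a DLMF chapter/section prefix such as "9" or "9.6".
CONFIG = r'''
[
  {"scope": "22.2", "pattern": "k'", "replacement": "\\sqrt{1-k^2}"},
  {"scope": "9.6", "pattern": "\\zeta", "replacement": "(\\tfrac{2}{3}z^{3/2})"}
]
'''


def in_scope(section, scope):
    """True if a rule with the given scope applies to a DLMF section number."""
    return scope == "*" or section == scope or section.startswith(scope + ".")


def apply_rules(tex, section, rules):
    """Apply every in-scope rule to a LaTeX string, skipping matches inside macro names."""
    for rule in rules:
        if not in_scope(section, rule["scope"]):
            continue
        pattern = r"(?<![A-Za-z\\])" + re.escape(rule["pattern"]) + r"(?![A-Za-z])"
        tex = re.sub(pattern, lambda m, r=rule["replacement"]: r, tex)
    return tex


RULES = json.loads(CONFIG)
```

Keeping the rules in data rather than code means they can be extended section by section without touching the converter.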
Furthermore, I will only evaluate expressions that have content LaTeX (i.e., expressions containing at least one semantic macro). This seems to be a very effective way to filter out expressions that are more or less meaningless to evaluate with a CAS.
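This filter could look roughly like the sketch below. It assumes each equation record is a dict with an optional "content-tex" entry. The whitelist is hypothetical and contains only macros that appear in this thread; a real filter would load the full set of DLMF macro names.

```python
import re

# Hypothetical whitelist: a few DLMF semantic macros that appear in this thread.
SEMANTIC_MACROS = {"BesselY", "EulerGamma", "realpart", "cpi", "asympeq", "compellintKk"}


def has_semantic_macro(content_tex):
    """True if the content LaTeX contains at least one known semantic macro."""
    if not content_tex:
        return False
    names = re.findall(r"\\([A-Za-z]+)", content_tex)
    return any(name in SEMANTIC_MACROS for name in names)


def select_for_evaluation(equations):
    """Keep only equation records (dicts) whose 'content-tex' uses a semantic macro."""
    return [eq for eq in equations if has_semantic_macro(eq.get("content-tex"))]
```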
I will work on implementing all of this in the coming days. The symbolic evaluation works with Maple and Mathematica. The numerical tests have not yet been updated to work with Mathematica; I will work on that now as well. I hope to finish all of this by the end of next week.
@AndreG-P I updated the dataset. It now contains the lists of symbols used and symbols defined. The second argument is a unique ID. In some cases, the ID does not link to the definition directly, but it can be massaged to do so. For example, C9.S6.XMD1.m1adec references C9.S6.XMD1.m1dec.
One could now consider resolving these links (by omitting the last block of the ID). However, the hard part will be the following:
If one knows that, for example, \zeta was defined in the formula \zeta=\tfrac{2}{3}z^{3/2}, how can one derive the replacement rule for \zeta? One can either (as @abdouyoussef suggested earlier) translate the whole expression into an assumption and pass it to the Simplify command in Mathematica, or develop heuristics to extract the definitions. Also, note that https://github.com/physikerwelt/MLP/blob/eqLine2/MathNLP/ReferenceData/Datasets/dlmf/dlmf-chapters-OneLinePerEquation/9/9.6.txt#L41 is very likely a bug in the DLMF.
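Here is one possible reading of the "resolve the link, then extract a rule" heuristic, sketched under explicit assumptions. If an idref has no definition of its own, the sketch drops its last dot-separated block and looks for a definition under the remaining prefix. It then treats a defining formula as "single symbol = expression" after stripping trailing punctuation. Both heuristics are hypothetical. Anything that does not fit the pattern returns `None` instead of a guessed rule.

```python
import re


def resolve_idref(idref, definitions):
    """Return the ID of the defining formula: the ID itself if known, otherwise
    (hypothetical heuristic) the first ID sharing its prefix minus the last block."""
    if idref in definitions:
        return idref
    prefix = idref.rsplit(".", 1)[0] + "."
    candidates = sorted(k for k in definitions if k.startswith(prefix))
    return candidates[0] if candidates else None


def extract_rule(definition_tex):
    r"""Turn '\zeta=\tfrac{2}{3}z^{3/2},' into ('\zeta', '\tfrac{2}{3}z^{3/2}').

    The left side must be a single letter or macro, optionally subscripted.
    """
    tex = definition_tex.strip().rstrip(",.;").strip()
    lhs, sep, rhs = tex.partition("=")
    lhs, rhs = lhs.strip(), rhs.strip()
    if not sep or not rhs or not re.fullmatch(r"(\\[A-Za-z]+|[A-Za-z])(_\{?\w+\}?)?", lhs):
        return None
    return lhs, rhs
```

With `definitions = {"C9.S6.XMD1.m1dec": r"\zeta=\tfrac{2}{3}z^{3/2},"}`, the ID C9.S6.XMD1.m1adec resolves to C9.S6.XMD1.m1dec. That definition then yields the rule (\zeta, \tfrac{2}{3}z^{3/2}).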
@abdouyoussef I think it would be great if you could share the new code and dataset. The old dataset will still be available from the git history.
@physikerwelt
If one knows that, for example, \zeta was defined in the formula \zeta=\tfrac{2}{3}z^{3/2}, how can one derive the replacement rule for \zeta? One can either (as @abdouyoussef suggested earlier) translate the whole expression into an assumption and pass it to the Simplify command in Mathematica, or develop heuristics to extract the definitions.
This is of course a good idea; the problem is that it is very hard to generalize across multiple CAS, and that it doesn't work with assumptions. For example:
sin(1/z) - sin(x)
This cannot be simplified by Maple/Mathematica unless I define
z := 1/x; (Maple and Mathematica).
The following does not work, neither in Maple nor in Mathematica:
simplify( sin(1/z) - sin(x) ) assuming z = 1/x;
FullSimplify[ Sin[Divide[1,z]] - Sin[x], z == Divide[1,x] ]
However, if I define z := 1/x, it becomes critical to unset z again afterwards.
Anyway, it is much easier to perform the replacements on the strings.
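A detail worth getting right when replacing on strings is precedence. Substituting 1/x for z in "sin(1/z)" without parentheses gives "sin(1/1/x)", which a CAS reads as sin((1/1)/x). The sketch below wraps the substituted value in parentheses and only matches the symbol as a whole identifier. The function name is hypothetical, and the boundary rule (letters, digits, underscore) is an assumption about CAS identifier syntax.

```python
import re


def substitute(expr, symbol, value):
    """Replace every standalone occurrence of `symbol` in a CAS input string
    by `value`, wrapped in parentheses so operator precedence is preserved."""
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(symbol) + r"(?![A-Za-z0-9_])"
    return re.sub(pattern, lambda m: "(" + value + ")", expr)
```

For example, `substitute("sin(1/z) - sin(x)", "z", "1/x")` gives "sin(1/(1/x)) - sin(x)". This avoids defining z globally in the CAS session, so nothing needs to be unset afterwards.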
I uploaded two compressed files: dlmf-chapters-OneTextBlockPerEquation-detailed.zip, and dlmf-chapters-OneTextBlockPerMathExpr-detailed.zip.
The first zip file contains the DLMF files consisting of equation blocks, where the constraints are given in both LaTeX and semantic LaTeX (where available), and the symbols defined/used are more detailed: for each symbol, there is a mini-block titled "symbol:" with several lines showing the tex representation, the content-tex representation (if available), the idref (i.e., the DLMF ID of the equation (or table entry) that defines that symbol), and the meaning of that symbol.
The second zip file has not only the same equation blocks as in the previous zip file, but also math-expression blocks.
I also updated the software, especially the file that Moritz created for generating equations, one line per equation.
@physikerwelt @AndreG-P @abdouyoussef
I don't know if this is the right place to put this issue, but now that we have considered using Abdou's program to massage the DLMF data, you need to think about some of the processing steps that I used in generating the original dataset. Since you are starting with the XML, I suppose you don't need to worry about removing all comment lines, even those in formulae. These are some things which you might need to think about:
- replacing e->\expe, i->\iunit, \pi->\cpi (perhaps this information is already available in the XML; also, there might be other replacements which are correctly handled);
- splitting \pm and \mp formulas into two separate formulas such as http://dlmf.nist.gov/10.15.E1;
- splitting multiple formulas in {equationgroup} commands such as http://dlmf.nist.gov/10.6.E1;
- splitting multiple formulas in {equationmix} commands such as http://dlmf.nist.gov/10.9.E18.
There might be other things I am missing.
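The \pm/\mp split from the list above can be sketched as a single pass that produces the upper-sign and lower-sign versions of a formula. It assumes plain LaTeX input and treats \pm and \mp only as whole control words, so something like \pmod is left alone. The function name is hypothetical, and this is not the code used for the original dataset.

```python
import re

# \pm or \mp as a complete control word, plus any spacing that terminated it.
PM_TOKEN = re.compile(r"\\(pm|mp)(?![A-Za-z])\s*")


def split_pm(tex):
    r"""Split a formula containing \pm / \mp into its upper- and lower-sign versions.

    Upper version: \pm -> +, \mp -> -; lower version: \pm -> -, \mp -> +.
    Formulas without \pm or \mp come back unchanged as a one-element list.
    """
    if not PM_TOKEN.search(tex):
        return [tex]
    upper = PM_TOKEN.sub(lambda m: "+" if m.group(1) == "pm" else "-", tex)
    lower = PM_TOKEN.sub(lambda m: "-" if m.group(1) == "pm" else "+", tex)
    return [upper, lower]
```

For example, `split_pm(r"J_{\pm\nu}")` returns `[r"J_{+\nu}", r"J_{-\nu}"]`. Both signs are replaced in the same pass, which keeps each \pm paired with its matching \mp.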