Closed Alex-Jordan closed 5 years ago
\x85
is a verbatim delimiter? Not sure I understand what a verbose delimiter is, or even exactly what is going on here. Where and how does the substitution of a tilde change the output? All I can infer so far is that \x85
is unwanted.m
tag? It was not too hard to figure out, but I did wonder what was going on as I read it.Visual review only. I can try to test later today, but that may be my last easy opportunity.
Yes, "verbatim", sorry. That's what happens when I tidy up too late at night.
Davide C dealt with "dangerous" characters in string answers printed in math mode in the following way. Put them in \verb
here
. You can see this at line 598 and then at line 602 here:
https://github.com/openwebwork/pg/blob/8a089edceb5d3b36500bac47ef3c2daeec10e0e4/macros/PGstringevaluators.pl
The Python further down complains about encoding if we get this character. I'm hoping to avoid rebuilding all that Python to make it OK with characters in that range, all just for this one thing.
Instead of a blanket change of \x85 to tilde, I will do a more targeted regex replace on \verb
stuff
. And I'll try to design it to use tilde only if "stuff" has no tilde. Maybe I'll just grab the lowest printable character not appearing in "stuff".
Is your second comment about the problem I added that shows the dangerous XML characters?
Ok, I get it now (I think). Let's definitely target \verb
so we don't impact anything else.
What about iterating through a list of "unlikely" characters, until one is not present? Like ~ | ?
, then try more "normal" characters at the end of the list (a-z, A-Z). Not fool-proof, but close enough?
I think Python is now much pickier about strings and Unicode. A fix could be just being careful about specifying an encoding, though \x85
sounds like a hack, so maybe best to get rid of it.
I was talking about teh difference between having correct_ans
and correct_ans_latex_string
. One gets an m
tag, the other doesn't. Tell the reader of the Python beforehand that is the purpose of code leading up to new line 545.
Great - looking good - I'll test shortly.
Do we have a test of the verbatim character substitution? Could the XML character test be expanded? Maybe provoke something like an \x40-ish
to be employed? I ewon't delay testing to wait on this, but it would be nice to have before merging.
OK, I added explanation about correct_ans_latex_string versus correct_ans.
And now there is a very targeted replacement of \verb�stuff�
. It starts the candidate delimiter at chr(33) (exclamation point) and works its way up looking for an unused character to become the delimiter. It skips over asterisk, since \verb*
is its own command. This post says all other things are OK. But it also skips over the five XML characters to be safe. (Indeed, I tested that ampersand makes for a bad delimiter.)
If it gets to char (126) you have a ridiculous string in your verbatim answer, and your verbatim string is replaced with a message that says so.
And I just tossed out semicolon as a delimiter too. I wondered about things like \verb;M&M;
. No testing if that would arise or not work out, but it couldn't hurt to be safe.
Verbatim text will drive you crazy, no? I wonder if we will ever see
!A verbatim string contained all ASCII characters from 33 through 126!
;-)
I'm off to a meeting, won't be working on this for the rest of the day. You can see the exercise at line 635 of the sample chapter where the XML char test is. It should be straightforward to add more answers that test more characters, if you get to a point where you want to finish this.
If you can't finish today and/or discover more issues, I don't think there's a rush for this.
Squeezed in an example while I'm waiting for this meeting to start. (Also fixed a small bug in that example that would not have been noticed until the problem was used while logged in to the WW server.)
In the extraction file, with the new exercise, I see
The first answer is: <less /><greater /><ampersand />'";
I should think the single quote mark should be '
on the WW server side? I don't think we will get bit by this, if all attributes coming back from the SimpleXML routine are wrapped in double-quotes. But maybe WW wants/needs a fix?
The answer
for the second question is
<p><m>\verb[!#$%()*+,-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[</m></p>
The delimiters don't match. Is something off-by-one? I can't see just why this would be happening.
Are you expecting \verb[...]
? In this context, it's supposed to be \verb[...[
Right. Confused...
On 12/13/18 3:14 PM, Alex Jordan wrote:
Are you expecting |\verb[...]|? In this context, it's supposed to be |\verb[...[|
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rbeezer/mathbook/pull/982#issuecomment-447155642, or mute the thread https://github.com/notifications/unsubscribe-auth/ABy2ck3PvJAJ-1nH2EmOM4HuoubdVVJsks5u4t9GgaJpZM4ZRGB_.
So an author puts "dangerous" characters within pg-code
and they migrate to a WW answer. As a string, it gets wrapped in a LaTeX \verb
construction. If it is part of a math answer it gets wrapped in a PTX m
tag as an eventual static/exercise/answer
.
We don't exercise too much control over the contents of m
, so authors are free to make stuff that does not compile. Not much we can do here.
In this situation, the \verb
means the text is rendered in a monospace font - certainly in PDF, and apparently in live HTML.
Would it not be better to wrap the whole thing in \text{}
and escape characters like #, &, %
, etc. Isn't that what we would tell an author to do with a chunk of text inside an m
element?
It feels to me like this \verb
device is just an end-run around doing a little bit of work to manage characters authored in Perl migrating to their proper equivalents in LaTeX? Would \verb
suffice to signal where this care should be taken? Am I mis-understanding something (again)?
In the extraction file, with the new exercise, I see
The first answer is:
<less /><greater /><ampersand />'";
I should think the single quote mark should be ' on the WW server side? I don't think we will get bit by this, if all attributes coming back from the SimpleXML routine are wrapped in double-quotes. But maybe WW wants/needs a fix?
This part is handled by the WW server. There are two steps. One is when the PGML parser sees things like a quotes pair and makes a note to make these smart quotes. Probably I have that set to come out as q
in PTX output. Then after that, there is a cleanup that turns things like <
into <less />
because we used to need that. With your recent work, I can tone down this cleanup step.
Davide C intentionally changed \text{stuff}
to \verb�stuff
at one point:
https://github.com/openwebwork/pg/commit/7ae713871238e2452f333937f79a0685848df123
So I don't want to mess with that at its source. We could look at the mbx script changing that \verb
to \text
and experimenting with all the escaping of characters. For some reason \text{}
was insufficient, and we'd be looking to see if escaping was enough to counter that (in both tex and MathJax).
It looks like at least part of the issue was a student typing in something like (12}
as their answer, and \text{}
receiving that and breaking. We don't have to worry about that, since we grab the answer hash before there is a student answer in it.
Sorry a bit disjointed on this one. In the extraction file:
<p><m>\verb!<>&'";!</m></p>
I presume that is coming from WW-server-generated XML. Then that lone single quote probably should be escaped.
I don't think it matters where the \verb
gets unwound, so long as the merge file is good PTX. You can search mathbook-latex.xsl
for name="text-processing"
to see how it is done for the LaTeX conversion. Unfortunately, i don't think that template can be used without importing too much other stuff.
Is this ready to be 1.0? Should we merge now, and refine as we go? You know, we can be the chicken and wait on the egg?
Let's not merge yet. There are at least two other closely related things I'd like to get right first/at the same time. One is in a post I just made about verbatim, pre, and code. I'll post the other thing too now. And there isn't a rush. This can wait several weeks or whatever it takes.
I keep finding something or other that tells me that \verb
is necessary. Then I think harder, and I find a way out of it and make progress with \text
. But now I've hit an issue I can't escape.
This whole thing with either \verb
or \text
only comes up within math. That is important, because I think none of your tools for treating naked characters differently in different output modes can be applied.
Now, with MathJax, \text{}
does not expand macros inside the argument, with a few exceptions. For example, one of the exceptions is that \text{\}}
will print a }
. But \text{\%}
really will print \%
. OK, if you just want the percent sign, \text{%}
will print %
. But if we have that in PTX source, then it will break the PDF production.
There's no way around this with making some new macro like \percent
because, again, in MathJax that macro will not be expanded. And again, since this is all inside math, we can't use tools that would turn \text{%}
in source into one thing for HTML and another for PDF.
So we have a situation where MathJax is not faithful to the original LaTeX. This stems from MathJax wanting to support \text
but not to the extent that LaTeX supports it. Either we can
\verb
. Or\text
but throw an error/warning if the string contains certain characters. %
is one, but the issue is similar with #
, ^
, and probably more.\text
if they are absent.I can imagine real answers that are text strings that actually use %
, so 2 seems like the worst choice. I can't decide between the consistency and simplicity of 1 and the aesthetics of 3.
I'm pretty happy with how things are now (with a lot of unpushed changes).
There are a few changes to webwork2 and pg to go along with this, and I just opened PRs for them to their develop branches. While I wait for those to be merged, I'll work on documenting all of this in the author guide, etc.
Then I'll recommit all the work here, plus unpushed work, plus documentation, and open a new PR with the commits more sensibly organized.
It will remain for you to intercept bare apostrophes in author source and replace them with [$APOS]
. But [$APOS]
will be there waiting for you once the pg PR is merged.
OK, I force pushed again after reorganizing commits. Known issues (that I think are outside the scope of this PR):
\char`\{
eating up a space that follows (a mathbook-latex.xsl
problem)[$APOS]
. Seems to be no good way to implement that, and we could live with seeing some <rsq/>
s in the returned PTX. (There are certainly other things there that are clearly not written by a human.I just rebased and force pushed this. The issue with spaces being eaten is gone, thanks to your work on that. There is still the matter of multistage problems. And the apostrophe matter, which I think we should live with as is. But I think that this could be merged, pending your review (especially of the Python).
If you are like me, it has been too long since working with this. I forgot to change the server from -ptx to -dev and I was scratching my head. Note that -ptx will do some things wrong in conjunction with the new changes to extract-pg-common. So at the same time this is merged, I need to bring some upgrades from -dev over to -ptx.
- Issue posted about in the forums with [$APOS]. Seems to be no good way to implement that, and we could live with seeing some
<rsq/>
's in the returned PTX. (There are certainly other things there that are clearly not written by a human.
At 30c94eb764fcb3b0250c4dee4abe422237c4a4f2 apostrophes in text are being trapped and made in [$APOS]
. Suprisingly few of those actually within a problem itself in sample chapter, and coincidentally, only in the formatting test. I did not rebuild samples at website.
The apostrophes (which were <rsq/>
) are now missing in static
. Hopefully I didn't do things out-of-order. I'll review PR soon.
- Issue posted about in the forums with
\char\
{` eating up a space that follows (a mathbook-latex.xsl problem)
That's been resolved at 953c6b6ade08390d367d5f925b121dbe213b47c5, it was only for verbatim text and I think the switch to \texttt{}
made it unnecessary.
I need to use this branch for making an answers manual for ORCCA, so I rebased from dev. I went ahead and force-pushed, so just a heads up about that.
Found one actual issue to fix (that goes back to a386c52) and one improvement in functionality of when we abort ww extraction. So two more small commits here.
Finally getting around to reviewing this. Sorry for the long wait.
cd
and cline
.
extract-pg.xsl
and extract-pg-ptx.xsl
, run by themselves, produce bare characters, since basically the text elements just go through the "sanitize-text"
utility template (verified)<dollar/>
and friends. So we could kill two birds with one stone: 100% drop the conversion of the 10 LaTeX characters (and more). These are not deprecated yet, but careful code is in place and has not caused any problems (yet).<dollar/>
behaving similarly during builds for the website, so perhaps that will be addressed here.On the first point, yes, let's add seeds to a future PR.
On the second, before I think too hard, did you change the server to -dev? I sort of expect what you describe with -ptx. I would want to coordinate you merging this PR with me bringing the upgrades from -dev to -ptx.
Sorry - yes, I was on -ptx
. Lost track of what I was doing. I'll try another run tomorrow.
OK, finally got this configured right for testing. Looks good. (But let's get seeds to not change when content is added!)
<!-- PTX:WARNING: PGML wanted to indent here. -->
Are these necessary? There are tons of them and I'm not sure about the necessity. Also, better to not have them floating around inside p
as they could easily disrupt some things like whitespace handling if the code is a bit too loose (they could be matched inadvertently).\'
in the apostrophe testing. Seems to originate in "static"? (But let's get seeds to not change when content is added!)
Well, I could hard code the seeds into source. Using the ones that are used prior to this PR. I think that I would not to this for the first several subsections though, to avoid giving the impression that seeds are required to be set. Is that the desired solution?
<!-- PTX:WARNING: PGML wanted to indent here. -->
Are these necessary?
I thought to leave a little note about each thing that PGML wants to do but we cannot respect: indentation, text alignment, headings, and horizontal rules. With indentation it is admittedly much harder to understand why it would matter. With the other three, it's possible there was some significance that is now lost.
So you feel that it is not worth the comment? These should never be floating around inside a p by the way, but rather always be at the start of a p. Does it change anything if they are moved to right before the p instead? I could arrange that.
Indentation is going to be there for all problems, including PTX-authored. The other three things are only going to be present in an OPL problem. Maybe nix this for indentation but keep it for the other things?
LaTeX -> PDF and I get lots of \' in the apostrophe testing. Seems to originate in "static"?
Either you jumped the gun with 30c94eb, or I jumped the gun green-lighting you to do that. Eventually we reasoned it out that we will prefer to live with the <rsq/>
s that we had before. See the post starting here and then the apostrophe-related posts as the thread evolves.
Let me know if you concur about seeds and the PGML commentary, and I guess revert most of 30c94eb, and then remember that we need to coordinate merging this with me applying some changes to the -ptx server.
Seeds can change now or later. It won't hold up this, but it will increase the reliability of testing before/after. I understand not doing it early. But maybe it can start quickly, and include a disclaimer? The sample performs two contradictory roles: documentation and testing. We can't always have both. Maybe we should "permid" it as well, just to get stable ids.
I'll argue that indentation of a "p" is presentation. So I'd vote for dropping that one entirely. If the remaining are all from OPL, then that makes a lot of sense to me. However they were authored is beyond any thought of PTX, so it seems right to record an author's intent that we can't recover properly.
Any extraneous nodes at start of "p" or end of "p", or next to display items (math, lists, and "cd"), are places where delicate stuff happens, and eventually I should audit all the preceding/following constructions. So at teh start (inside) makes me nervous. For now, the best thing to do is place it just before a "p" starts, perhaps on its own line. That interstitial space is safer.
I'll pursue 30c94eb confusion in a bit.
What is the timing/stategy for this? Should it merge ASAP, or do we need to coordinate with -ptx server, or a general WW release?
I will likely combine this and break it out (like the seemingly distinct work on answer blank width, and sample chapter edits, etc) and then put your name on teh pieces. So it will be a small, but one-time only, job to reorganize.
Rob
What is the timing/strategy for this?
I think like this.
Now that leaves both -dev and -ptx on WeBWorK "develop" branches, plus a few extra changes that will be in an open pull request to WeBWorK's "develop". It will mean that if someone does not want to use the AIM servers, they will need to put their server in about the same state. Before we had the AIM servers, I did not like the idea of having features that didn't work with the current "master" of WeBWorK. But the AIM servers mitigate the concerns, and waiting for WW 2.15 for releasing this feature strikes me as far too long a wait.
So I think I will manage -ptx to be on the WW "develop" branch, plus a few extra things that are in the pipeline to join the WW "develop" branch. From time to time, "develop" branch features will have bugs. It is very unlikely that we will care, because the bugs are very likely to be with other aspects of WW that we do not use, such as file management within an actual WW course.
I will likely combine this and break it out (like the seemingly distinct work on answer blank width, and sample chapter edits, etc) and then put your name on teh pieces. So it will be a small, but one-time only, job to reorganize.
OK, but do you see that it is already in 8 separate commits? For example the answer blank simplification is in its own commit.
Perfect. Do we need to announce "use -ptx server while we evolve functionality"? (Maybe we did?)
I only see 8 commits right now. Not how each one is organized. ;-)
I have about 8 conversations going on right now and want to patrol title-ending periods in HTML while I have a chunk of time. ;-)
If certain commits are atomic, that will make consolidation much, much easier.
So we just need to sort out 30c94eb? Ihave that on a sticky note. I.e., high priority.
Rob
On 1/23/19 10:20 AM, Alex Jordan wrote:
What is the timing/strategy for this?
I think like this.
- I make some changes to -dev to drop noting the indentation, and move the other notes outside the p.
- I could also do seeds in the sample-chapter.
- You review and signal me that things are ready to merge.
- At roughly the same time, within a few hours of each other, you merge, and I port the -dev edits over to -ptx.
Now that leaves both -dev and -ptx on WeBWorK "develop" branches, plus a few extra changes that will be in an open pull request to WeBWorK's "develop". It will mean that if someone does not want to use the AIM servers, they will need to put their server in about the same state. Before we had the AIM servers, I did not like the idea of having features that didn't work with the current "master" of WeBWorK. But the AIM servers mitigate the concerns, and waiting for WW 2.15 for releasing this feature strikes me as far too long a wait.
So I think I will manage -ptx to be on the WW "develop" branch, plus a few extra things that are in the pipeline to join the WW "develop" branch. From time to time, "develop" branch features will have bugs. It is very unlikely that we will care, because the bugs are very likely to be with other aspects of WW that we do not use, such as file management within an actual WW course.
I will likely combine this and break it out (like the seemingly distinct work on answer blank width, and sample chapter edits, etc) and then put your name on teh pieces. So it will be a small, but one-time only, job to reorganize.
OK, but do you see that it is already in 8 separate commits? For example the answer blank simplification is in its own commit.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rbeezer/mathbook/pull/982#issuecomment-456910880, or mute the thread https://github.com/notifications/unsubscribe-auth/ABy2cqL-p5Ma4JEmm66-FKOENyoSW_j6ks5vGKfRgaJpZM4ZRGB_.
With 30c94eb, it would be OK to just manually undo it as a new commit. Except in one place you fixed a typo in a comment, so that could stay. But it's a very small commit. You'd revert to just not singling out an apostrophe as a character to do special management with.
We will announce for sure that answers are now harvested automatically.
I'm unsure if announcing about using -ptx is the right thing, or if it should just be in one of the guides.
Alex - I'm going to revert the one commit. Will un-revert later, and just let typo come and go, I think. Haven't looked. Then see what consolidation to do on the branch.
I'm concerned about folks using their own server. I'd like to tell them where they need to be on -announce.
I plan to be ready to merge, but can sit tight until you are ready. Then we can coordinate. Seeds can be independent.
Rob
Dear Rob,
I have made the PTX:WARNING edits to the -dev server. I will wait to do seeds. If it's OK with you, when this is all done I'll just set seeds without regard to whatever seed is used now for problems. So with that one commit, the diff of output might be large. But the diff of PTX source will just be seed-setting
Try processing the sample-chapter again now, remembering to set the server to -dev. The only differences from last time should be PTX:WARNING~s gone for indentation, and the ones about center alignment should now be outside of any p.
Try processing the sample-chapter with the server variable being SERVER = "(https://math.webwork.rochester.edu,daemon_course,daemon,daemon,daemon)"
. That is Mike's server. It is version out-of-the-box 2.14. You won't get answers, but you still get an extraction. See if it is acceptable. This is to check that merging this PR will not break things for people not using an AIM server.
When I use Mike's server, I can make the PDF. I get no errors, but lots of warnings about using things like <dollar>
inside code. And there is corresponding funkiness in the output. It's hard to imagine this mattering in practice. How rare are these WW problems with code inside that has special characters? Ask that question for PTX-authored. And then again for OPL, but with OPL it will only matter when the problem is using PGML.
There are other differences from doing this with Mike's server, because he has not configured the host course to be used by PTX. So for example you see notes about an instructor preview. The "standard" setup for a host course to process PTX does something to nullify these.
To my chagrin, I see now that we have already made things incompatible with 2.13. Maybe I knew this at some point in the past, but I had forgotten. You can try the above with SERVER = http://math.mc.edu
to use JT's server where he has set up an "anonymous" PTX processing host course. Even on dev branch without this PR, it fails. Because with 2.13, for example the "ptx" outputformat hasn't been put in place. And probably many other things. Probably I said goodby to 2.13 when we switched from the tex-based method to getting WW to output actual PTX.
After you review this all, either signal me you are ready to merge (and we live with the <dollar>
stuff for people using their own server until 2.15 comes out) or we can keep talking and see if we can work around that.
I have made the PTX:WARNING edits to the -dev server.
Great!
I'll just set seeds without regard to whatever seed is used now for problems.
Sounds good.
When I use Mike's server, I can make the PDF. I get no errors, but lots of warnings about using things like
inside code. And there is corresponding funkiness in the output. It's hard to imagine this mattering in practice. How rare are these WW problems with code inside that has special characters? Ask that question for PTX-authored. And then again for OPL, but with OPL it will only matter when the problem is using PGML.
Well, my additude is not how rare something might be, but instead how bulletproof we can be. Just a few mess-ups lead to low opinions. OPL is OPL, nothing we can do and we have to communicate that fact.
There are other differences from doing this with Mike's server, because he has not configured the host course to be used by PTX. So for example you see notes about an instructor preview. The "standard" setup for a host course to process PTX does something to nullify these.
So why are we testing this if it is not configured? I'm missing the point here, it seems.
To my chagrin, I see now that we have already made things incompatible with 2.13. Maybe I knew this at some point in the past, but I had forgotten. You can try the above with SERVER = http://math.mc.edu to use JT's server where he has set up an "anonymous" PTX processing host course. Even on dev branch without this PR, it fails. Because with 2.13, for example the "ptx" outputformat hasn't been put in place. And probably many other things. Probably I said goodby to 2.13 when we switched from the tex-based method to getting WW to output actual PTX.
Is it possible for PTX output to be built on one server (AIM) but with hosting talking to a different server (Mississippi)?
So why are we testing this if it is not configured? I'm missing the point here, it seems.
If I knew of a fully PTX-configured WW server with no customization of version 2.14, I'd use that for this. Mike's is the closest I know of. The things he has not done are the things here in the author guide and it is predictable what the effects are from not doing those things.
Anyway, the main point of trying that was just to see that things don't go completely snafu, like they do if trying a 2.13 server. It's evidence you still get good returns, except for certain things inside verbatim-like places.
Is it possible for PTX output to be built on one server (AIM) but with hosting talking to a different server (Mississippi)?
Of course, you risk things being different. One server may have edited problem code (for an OPL problem) or customized something about how PG produces output. One server may have a certain OPL problem and the other doesn't have it at all. So that's an issue.
But there are reasons to consider enabling a publisher to do this some day. Maybe things reach a point where AIM simply cannot be rendering all the live problems because of the traffic. But it is still good for various reasons to let AIM do the extraction.
Alright, tested, and ready to merge. Thanks for your patience waiting for this one. Let me know about timing. Likely any evening (including Wednesday) would be a good time if we need to coordinate.
SERVER
variable into Makefile.paths
? That'd make switching branches easier when testing. You could include as much documentation as necessary.OK, great. @davidfarmer just installed a needed perl module on -ptx. I believe I've made -ptx (close enough to) identical to -dev. If you are ready, you could do one quick test by switching the SERVER to -ptx. I'm not in a position to test myself or I would, sorry.
If it passes the test, merge into dev, and I can post to -announce about answers being available now. (And also mention the issues with using a 2.14 server.)
I've run the sample chapter on -ptx
and compared it to the same run earlier this afternoon, both with the PR active. Diff on the "extraction" file shows the answers are not there. Those are the only significant differences, so I think I must be using all the right combinations. But I've been confused before.
It looks like I did not correctly restart apache. It is difficult-ish to restart apache on that server, because of the permissions I have.
I was looking at raw output and not seeing the answers. I restarted apache "the right way" and now I do see the answers. So I'm optimistic it will work now :)
That's it. Merging now.
Monumental. Thanks for all the work on this one. A big step forward.
Go ahead and announce. Keep it concise and high-level, but do try to explain the server and versions situation. Maybe point to new documentation. Invite discussion on -dev and/or -support as you think appropriate.
Thanks!
In the sample chapter's Makefile, change the server to -dev. Then
make sample-chapter-extraction
should build awebwork-extraction.xml
file with answer elements for each problem.make sample-chapter-merge
,make sample-chapter-pdf
, andmake sample-chapter-html
should all make good things.Leaving the server as -ptx should still behave too, just with no answers.
We have the perennial chick and egg problem with WW development. Part of making this work was adding code to WW on -dev. I'll need to open a PR there to WW's develop branch, get it approved, then wait until it merges into WW's master. Or I could bring these changes into -ptx right now while the WW repo chugs along with its development more slowly. Not sure what to do, except that I will wait to open the PR to WW until you've had a chance to comment. (In case I need to do a little more there.)