lynxlynxlynx closed this issue 4 years ago.
This looks approachable. I think I can solve it within a week. There are a few details though.
Depending on the availability of the required feature, we may need to use Pandoc as either:
The user may need to follow some instructions to obtain a custom-built executable with the features we need.
Let me know what you think.
---
title: blablabla
---
We could also add a "module" line, but that would require you to first read the function list for both (it's at the end), so let's leave this for later.
I can code up the index once the files are there ("jekyll collection"). Mimic the structure we have now:
GUIScript/functions/ <-- all the extracted files
GUIScript/ <-- all the static files + index
This is overkill, since not much markup is used. From a quick sampling: bullet lists (same syntax), bold (same), implicit (same) and explicit (same?) code blocks, links and headers. Just convert them in Python.
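For a sense of what "just convert them in Python" could look like, here is a minimal sketch covering headers, labelled and bare links, and inline monospace. It assumes these are the only constructs whose syntax differs (code blocks are not handled), and the function names and rule set are illustrative, not taken from guidoc_wikifier.sh.

```python
import re

def _header(match):
    # DokuWiki: 6 '=' is the top level, 5 the next, down to 2; Markdown uses 1..5 '#'.
    level = 7 - len(match.group(1))
    return "#" * level + " " + match.group(2).strip()

def dokuwiki_to_markdown(text):
    """Convert a small, assumed subset of DokuWiki markup to Markdown."""
    # Headers such as "====== ApplyEffect ======" -> "# ApplyEffect".
    # [ \t]* rather than \s* so the match never swallows the following newline.
    text = re.sub(r"^(={2,6})(.+?)\1[ \t]*$", _header, text, flags=re.MULTILINE)
    # Links with a label: [[target|label]] -> [label](target)
    text = re.sub(r"\[\[([^|\]]+)\|([^\]]+)\]\]", r"[\2](\1)", text)
    # Bare links: [[target]] -> [target](target)
    text = re.sub(r"\[\[([^|\]]+)\]\]", r"[\1](\1)", text)
    # Inline monospace: ''code'' -> `code`
    text = re.sub(r"''(.+?)''", r"`\1`", text)
    return text

print(dokuwiki_to_markdown("====== ApplyEffect ======"))  # # ApplyEffect
```

Each rule is a single search-and-replace, which is the appeal of this approach; the price is that every rule applies everywhere in the text, code included.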
I don't see any benefit, since another site just means more fragmentation and/or work.
So we are not using Pandoc in the end? I thought that was your proposition based on #10. Now your proposition is that we roll our own parser and pretty printer?
It's what I'm using for the website content transmogrification, but here there's no need, since there's just 2-3 search&replace calls to be done. The content is already structured, so there's no extra pretty printing needed either.
I see where this is going. You are suggesting that we can convert DokuWiki into Markdown via a handful of regular expressions.
Approaching structured data as if it were known to be a regular language (that is, believing that regular expressions are sufficient for manipulating it) is asking for trouble. I can see how it could be tempting to believe that Markdown and DokuWiki have the same structure and that therefore only the decorations (which we assume are regular) need to be changed. Surely individual things, like //italic// markers, can be replaced. But the DokuWiki format is large enough to make doubt reasonable. For example, how would a regular expression know not to replace italic markers inside a code block? Your argument would go along the lines of «we do not use italics inside code blocks», but I would hate to bet that we never use anything identical to DokuWiki markup inside code blocks. It is unfounded and dangerous to believe that the subset of DokuWiki that we shall ever need to convert is simple enough for regular expressions to handle without corner cases.
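The italics concern can be made concrete with a small Python sketch (the sample strings are made up, and the names are illustrative). A naive //...// rule rewrites a C++-style comment inside a <code> block; a version that skips <code>...</code> and ''...'' spans fixes that case, but bare URLs, whose "//" also looks like an italic marker, would still need a rule of their own.

```python
import re

ITALIC = re.compile(r"//(.+?)//")
# <code>...</code> blocks and ''monospace'' spans, captured so re.split keeps them.
CODE = re.compile(r"(<code>.*?</code>|''.*?'')", re.DOTALL)

def naive_italics(text):
    """Rewrite every //...// pair as *...*, with no notion of context."""
    return ITALIC.sub(r"*\1*", text)

def italics_outside_code(text):
    """Same rule, but leave code spans untouched."""
    parts = CODE.split(text)
    # With one capturing group, re.split puts the code spans at the odd indices.
    return "".join(p if i % 2 else naive_italics(p) for i, p in enumerate(parts))

sample = "<code>x = 1; // a comment // more</code> and //emphasis//"
print(naive_italics(sample))         # <code>x = 1; * a comment * more</code> and *emphasis*
print(italics_outside_code(sample))  # <code>x = 1; // a comment // more</code> and *emphasis*
print(italics_outside_code("http://a.org and http://b.org"))  # http:*a.org and http:*b.org
```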
The right approach is to use the existing, well-supported solution, which is Pandoc. The source of pain is the menacing ghost of building a large program written in an unfamiliar language, with an obscure build system. If you think this pain outweighs the pain of dealing with regular expressions, then order me and I shall do your bidding, but I shall have to disclaim responsibility for italics inside code blocks.
In short, you should stop micromanaging and let me do the right thing.
I don't know if you're aware, but you come across as pretty hostile. There's no need for that. And if you don't want my opinion, then please don't ask for it.
If you want to over-engineer the solution, that's up to you; it's your time. This is not so critical that dependency issues would matter much. The problem space is much smaller than you think, since the docs were autogenerated, but since you want something more generic, just go for it. Perhaps pandoc will be enough; in my use for porting the content, it hasn't always produced good results, but for this subset of syntax, I think it'll do fine.
No hostility is intended. Please assume good intentions. I am sorry for any misunderstanding.
I came up with 2 ways of extracting doc-strings. See:

1. GemRBMethods public and linking against GUIScript in good faith instead of this #include hack.
2. Interface and whatever else is needed to run a Python interpreter the way it currently runs in the actual game.

Seeing how our previous conversations uncovered certain differences of opinion, I expect that you will be grumpy about my aspirations to replace sed/awk hacks with a principled solution, and that none of the above will get merged. But you are welcome to surprise me.
P.S. I gave Pandoc a spin, and it works out of the box.
What I mean to express is that the problem of converting DokuWiki markup into Markdown is made trivial by Pandoc. At worst, some small patches to the latter may be required, which is within my power to produce. The real problem is that the architecture of GemRB does not make it easy to extract the doc-strings in the first place.
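A minimal sketch of calling Pandoc from Python, assuming a pandoc binary on PATH that is new enough to include the DokuWiki reader (older builds reject --from dokuwiki). GitHub-Flavored Markdown (gfm) is one plausible target; plain markdown would work too. The function names are hypothetical.

```python
import shutil
import subprocess

def pandoc_command(target="gfm"):
    """Argument list for converting DokuWiki read from stdin into `target`."""
    return ["pandoc", "--from", "dokuwiki", "--to", target]

def convert_with_pandoc(text, target="gfm"):
    """Run pandoc on `text`; raises if pandoc is missing or rejects the input format."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    result = subprocess.run(
        pandoc_command(target),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit (e.g. no DokuWiki reader) raises CalledProcessError
    )
    return result.stdout
```

The exact Markdown produced (heading style, line wrapping) can differ between Pandoc versions, so the output is worth spot-checking after an upgrade.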
At this point I need to know which, if any, of the ways outlined above I can expect to be merged, and then I hope to move this conversation into a draft pull request for polish and review.
I looked into how best to refactor GUIScript to expose both the value and the type (length) of a method array. One problem is that the implicit size parameter makes it impossible to link to an array across translation units. One way to solve this is to define a method array as a static constant in a header, so that every translation unit including it receives an identical, hard-wired, unchangeable copy. Alternatively, the number of entries (that is to say, the complete type of the array) may be put in a header, and then the value of the array can be linked against, but it is a bit inconvenient to have to adjust both the header and the definition whenever a method is added or dropped. Finally, the correct solution could be to embrace modern C++ features and use a container with a template iterator interface: why are we stuck with C-style byte management in the first place?
On the other hand, we may be better off treating GUIScript as exclusively focused on the Python API and instead making it easy and cheap to run a Python interpreter. That direction is of larger scope, so I have not been looking there yet.
So what was to be a diff of about 10 lines is now one of 31 thousand and it's not even done yet. Impressive! :open_mouth:
I see three ways to keep extraction much simpler (besides the current 3 lines), and all of them do everything in Python. You already got the introspection bits; just do the rest and stick it into our demo gametype to be run through gemrb (if an env var is set). We already did that when we needed some text-related torture test and a graphical loading test elsewhere. Or just leave it as something for the user to run manually through our console.
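A rough sketch of the demo-gametype variant: collect the docstrings through standard introspection and write one page per function, with the same kind of title front matter shown earlier in the thread. The GEMRB_DUMP_DOCS variable, the output layout and the function names are hypothetical; inside the engine the module passed in would be the built-in GemRB module, which cannot be imported outside it.

```python
import inspect
import os

def collect_docs(module):
    """Map each public callable in `module` to its docstring ('' if it has none)."""
    docs = {}
    for name, obj in inspect.getmembers(module, callable):
        if not name.startswith("_"):
            docs[name] = inspect.getdoc(obj) or ""
    return docs

def write_pages(docs, outdir):
    """Write one <name>.md page per function, with Jekyll front matter."""
    os.makedirs(outdir, exist_ok=True)
    for name, doc in sorted(docs.items()):
        with open(os.path.join(outdir, name + ".md"), "w", encoding="utf-8") as fh:
            fh.write(f"---\ntitle: {name}\n---\n{doc}\n")

# Hypothetical hook: only does anything when the env var names an output directory.
if os.environ.get("GEMRB_DUMP_DOCS"):
    import GemRB  # only importable inside the running engine
    write_pages(collect_docs(GemRB), os.environ["GEMRB_DUMP_DOCS"])
```

Outside the engine the same two functions work on any module, which makes them easy to try out before wiring them into a gametype.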
There's also our Twisted connector, contrib/manhole.py, which looks like it needs just two small tweaks to be ready for this job.
Making the modules generally importable just to be able to see their method lists is not justified. The modules are useless without the engine, so the extra complexity just makes things worse. For something that is run around once per year.
All the six or so proposed solutions are still dodgy though. And not even in the "perfect is the enemy of good" sense. A simple script does the job and does it well, also being almost independent of the code it works on, not something that will work just from the merge point on.
It appears I made another mistake — it's not just your time being wasted. Too late now.
I see three ways to keep extraction much simpler …
This is the kind of opinion I can use. It is impossible to know at the outset which direction will turn out to be the most fruitful, so advice from a person familiar with the architecture is a boon.
You already got the introspection bits, just do the rest and stick it into our demo gametype to be run through gemrb (if an env var is set).
Sounds hackish. But the idea is sound. I wonder if we can create a really small initial set-up, just enough to run the engine, and have it around for cases like this.
There's also our Twisted connector, contrib/manhole.py, which looks like it needs just two small tweaks to be ready for this job.
I shall look into it. Judging by the name, it is exactly what is needed.
All the six or so proposed solutions are still dodgy though. And not even in the "perfect is the enemy of good" sense. A simple script does the job and does it well, also being almost independent of the code it works on, not something that will work just from the merge point on.
It is extremely dependent on the code it works on. As I argued above, regular expressions are not suitable for dealing with any but the simplest kind of language. The smallest syntactic alteration will send a sed/awk solution down in flames. It seems impossible to argue in favour of the solution that is currently in place.
But I can guess from your tentative remarks that you are indeed grumpy. I can see now that your approach to development is extremely conservative, and, had I known it sooner, I might not have offered my services in the first place. You should be careful not to spill your discontent onto other open source participants, though. If you imply that I am wasting someone else's time, then it is a grave accusation that I have done nothing to deserve, and it is not acceptable. As a maintainer, you can accept or decline any proposed change at your discretion. But you do not get to belittle others.
Like I said initially, we don't need a perfect dokuwiki2markdown converter. We're not parsing a language, but a few tags. Even pandoc doesn't provide a full translator. We use a very small subset of the language, and if a conversion were also missed in testing and caused misrendering (which is not a given, since they share syntax), it would be easy to fix and redeploy once found. I don't understand the argument about fragility either. Sure, in general, but this works already right now and we have control over the input.
Also, it's a temporary solution anyway, since we should just convert the source docstrings eventually. Now is just not a good time due to the refactoring going on. And when the time comes, it will be done very easily in the editor of choice of the person doing it.
Like I said initially, we don't need a perfect dokuwiki2markdown converter. We're not parsing a language, but a few tags. Even pandoc doesn't provide a full translator.
As I understood from our previous conversation, you have no objections to Pandoc in particular as a solution to the conversion problem. As I said previously, I tried it out and it seems to work without a flaw, so our real problem is not to convert, but to extract.
We use a very small subset of the language and in case any conversion was missed also in testing, causing misrendering (which is not a given, since they share syntax), it would be easy to fix and redeploy as it was found.
If we botch the extraction, however, it would not be easy to notice. Currently, a tiny change to the source code can lead to a method's documentation going missing.
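To illustrate the kind of silent breakage meant here, a hypothetical line-based extractor in Python. It assumes docstrings are declared with CPython's PyDoc_STRVAR macro under a GemRB_<Name>__doc naming scheme; the actual layout of the GemRB sources and of the current script may differ. A space before the parenthesis still compiles, yet the declaration no longer matches.

```python
import re

# Hypothetical extractor: expects "PyDoc_STRVAR(GemRB_<Name>__doc," at the start of a line.
DOC_DECL = re.compile(r"^PyDoc_STRVAR\(GemRB_(\w+)__doc,", re.MULTILINE)

def documented_functions(source):
    """Names whose doc declarations the pattern recognises."""
    return DOC_DECL.findall(source)

original = 'PyDoc_STRVAR(GemRB_Quit__doc,\n"===== Quit =====");\n'
# Still valid C: whitespace is allowed between a macro name and its "(".
reformatted = 'PyDoc_STRVAR (GemRB_Quit__doc,\n"===== Quit =====");\n'

print(documented_functions(original))     # ['Quit']
print(documented_functions(reformatted))  # []
```

Nothing errors out; the function simply drops out of the generated list.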
As you can see, I am willing to go to some length to make sure the solution is maintainable and future-proof. With your permission, and as per your advice, I shall go and research ways to get that interpreter running at the smallest possible expense, with an eye to a draft pull request. The solution that involves firing up a Python interpreter and querying it can guarantee in a straightforward fashion that every built-in method available to the user has a corresponding page on the website, so I like it too. I suppose a proposal can be ready in a few days.
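The completeness guarantee could then be a plain comparison between the names the interpreter reports and the pages on disk. A sketch, assuming one <Name>.md file per function in a single directory (the layout and the function name are hypothetical); it also flags empty pages.

```python
import os

def missing_pages(function_names, pages_dir, ext=".md"):
    """Names that have no page, or only an empty one, in pages_dir (sorted)."""
    missing = []
    for name in sorted(function_names):
        path = os.path.join(pages_dir, name + ext)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Inside the engine, function_names could be as simple as the public names from dir() on the built-in module; a non-empty result would fail the documentation build instead of silently publishing an incomplete site.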
It's overkill, but sure. My other worry was that the version I tried didn't support dokuwiki, however I've upgraded the box since, so hopefully that's changed.
Why would we botch the extraction? You'd have to try deliberately, since the code also still needs to compile. But even if we did, it'd be as easy to notice as bad transformations — can't be done unless looked at. Well, except for the case where really nothing is output, which is trivial to detect if desired.
Let's see what you come up with and if the jumping through hoops is suddenly justified.
I had some health problems, so I had to delay this work. I am back on track now, so please wait a few more days.
@lynxlynxlynx See https://github.com/gemrb/gemrb/pull/698. Please review and tell me whether this aligns with what you had in mind.
What happened with the twisted approach?
I decided that it is not well motivated. A strong downside is that it implies launching an actual game and issuing commands via the console, which, for example, would be difficult to automate. Overall, it seems unnecessarily complex.
It would require interactivity only if badly designed, but none of your approaches are trivially automated anyway, unlike the original, so that's a moot point. But I agree, unnecessarily complex.
To illustrate, this is what I did in the end, since the bug became a blocker for launch: https://github.com/gemrb/gemrb/commit/06ddb42e09c4d6ca9447ae8cdbf83bb618d84af2
The results are already up: https://gemrb.github.io/GUIScript/functions/ApplyEffect.html
If you want to make your solution perfect, you're welcome to, but it would remain a (portability) exercise. Up to you, we don't need to close the PR immediately.
@lynxlynxlynx I see, you waited since November, but you could not wait a few days more for me to get the build to work on Windows, and I suppose you did not have a Linux installation at hand. Disappointing.
API docs are inline in the code, and we have a script to extract them and then put them on the web. We do this periodically, usually before a release, and that's fine. However, the docs are in DokuWiki format and will need to be converted to Markdown.
#10 will find the general solution, and here we need to integrate it into the regexy generator script:
https://github.com/gemrb/gemrb/blob/master/admin/guidoc_wikifier.sh