Closed RobSchoenaker closed 4 years ago
I think the parser currently lacks a rule for images (the File: namespace). I'll try to add one, hopefully before the end of the week.
Would it be an idea to have an option for specifying the namespace names? These are language-dependent.
Well, that's a good point! I think I will add a new configuration option that lets you specify these namespace names.
Btw, there is a related discussion on earwig/mwparserfromhell#136.
I read the discussion. Same issue indeed. I think it would make sense to have a static language class for these situations. I can provide the Dutch version based on my findings across all Wikipedia articles.
The updated ETA is before end of next week 😂
Published v0.3.0-int.3.
See the following snippet for an example of how to customize the namespace prefixes treated as the File: namespace with WikitextParserOptions. The presets are ["File", "Image"].
https://github.com/CXuesong/MwParserFromScratch/blob/f0dac824c8d91f58ffa18425262d153f323b36bd/UnitTestProject1/BasicParsingTests.cs#L158-L162
Additionally, you may use CanonicalName, CustomName, and Aliases provided in WikiClientLibrary.Sites.NamespaceInfo to retrieve the valid live namespace names on a MW site, if you are using WikiClientLibrary.
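using WikiClientLibrary;
using WikiClientLibrary.Client;
using WikiClientLibrary.Sites;
var client = new WikiClient();
// Look up the API endpoint of the Dutch Wikipedia.
var endpointUrl = await WikiSite.SearchApiEndpointAsync(client, "nl.wikipedia.org");
var site = new WikiSite(client, endpointUrl);
// Wait until the site information (including namespaces) has been loaded.
await site.Initialization;
// NamespaceInfo for the File: namespace on this site.
var fileNamespace = site.Namespaces[BuiltInNamespaces.File];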
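To illustrate why the language-specific names matter, here is a minimal standalone sketch of matching a wikilink target against a configurable set of File: namespace names. It is not how MwParserFromScratch implements this; `FileNamespaceSketch`, `IsFileLink`, and the name set are hypothetical and only show the idea of a case-insensitive prefix check.

```csharp
using System;
using System.Collections.Generic;

static class FileNamespaceSketch
{
    // Hypothetical helper: true when the link target starts with one of the
    // configured namespace names followed by a colon.
    public static bool IsFileLink(string target, ISet<string> fileNamespaceNames)
    {
        int colon = target.IndexOf(':');
        if (colon <= 0) return false;   // no prefix, or an empty one
        // Treat underscores as spaces and ignore surrounding whitespace.
        string prefix = target.Substring(0, colon).Replace('_', ' ').Trim();
        return fileNamespaceNames.Contains(prefix);
    }
}

static class Program
{
    static void Main()
    {
        // The presets plus the Dutch name; the comparer makes matching case-insensitive.
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "File", "Image", "Bestand"
        };

        Console.WriteLine(FileNamespaceSketch.IsFileLink("Bestand:Example.jpg", names));     // True
        Console.WriteLine(FileNamespaceSketch.IsFileLink("bestand:Example.jpg", names));     // True
        Console.WriteLine(FileNamespaceSketch.IsFileLink("Welthauptstadt Germania", names)); // False
    }
}
```

With only the presets ("File" and "Image") in the set, the `Bestand:` target would be treated as an ordinary wikilink, which is why the names need to be configurable per language.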
This is perfect. I will complete this for the Dutch (NL) Wikipedia as I find the namespaces. It will take some time though :)
Example:
[[Bestand:Bundesarchiv Bild 146III-373, Modell der Neugestaltung Berlins ("Germania").jpg|miniatuur|260px|right| Schaalmodel van de [[Welthauptstadt Germania]], 1939]]
This link appears on this particular page: https://nl.wikipedia.org/wiki/Albert_Speer
With the code
var ast = LoadAndParse(fileName.Trim(' ', '\t', '"'));
var text = ast.ToPlainText(NodePlainTextOptions.RemoveRefTags);
I would expect the text to read: Schaalmodel van de Welthauptstadt Germania, 1939
I have been trying to get this sorted, but I am kind of lost in the code...