Open harej opened 2 months ago
Yes, that's a valid URL, or at least it was nearly 20 years ago. https://web.archive.org/web/20080918055001/http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033:
(You may need to copy that URL with the colon into the address bar manually)
See the difference here:
import mwparserfromhell
wikitext1 = "http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033:"
wikitext2 = "[http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033: foo]"
parsed1 = mwparserfromhell.parse(wikitext1)
parsed2 = mwparserfromhell.parse(wikitext2)
print(parsed1.filter_external_links())
print(parsed2.filter_external_links())
Which gives
['http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033']
['[http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033: foo]']
Note that this is consistent with how MediaWiki behaves :shrug:
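To make the bare-URL behaviour concrete, here is a rough stdlib-only sketch of the trailing-punctuation rule as I understand it: a bare URL loses trailing characters such as `,;.:!?` (and `)` when the URL contains no `(`), while a bracketed URL keeps them. This is not MediaWiki's or mwparserfromhell's actual code, and `trim_bare_url` is a hypothetical helper name.

```python
# Rough sketch of the trailing-punctuation rule for bare (unbracketed) URLs.
# Assumption: the punctuation set below mirrors what the parsers strip;
# `trim_bare_url` is a hypothetical helper, not a library function.
TRAILING_PUNCTUATION = ",;.:!?"


def trim_bare_url(url: str) -> str:
    """Drop trailing punctuation the way a bare URL in wikitext loses it."""
    chars = TRAILING_PUNCTUATION
    # A closing parenthesis also counts as trailing punctuation,
    # unless the URL itself contains an opening parenthesis.
    if "(" not in url:
        chars += ")"
    return url.rstrip(chars)


bare = "http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033:"
print(trim_bare_url(bare))
```

Colons in the middle of the URL (`d109:SN01033`) are untouched; only the final one is treated as sentence punctuation.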
As for your snippet: mwparserfromhell does not expand templates, so it cannot know that the url
parameter ends up inside square brackets once the template is rendered.
Test case:
What I get:
['http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033']
What I should get:
['http://thomas.loc.gov/cgi-bin/bdquery/z?d109:SN01033:']
with the colon at the end.