JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Support for PCRE DUPNAMES feature #51560

Open mgkurtz opened 11 months ago

mgkurtz commented 11 months ago

I wanted to match Markdown links of the form [text](link) or [link] using PCRE’s DUPNAMES option, which allows duplicate group names (see the second half of the named groups section of the PCRE pattern specification). Simplified, my use case is:

julia> link_pattern = r"(?J)\[(?<text>[^][]*)]\((?<link>[^()]*)\)|\[(?<link>[^][]*)]";

julia> m1, m2 = eachmatch(link_pattern, "[text1](link1) [link2]") |> collect
2-element Vector{RegexMatch}:
 RegexMatch("[text1](link1)", text="text1", link="link1", link=nothing)
 RegexMatch("[link2]", text=nothing, link=nothing, link="link2")

julia> m1["link"]
ERROR: no capture group named link found in regex
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] getindex
   @ ./regex.jl:241 [inlined]
 [3] getindex(m::RegexMatch, name::String)
   @ Base ./regex.jl:244
 [4] top-level scope
   @ REPL[48]:1

The same happens with m2["link"]. All other group accesses, such as m1["text"] or m1[2], still work.

While there are other solutions to my original problem, I would still be glad if the DUPNAMES feature were supported. At the very least, the error message could be more fitting: the group named link does exist, it is just defined more than once. The relevant PCRE API reference section suggests that m["link"] should be something(m[2], m[3], Some(nothing)) for a match m of link_pattern.
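
Until something like this is supported, a lookup along the following lines might serve as a stopgap. This is only a sketch: `dupname_capture` is a hypothetical helper name, and it relies on `Base.PCRE.capture_names` and the `regex` fields of `RegexMatch` and `Regex`, which are internal and undocumented and may change between Julia versions. It returns the first group with the requested name that actually participated in the match, which is my reading of the PCRE behaviour described above.

```julia
# Hypothetical helper, not part of Base: return the first non-`nothing`
# capture among all groups that share `name`, or `nothing` if none matched.
function dupname_capture(m::RegexMatch, name::AbstractString)
    # Internal API (assumption: may change): maps group index => group name.
    # With DUPNAMES, several indices can map to the same name.
    names = Base.PCRE.capture_names(m.regex.regex)
    for idx in sort!(collect(keys(names)))
        if names[idx] == name && m.captures[idx] !== nothing
            return m.captures[idx]
        end
    end
    return nothing
end

link_pattern = r"(?J)\[(?<text>[^][]*)]\((?<link>[^()]*)\)|\[(?<link>[^][]*)]"
m1, m2 = collect(eachmatch(link_pattern, "[text1](link1) [link2]"))

dupname_capture(m1, "link")  # same as something(m1[2], m1[3], Some(nothing))
dupname_capture(m2, "link")
dupname_capture(m2, "text")  # group did not participate in the match
```

For this particular pattern the index-based form `something(m[2], m[3], Some(nothing))` uses only public API and is probably the safer choice. The helper is only worth it when the group indices are not known in advance.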

ViralBShah commented 11 months ago

Perhaps @StefanKarpinski may be the right person to ping here.