dmulyalin / ttp

Template Text Parser
MIT License

parse a url #68

Closed: gemerden closed this issue 2 years ago

gemerden commented 2 years ago

This might not be the place (if so, please point me elsewhere), but I can't for the life of me figure out how to parse a URL with ttp:

from ttp import ttp

data = "https://stackoverflow.com/questions/63499479/extract-value-from-text-string-using-format-string-in-python"
template = "{{proto}}://{{host}}/{{path}}"

parser = ttp(data, template)
parser.parse()
res = parser.result(structure="flat_list")
print(res)

prints

[{'proto': 'https', 'host': 'stackoverflow.com/questions/63499479', 'path': 'extract-value-from-text-string-using-format-string-in-python'}]

while I hoped for:

[{'proto': 'https', 'host': 'stackoverflow.com', 'path': 'questions/63499479/extract-value-from-text-string-using-format-string-in-python'}]

Is there a simple way to make ttp do what I want?

pdimop commented 2 years ago

Try: {{proto}}://{{host | re("[^/]*")}}/{{path}}
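
Why this likely helps can be sketched with Python's standard re module alone. The sketch assumes that ttp matches an unqualified variable with a greedy pattern roughly equivalent to \S+ (an assumption about ttp's defaults, not taken from this thread); the names greedy and bounded are illustrative only. With \S+ the host group runs up to the last slash, while [^/]* stops at the first one.

```python
import re

url = (
    "https://stackoverflow.com/questions/63499479/"
    "extract-value-from-text-string-using-format-string-in-python"
)

# Assumed stand-in for the original template: every variable is a greedy \S+.
greedy = re.compile(r"^(?P<proto>\S+)://(?P<host>\S+)/(?P<path>\S+)$")

# Same pattern, but the host may not contain a slash, mirroring re("[^/]*").
bounded = re.compile(r"^(?P<proto>\S+)://(?P<host>[^/]*)/(?P<path>\S+)$")

greedy_result = greedy.match(url).groupdict()
bounded_result = bounded.match(url).groupdict()

print(greedy_result["host"])   # host swallows part of the path
print(bounded_result["host"])  # host stops at the first slash
```

Under that assumption, the greedy version reproduces the split reported above, and the bounded version gives the split that was hoped for.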

dmulyalin commented 2 years ago

@gemerden For parsing a URL string, I would suggest using Python's built-in urllib.parse.urlparse.
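
A minimal sketch of that suggestion, using only the standard library. The dictionary keys proto, host and path are chosen here to mirror the hoped-for output; urlparse itself names these fields scheme, netloc and path, and its path keeps the leading slash, so the sketch strips it.

```python
from urllib.parse import urlparse

url = (
    "https://stackoverflow.com/questions/63499479/"
    "extract-value-from-text-string-using-format-string-in-python"
)

parts = urlparse(url)

# Map urlparse's field names onto the keys used in the template above.
result = {
    "proto": parts.scheme,
    "host": parts.netloc,
    "path": parts.path.lstrip("/"),  # drop the leading "/" to match the desired output
}

print([result])
```

Note that netloc also carries any port or user info present in the URL; parts.hostname may be the better choice if only the host name is wanted.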