aichaos / rivescript-go

A RiveScript interpreter for Go. RiveScript is a scripting language for chatterbots.
https://www.rivescript.com/
MIT License

can't really capture a user's e-mail #31

Open distravantari opened 7 years ago

distravantari commented 7 years ago

The "@" is working fine, but it still removes the ".".

distravantari commented 7 years ago

I just traced it and found that this line is actually the one that removes the ".":

https://github.com/aichaos/rivescript-go/blob/b2781a813d641fc637d2f6271691f016d1c2633b/src/tags.go#L27

kirsle commented 7 years ago

Yeah I was aware of the Unicode punctuation getting in the way of that, but I just thought of a way I might work around it:

Don't remove punctuation if it has letters on either side of it. So a normal period in a sentence would be removed because one side has a space on it, but a string like "some.com" with a dot touching letters would be left alone.

I played with it a bit here but the regexp still isn't quite right: https://play.golang.org/p/PFujU2kE0Q

At first I was trying to use a lookbehind regexp like (?<![A-Za-z0-9])[.,:?!](?![A-Za-z0-9]), but Go doesn't support those, so something that just captures the characters on either side and returns them ($1$2) would probably be the way to go.
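A minimal sketch of that capture-and-return idea, assuming two passes instead of one combined pattern: the first drops punctuation that follows the start of the string or a non-alphanumeric character, the second drops punctuation that precedes the end of the string or a non-alphanumeric character. The names leading, trailing and stripLoosePunctuation are hypothetical and are not part of rivescript-go, and the punctuation set is just the one from the regexp above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for illustration; these are not the ones rivescript-go uses.
// leading matches a run of punctuation that follows the start of the string or a
// character that is neither alphanumeric nor punctuation. Group 1 keeps that neighbor.
var leading = regexp.MustCompile(`(^|[^A-Za-z0-9.,:?!])[.,:?!]+`)

// trailing matches a run of punctuation that is followed by the end of the string
// or by a character that is neither alphanumeric nor punctuation.
var trailing = regexp.MustCompile(`[.,:?!]+([^A-Za-z0-9.,:?!]|$)`)

// stripLoosePunctuation removes punctuation that touches a word on at most one
// side and puts the captured neighbor back via $1, since RE2 has no lookaround.
func stripLoosePunctuation(s string) string {
	s = leading.ReplaceAllString(s, "$1")
	return trailing.ReplaceAllString(s, "$1")
}

func main() {
	fmt.Println(stripLoosePunctuation("Hi. Mail me at a@b.com!"))
	fmt.Println(stripLoosePunctuation("3.14 is pi."))
}
```

A dot with a letter or digit on both sides ("b.com", "3.14") matches neither pattern, so it survives. A run such as "wait...what" is also left alone by this sketch, which may or may not be what a bot wants.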

meowgorithm commented 7 years ago

@distravantari for what it's worth, I’ve been needing to capture user input exactly as the user sent it (emails, street addresses and so forth) and have been doing so by putting the raw input into a uservar and then pulling it out in a macro. See #34.

ghost commented 1 year ago

Maybe, instead of using a *regexp.Regexp as UnicodePunctuation, there could be a function in its place, because Go doesn't support lookbehind regexps. The default would still be the current regexp wrapped in a function.

I was running into similar problems with catching IPv4/IPv6 addresses and domains.
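A rough sketch of what a function in place of the regexp could look like. Everything here is an assumption for illustration: PunctuationFilter, RegexpFilter and KeepInnerPunctuation are hypothetical names that rivescript-go does not define, and assumedDefault only stands in for whatever regexp the library currently uses.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// PunctuationFilter is a hypothetical type; rivescript-go does not define it.
type PunctuationFilter func(string) string

// assumedDefault stands in for the library's current punctuation regexp.
var assumedDefault = regexp.MustCompile(`[.,!?;:]`)

// RegexpFilter wraps a regexp in a function, so the existing behavior can
// remain the default.
func RegexpFilter(re *regexp.Regexp) PunctuationFilter {
	return func(s string) string { return re.ReplaceAllString(s, "") }
}

func isWordRune(r rune) bool { return unicode.IsLetter(r) || unicode.IsDigit(r) }

// KeepInnerPunctuation drops punctuation unless it sits directly between two
// letters or digits, which needs no lookaround support.
func KeepInnerPunctuation(s string) string {
	runes := []rune(s)
	var b strings.Builder
	for i, r := range runes {
		if strings.ContainsRune(".,!?;:", r) {
			inner := i > 0 && i < len(runes)-1 &&
				isWordRune(runes[i-1]) && isWordRune(runes[i+1])
			if !inner {
				continue // loose punctuation: skip it
			}
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	filters := []PunctuationFilter{RegexpFilter(assumedDefault), KeepInnerPunctuation}
	for _, f := range filters {
		fmt.Println(f("Ping 192.168.0.1, please."))
	}
}
```

With a function, a bot that needs e-mail addresses, IPv4 addresses or domains could swap in its own rules without changing the default. This particular filter would still mangle the "::" in a compressed IPv6 address, so that case would need a filter of its own.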