Closed: 0x4d4c closed this issue 1 year ago
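Hello, I have received your weekly report. The deadline for weekly reports is 8 p.m. every Tuesday, and reports are no longer accepted after that time. Please send your weekly report on time. Thank you!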
Hi, @0x4d4c!
I think I was able to fix the issue in a way that shouldn't disrupt normal extraction. I added a new regular expression to the strip parameter. You can see an example of my solution below. Since most URLs do not contain whitespace, the new code strips anything that follows the pattern whitespace + / or \ + character, so something like https://example.com/f should still work.
If you run into any issues, feel free to let me know. I'll ping you when a new version is available from PyPI so you can test out this new addition.
Example:
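import iocextract

def locate_url():
    # Command line with a URL followed by several slash and backslash flags.
    data = "command.exe https://pypi.org/project/iocextract/ /f /n /a \\s ///xhh /no \\\\f /d \a"
    # strip=True removes the trailing flags from the extracted URL.
    return list(iocextract.extract_unencoded_urls(data, strip=True))

print(locate_url())  # => ['https://pypi.org/project/iocextract/']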
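For readers who want to see the idea in isolation, here is a minimal sketch of the pattern described above (whitespace, then a slash or backslash, then a character), written with Python's standard `re` module only. It is not the code that ships in iocextract: the names `TRAILING_FLAG` and `strip_trailing_flags` are hypothetical, and the exact expression the library uses may differ.

```python
import re

# Hypothetical pattern, for illustration only: one whitespace character,
# then "/" or "\", then any non-whitespace character, and everything after it.
TRAILING_FLAG = re.compile(r"\s[/\\]\S.*$", re.DOTALL)

def strip_trailing_flags(candidate: str) -> str:
    """Cut a candidate URL at the first whitespace-plus-slash flag."""
    return TRAILING_FLAG.sub("", candidate)

# A flag after whitespace is removed together with everything after it.
print(strip_trailing_flags("https://example.com/path /f /n"))
# A URL whose path merely ends in "/f" has no whitespace, so it is untouched.
print(strip_trailing_flags("https://example.com/f"))
```

Because the pattern requires whitespace before the slash, ordinary path segments are left alone, which is why a URL such as https://example.com/f is not affected.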
I'll close this issue as soon as the new release is out.
You can download the new version from PyPI now.
New release: https://pypi.org/project/iocextract/1.13.2/
Wow, that was blazing fast! I tested the new release from PyPI and my sample files are processed correctly now. Thank you very much!
I'm parsing input that contains examples of PowerShell or cmd.exe command lines. When a command flag that starts with a slash comes after a URL, the flag is included in the extracted URL.
Here is an example:
The trailing /f should not be included in the extracted URL.