Closed grossir closed 2 months ago
To test this, you will need to copy and paste the following helper:
import hashlib


def sha1(s):
    """Return the sha1sum of a string.

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! This algorithm is obsolete for most purposes. Its !
    ! usage is discouraged. Please use SHA256 instead.  !
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    :param s: The data to hash. Ideally bytes, but if unicode is passed in, it
        will convert it to bytes first.
    :return: a hexadecimal SHA1 hash of the data
    """
    if isinstance(s, str):
        s = s.encode()
    sha1sum = hashlib.sha1()
    sha1sum.update(s)
    return sha1sum.hexdigest()
Then call it where appropriate in the sample_caller:
# cleanup_content is called before the extraction task in CL
# so it is only useful for cleaning HTML files
data = site.cleanup_content(data)
logger.info(sha1(data))
Perhaps we should add this to the sample_caller...
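For reviewers who want to see what the logged hash is meant to show, here is a minimal, self-contained sketch. It assumes the goal is that two downloads of the same document produce the same hash once volatile markup is removed. The `cleanup_content` below is a hypothetical stand-in (it strips a made-up `token` attribute), not the scraper's actual implementation, and the sample HTML is invented for illustration.

```python
import hashlib
import re


def sha1(s):
    """Hex SHA1 of the data; str input is encoded to bytes first."""
    if isinstance(s, str):
        s = s.encode()
    return hashlib.sha1(s).hexdigest()


def cleanup_content(content):
    """Hypothetical cleanup: drop a volatile `token` attribute.

    A real site's cleanup_content would target whatever markup actually
    changes between downloads; this pattern is only an assumption.
    """
    return re.sub(rb'\s+token="[^"]*"', b"", content)


# Two invented downloads of the same page that differ only in the token.
first = b'<html><a href="/doc/1" token="abc123">Opinion</a></html>'
second = b'<html><a href="/doc/1" token="zzz999">Opinion</a></html>'

# Without cleanup the hashes differ, so the page looks like new content.
raw_match = sha1(first) == sha1(second)

# After cleanup the hashes agree, so the duplicate can be detected.
clean_match = sha1(cleanup_content(first)) == sha1(cleanup_content(second))

print(raw_match, clean_match)
```

Running the sample_caller twice against the same URL and comparing the logged hashes is the equivalent check against a live site.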
Helps solve:
Implement cleanup_content, only for coloctapp: