Closed KingAkeem closed 5 years ago
The title is being saved properly; I'm grabbing the text. https://github.com/DedSecInside/TorBot/blob/d18083f36be0f18e5255352f98d0f1c35ffb5ab7/modules/collect_data.py#L59
This is ready to be re-reviewed
- We just need to change the content part. Every website contains a `<meta content="some description" name="description">` tag, and that information is all we need. If it is empty, we could just grab the contents inside the `<body>` tag. This way all the noise is removed.
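The suggestion above could be sketched roughly as follows. This is only an illustration using Python's standard `html.parser`, not the code in `collect_data.py`; the names `DescriptionParser` and `extract_content` are hypothetical, and skipping `<script>`/`<style>` text is an added assumption about what counts as noise.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects the meta description and the text found inside <body>."""

    def __init__(self):
        super().__init__()
        self.description = None
        self._in_body = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self._body_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits in <body> and outside scripts/styles.
        if self._in_body and not self._skip_depth and data.strip():
            self._body_parts.append(data.strip())

    @property
    def body_text(self):
        return " ".join(self._body_parts)


def extract_content(html):
    """Return the meta description, or the body text if the description is empty."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description or parser.body_text
```

With this sketch, a page whose description is missing or empty falls back to the joined text of its `<body>`.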
This is still not fixed?
That was done in this commit https://github.com/DedSecInside/TorBot/pull/162/commits/652007392752d48c29b38901662e06a08c07df3c
Ready for review?
Yep yep
Issue #161
Changes Proposed
thehiddenwiki.org, use `--gather` to perform operation.
Explanation of Changes
Save entries to a CSV file with the columns
ID | TITLE | META TAGS | CONTENT
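As a rough illustration of that column layout, here is a minimal sketch using Python's standard `csv` module. The function names, the shape of each entry (a dict with `title`, `meta_tags`, and `content` keys), and the `"; "` separator for meta tags are assumptions made for the example, not taken from this PR.

```python
import csv
import io

# Column names as listed in the PR description.
FIELDNAMES = ["ID", "TITLE", "META TAGS", "CONTENT"]


def write_entries(handle, entries):
    """Write entries to an open text handle, numbering rows from 1."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for entry_id, entry in enumerate(entries, start=1):
        writer.writerow({
            "ID": entry_id,
            "TITLE": entry["title"],
            # Assumption: several meta tags are joined into one cell.
            "META TAGS": "; ".join(entry["meta_tags"]),
            "CONTENT": entry["content"],
        })


def render_csv(entries):
    """Return the CSV as a string, handy for previewing before saving."""
    buffer = io.StringIO()
    write_entries(buffer, entries)
    return buffer.getvalue()
```

To save to disk, the same `write_entries` can be called with a handle from `open(path, "w", newline="", encoding="utf-8")`; passing `newline=""` is what the `csv` module documentation recommends for file objects.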
<meta>