When newspaper is used to extract articles that contain code, the content comes out in the wrong order.
For example, in http://akat1.pl/?id=2 the original page reads:
The error is placed in the pass-through() function of mail.local:
<code>
After extraction, it becomes:
<code>
The error is placed in the pass() function of mail.local:
This bug is in the convert_to_text() function of outputformatters.py:
def convert_to_text(self):
    txts = []
    for node in list(self.get_top_node()):  # Bug!!!!
        try:
            txt = self.parser.getText(node)
If you use the following code to produce txt, the order is correct (it just doesn't wrap lines correctly), but with the for loop above the output is out of order.
txt = self.parser.getText(self.get_top_node())
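
To make the difference between the two approaches concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of newspaper's own parser. The fragment and the helper names text_per_child and text_whole_node are hypothetical, and this is only an illustration of how per-child extraction can diverge from whole-node extraction, not a confirmed diagnosis of the bug:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: prose sits directly in the top node, code in a child.
FRAGMENT = (
    "<div>The error is in the pass() function:"
    "<pre>int pass(void);</pre>"
    "<p>Second paragraph.</p></div>"
)


def text_per_child(top_node):
    """Mimic the for loop: extract text from each direct child separately."""
    return ["".join(child.itertext()) for child in list(top_node)]


def text_whole_node(top_node):
    """Mimic the workaround: extract text from the top node in one pass."""
    return "".join(top_node.itertext())


top = ET.fromstring(FRAGMENT)
per_child = text_per_child(top)
whole = text_whole_node(top)

print(per_child)  # only text that lives inside child elements
print(whole)      # all text in document order, but run together
```

In this toy case the per-child loop never sees the text that belongs directly to the top node, while the single call keeps document order at the cost of losing the breaks between blocks, which would match the wrapping problem mentioned above.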