pawelmhm closed this issue 4 years ago.
@pawelmhm this looks like the farmandfleet.com website, and from what I can see, I can parse its data just fine with json.loads alone. For example, for the page https://www.farmandfleet.com/lawn-aerators-and-rollers/ :
>>> type(json.loads(response.css('script:contains(searchResult)').re_first('window.searchResult = (.*);')))
<class 'dict'>
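For readers without a Scrapy shell at hand, the same extraction can be sketched with the standard library alone. The HTML below is a made-up, heavily shortened stand-in for the page; only the window.searchResult variable name comes from this thread. re.search plays the role of Scrapy's re_first, and the dot in the pattern is escaped so it matches literally:

```python
import json
import re

# Hypothetical stand-in for the page HTML; only the window.searchResult
# assignment mirrors the real site.
html_page = """
<html><body>
<script>
window.searchResult = {"total": 2, "products": [{"name": "Lawn Aerator"}, {"name": "Lawn Roller"}]};
</script>
</body></html>
"""

# Same idea as the Scrapy session: the greedy (.*) captures everything up to
# the last semicolon on that line ('.' does not match newlines).
match = re.search(r"window\.searchResult = (.*);", html_page)
search_result = json.loads(match.group(1))

print(type(search_result))                   # <class 'dict'>
print(search_result["products"][0]["name"])  # Lawn Aerator
```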
Please provide the URL of the actual page and explain how you obtained this invalid input from it.
Yes, it's farmandfleet. The broken JavaScript is on https://www.farmandfleet.com/lawn-mower-and-atv-attachments/ where, as you can see, the inches character interferes with the quotes:
"description": "Buy (1) Agri-Fab 44" Lawn Sweeper.
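A quick check with Python's json module shows why such a fragment cannot be parsed as-is: the inches mark closes the string early, so the parser hits "Lawn" where it expects a comma. The one-key objects below are minimal hypothetical reproductions of the quoted fragment, not the site's real payload; the second one escapes the inner quote the way valid JSON requires:

```python
import json

# The inches mark ends the JSON string before "Lawn Sweeper".
broken = '{"description": "Buy (1) Agri-Fab 44" Lawn Sweeper"}'
# The same value with the inner quote escaped as \" (valid JSON).
escaped = '{"description": "Buy (1) Agri-Fab 44\\" Lawn Sweeper"}'


def json_error(text):
    """Return the json module's error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


print(json_error(broken))                   # Expecting ',' delimiter
print(json_error(escaped))                  # None
print(json.loads(escaped)["description"])   # Buy (1) Agri-Fab 44" Lawn Sweeper
```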
I'm not sure we can do anything here. We can probably close the ticket and assume the input is just broken, but I'm leaving the decision to you. If you think chompjs can parse it, let me know.
As I said, I also need to see how you extracted the string from the website. Using Scrapy, I can parse it just fine with json.loads if I pass replace_entities=False to re_first:
>>> script = response.css('script:contains(searchResult)')
>>> type(json.loads(script.re_first('window.searchResult = (.*);')))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting ',' delimiter: line 1 column 27105 (char 27104)
>>> type(json.loads(script.re_first('window.searchResult = (.*);', replace_entities=False)))
<type 'dict'>
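A likely explanation for the difference, inferred from the session above rather than confirmed in the thread: the JSON in the page's script tag is valid as served, with the inches mark written as the HTML entity &quot;, and re_first's default entity replacement turns it back into a bare double quote, which breaks the string just like the fragment quoted earlier. The sketch below reproduces the effect with the standard library's html.unescape roughly standing in for Scrapy's entity replacement; the script text is hypothetical:

```python
import html
import json
import re

# Hypothetical script text: the inches mark is encoded as &quot;,
# so the embedded JSON is valid as served.
script_text = 'window.searchResult = {"description": "Buy (1) Agri-Fab 44&quot; Lawn Sweeper"};'

# Greedy (.*) runs to the last ';' on the line, past the one inside &quot;.
raw = re.search(r"window\.searchResult = (.*);", script_text).group(1)

# No entity replacement (like replace_entities=False): the JSON parses.
parsed = json.loads(raw)
print(parsed["description"])  # Buy (1) Agri-Fab 44&quot; Lawn Sweeper

# Entity replacement first: &quot; becomes a bare " that ends the string early.
unescaped = html.unescape(raw)
try:
    json.loads(unescaped)
    unescaped_error = None
except json.JSONDecodeError as exc:
    unescaped_error = exc.msg
print("after unescaping:", unescaped_error)  # after unescaping: Expecting ',' delimiter

# If the entity needs decoding at all, it is safe to do so on parsed values.
print(html.unescape(parsed["description"]))  # Buy (1) Agri-Fab 44" Lawn Sweeper
```

In other words, entity decoding is only safe after parsing, on the individual string values, not on the JSON text as a whole.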
Closing due to inactivity.
Hey @Nykakin, I found another sample that is failing for unclear reasons; I pasted it here: https://pastebin.com/2tZEm5EL
It fails with: "ValueError: Parser error: ... Lawn Sweeper, get (1) Agri-Fab"
I see this is actually invalid JavaScript with quotes that are not escaped. Do you think we should support something like this?