binux / pyspider

A Powerful Spider (Web Crawler) System in Python.
http://docs.pyspider.org/
Apache License 2.0
16.49k stars, 3.69k forks

Why are results stored as binary data? #242

Open: xesina opened this issue 9 years ago

xesina commented 9 years ago

Why are results stored as binary data?
With Postgres tables, the result field becomes a bytea column, and the data is stored as escaped code points that need to be converted: [code] {"title": "HP 355 G2 - A \u0637\u0631\u0627\u062d\u06cc \u0648 \u0633\u0627\u062e\u062a \u067e\u0648\u0631\u062a\u200c\u0647\u0627 \u0648 \u0627\u062a\u0635\u0627\u0644\u0627\u062a [/code]

binux commented 9 years ago

It's not stored as binary; it's encoded JSON.
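To illustrate what "encoded JSON" means here, the sketch below assumes results are serialized with Python's `json.dumps` using its default `ensure_ascii=True` and then stored as UTF-8 bytes; pyspider's actual storage code may differ. `encode_result` and `decode_result` are hypothetical names. The `\u0637`-style escapes in the question are exactly what this produces, and decoding the bytes and parsing the JSON restores the original text.

```python
import json


def encode_result(result: dict) -> bytes:
    """Hypothetical serializer: JSON text with non-ASCII characters
    escaped as \\uXXXX, stored as bytes."""
    # json.dumps escapes every non-ASCII character by default (ensure_ascii=True),
    # so the text is pure ASCII and encoding it to bytes is lossless.
    return json.dumps(result).encode("utf-8")


def decode_result(raw: bytes) -> dict:
    """Reverse of encode_result: bytes -> JSON text -> dict."""
    return json.loads(raw.decode("utf-8"))


raw = encode_result({"title": "HP 355 G2 - A \u0637\u0631\u0627\u062d\u06cc"})
print(raw)  # b'{"title": "HP 355 G2 - A \\u0637\\u0631\\u0627\\u062d\\u06cc"}'
print(decode_result(raw)["title"])  # the original Persian text, unescaped
```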


xesina commented 9 years ago

OK, but why do you store it in a binary column? Is it possible to store it in a plain varchar column?

binux commented 9 years ago

Because encoded JSON isn't really "variable" character data: only ASCII characters exist in it.
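Because the escaped JSON contains only ASCII, it would also fit in a text or varchar column; the escapes cost readability, not correctness. The sketch below (an illustration, not pyspider's behavior) compares Python's default `json.dumps` output with `ensure_ascii=False`, which keeps the original characters and is what a human-readable text column would more likely hold.

```python
import json

result = {"title": "\u0637\u0631\u0627\u062d\u06cc"}

# Default (ensure_ascii=True): only ASCII characters, non-ASCII escaped as \uXXXX.
ascii_json = json.dumps(result)
# ensure_ascii=False keeps the original characters; an assumed alternative
# for a readable text column, not pyspider's default.
readable_json = json.dumps(result, ensure_ascii=False)

print(ascii_json.isascii())     # True
print(readable_json.isascii())  # False
# Both spellings decode to the same value.
print(json.loads(ascii_json) == json.loads(readable_json))  # True
```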


ihipop commented 9 years ago

Why not consider a TEXT column instead of VARCHAR or BLOB in MySQL? It's not very convenient to debug with BLOB.

binux commented 9 years ago

@ihipop Yes, you can use a TEXT column. The column type is hidden behind the database interface; I hadn't given it much thought and had never debugged directly with any MySQL commands or tools. Besides, because the database operations are encapsulated, we could use msgpack, BSON, or gzip to store the data in the column.
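A minimal sketch of the encapsulation idea, assuming a hypothetical `ResultCodec` interface (not part of pyspider): storage code only calls `encode`/`decode`, so the stored format can be swapped. msgpack and BSON need third-party packages, so only plain JSON and gzip-compressed JSON from the standard library are shown. A gzip-compressed value is genuinely binary and would need a binary column.

```python
import gzip
import json
from typing import Protocol


class ResultCodec(Protocol):
    """Hypothetical interface: how a result dict is turned into column bytes."""

    def encode(self, result: dict) -> bytes: ...

    def decode(self, raw: bytes) -> dict: ...


class JsonCodec:
    """Plain JSON; non-ASCII characters are escaped, so the output is ASCII."""

    def encode(self, result: dict) -> bytes:
        return json.dumps(result).encode("utf-8")

    def decode(self, raw: bytes) -> dict:
        return json.loads(raw.decode("utf-8"))


class GzipJsonCodec(JsonCodec):
    """JSON compressed with gzip: smaller, but no longer readable as text."""

    def encode(self, result: dict) -> bytes:
        return gzip.compress(super().encode(result))

    def decode(self, raw: bytes) -> dict:
        return super().decode(gzip.decompress(raw))


def save(codec: ResultCodec, result: dict) -> bytes:
    # Callers only see the codec, so the storage format can change
    # without touching the code that produces results.
    return codec.encode(result)


result = {"url": "http://example.com/", "title": "\u0637\u0631\u0627\u062d\u06cc"}
for codec in (JsonCodec(), GzipJsonCodec()):
    raw = save(codec, result)
    assert codec.decode(raw) == result  # every codec must round-trip
```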

xesina commented 9 years ago

@ihipop In Postgres, varchar is the same as text. Furthermore, in some languages such as PHP you can't use some built-in functions (e.g. json_decode), because they don't support this type of data (bytea) when decoding a string.
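One plausible explanation for the PHP problem, assuming the server uses PostgreSQL's default `bytea_output = 'hex'`: a client that does not decode bytea receives the value as text like `\x7b22...` rather than as JSON, so `json_decode` fails on it. The Python sketch below uses a hypothetical helper, `decode_bytea_hex`, to show the extra step such a client needs: strip the `\x` prefix, hex-decode, then parse.

```python
import json


def decode_bytea_hex(text: str) -> dict:
    """Hypothetical helper: decode a bytea value received as PostgreSQL's
    hex output text (e.g. '\\x7b22...') into the stored JSON result."""
    if not text.startswith("\\x"):
        raise ValueError("expected bytea hex format starting with \\x")
    raw = bytes.fromhex(text[2:])  # two hex digits per stored byte
    return json.loads(raw.decode("utf-8"))


# What a bytea-unaware client would see for the stored bytes b'{"a": 1}'.
sample = "\\x" + b'{"a": 1}'.hex()
print(sample)                    # \x7b2261223a20317d
print(decode_bytea_hex(sample))  # {'a': 1}
```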

leession commented 7 years ago

I had the same problem; try the following:

select encode(result, 'escape')::json -> 'json_column' from project_table limit 1
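The `encode(result, 'escape')` approach can be surprising. Per the PostgreSQL documentation, the `escape` format turns zero bytes and bytes with the high bit set into `\nnn` octal escapes and doubles every backslash. Because the stored JSON already escapes non-ASCII text as `\uXXXX`, doubling the backslashes turns those escapes into literal text, and a value containing an escaped quote (`\"`) stops being valid JSON. The sketch below emulates that format in Python with a hypothetical `pg_escape_encode` helper (an approximation written for illustration, not PostgreSQL code) and contrasts it with plain UTF-8 decoding, which is what `convert_from(result, 'UTF8')` amounts to.

```python
import json


def pg_escape_encode(raw: bytes) -> str:
    """Hypothetical emulation of PostgreSQL's encode(bytea, 'escape'):
    zero bytes and bytes >= 0x80 become \\nnn octal escapes, backslashes
    are doubled, and every other byte is kept as-is."""
    out = []
    for b in raw:
        if b == 0 or b >= 0x80:
            out.append("\\%03o" % b)
        elif b == ord("\\"):
            out.append("\\\\")
        else:
            out.append(chr(b))
    return "".join(out)


# Assumed stored value: ASCII-only JSON with \uXXXX escapes.
stored = json.dumps({"title": "\u0637\u0631\u0627\u062d\u06cc"}).encode("utf-8")
title = json.loads(pg_escape_encode(stored))["title"]
print(title)  # \u0637\u0631\u0627\u062d\u06cc  (literal backslashes, not Persian text)

# A value containing an escaped quote no longer parses at all.
quoted = json.dumps({"q": 'say "hi"'}).encode("utf-8")
try:
    json.loads(pg_escape_encode(quoted))
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc.msg)

# Decoding as UTF-8 (like convert_from(result, 'UTF8')) round-trips correctly.
print(json.loads(stored.decode("utf-8"))["title"] == "\u0637\u0631\u0627\u062d\u06cc")  # True
```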

arduanov commented 7 years ago

This works for PostgreSQL:

SELECT convert_from(result, 'UTF8')::json FROM project_table