cube2222 / octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Mozilla Public License 2.0
4.74k stars 201 forks source link

CSV output: bubble up error message so more clear #301

Open chapmanjacobd opened 1 year ago

chapmanjacobd commented 1 year ago
$ unzstd --memory=2048MB --stdout RS_2022-08.zst | octosql "SELECT * FROM stdin.json" -o csv > RS_2022-08.csv                                                panic: runtime error: invalid memory address or nil pointer dereference                                         [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]                                          
goroutine 14 [running]:                                 github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)                                                  /home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc000638000, ...}}, ...}, ...)
        /home/xk/octosql/datasources/json/execution.go:2
09 +0xa58                                               github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc00062aa80}, {{0x0, ...}}, ...}, ...)                                         /home/xk/octosql/datasources/json/execution.go:196 +0xc78                                               github.com/cube2222/octosql/datasources/json.glob..func1.1()                                                            /home/xk/octosql/datasources/json/workers.go:68 +0x5b8                                                  created by github.com/cube2222/octosql/datasources/json.glob..func1                                                     /home/xk/octosql/datasources/json/workers.go:44 +0x45                                                   panic: runtime error: invalid memory address or nil poin
ter dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x
0 pc=0x908f1b]
cube2222 commented 1 year ago

Hey, thanks for the report.

First time I'm seeing this. Do you possibly have the JSON value that is causing this?

chapmanjacobd commented 1 year ago

It's the same 10GB file as before https://files.pushshift.io/reddit/submissions/RS_2022-08.zst.

Not sure how to get the JSON value. In the code you should be able to add some logging in the stacktrace to print it but maybe with stuff like memory errors that is more difficult?

when I try to generate a trace it only creates a 16 byte file:

unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | octosql "SELECT * FROM stdin.json" -o csv --profile trace
2022/10/15 13:02:41 profile: trace enabled, trace.out
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]

goroutine 7 [running]:
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)
    /home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc00027aa00, ...}}, ...}, ...)
    /home/xk/octosql/datasources/json/execution.go:209 +0xa58
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc0005f9b60}, {{0x0, ...}}, ...}, ...)
    /home/xk/octosql/datasources/json/execution.go:196 +0xc78
github.com/cube2222/octosql/datasources/json.glob..func1.1()
    /home/xk/octosql/datasources/json/workers.go:68 +0x5b8
created by github.com/cube2222/octosql/datasources/json.glob..func1
    /home/xk/octosql/datasources/json/workers.go:44 +0x45
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]

goroutine 8 [running]:
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)
    /home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc00027aa00, ...}}, ...}, ...)
    /home/xk/octosql/datasources/json/execution.go:209 +0xa58
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc0005f9b60}, {{0x0, ...}}, ...}, ...)
    /home/xk/octosql/datasources/json/execution.go:196 +0xc78
github.com/cube2222/octosql/datasources/json.glob..func1.1()
    /home/xk/octosql/datasources/json/workers.go:68 +0x5b8
created by github.com/cube2222/octosql/datasources/json.glob..func1
    /home/xk/octosql/datasources/json/workers.go:44 +0x45

▓▒▓░ lb:/ (main|●2?1) [141/SIGPIPE|2]🍪  ls trace.out 
Permissions Size User Date Modified Git Name
.rw-r--r--@   16 xk   15 Oct 13:02   -N trace.out
████ lb:/ (main|●2?1) 🌍 cat trace.out
go 1.11 trace⏎                    
chapmanjacobd commented 1 year ago

Interesting... when I only do the first line then I get a more clear error message:

░▒░█ lb:/ (main|●2?1) 🍞 unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | head -1 | octosql "SELECT * FROM stdin.json" -o csv --profile trace
2022/10/15 13:04:57 profile: trace enabled, trace.out
panic: invalid value type to print in CSV: List

goroutine 1 [running]:
github.com/cube2222/octosql/outputs/formats.FormatCSVValue(0xc0003ac480, {0x7, 0x0, 0x0, 0x0, {0x0, 0x0}, {0x0, 0x0, 0x0}, ...})
    /home/xk/octosql/outputs/formats/csv_format.go:65 +0x785
github.com/cube2222/octosql/outputs/formats.(*CSVFormatter).Write(0xc00008b6a0, {0xc0009cc000, 0x5e, 0x0?})
    /home/xk/octosql/outputs/formats/csv_format.go:42 +0x12c
github.com/cube2222/octosql/outputs/eager.(*OutputPrinter).Run.func1({{0xcdd960?, 0x0?}}, {{0xc0009cc000, 0x5e, 0x5e}, 0x0, {0x0, 0x0, 0x0}})
    /home/xk/octosql/outputs/eager/eager.go:42 +0x54
github.com/cube2222/octosql/datasources/json.(*DatasourceExecuting).Run(0xc000497f50, {{0xf63870?, 0xc0003bd3b0?}, 0x0?}, 0xc0001abc50, 0x0?)
    /home/xk/octosql/datasources/json/execution.go:127 +0xc86
github.com/cube2222/octosql/outputs/eager.(*OutputPrinter).Run(0xc0002091c0, {{0xf63870?, 0xc0003bd3b0?}, 0x0?})
    /home/xk/octosql/outputs/eager/eager.go:39 +0x1e4
github.com/cube2222/octosql/cmd.glob..func4(0x157bd60, {0xc00009eb90, 0x1, 0x5?})
    /home/xk/octosql/cmd/root.go:458 +0x3b54
github.com/spf13/cobra.(*Command).execute(0x157bd60, {0xc000032250, 0x5, 0x5})
    /home/xk/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x157bd60)
    /home/xk/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
    /home/xk/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    /home/xk/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:895
github.com/cube2222/octosql/cmd.Execute({0xf637c8?, 0xc000208dc0?})
    /home/xk/octosql/cmd/root.go:471 +0x53
main.main()
    /home/xk/octosql/main.go:24 +0xe8
█░█▒ lb:/ (main|●2?1) [141/SIGPIPE|0|2]🌎 unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | head -1 
{"all_awardings":[],"allow_live_comments":false,"archived":false,"author":"Kresley","author_created_utc":1390870929,"author_flair_background_color":null,"author_flair_css_class":null,"author_flair_richtext":[],"author_flair_template_id":null,"author_flair_text":null,"author_flair_text_color":null,"author_flair_type":"text","author_fullname":"t2_f0c4k","author_patreon_flair":false,"author_premium":false,"awarders":[],"banned_by":null,"can_gild":true,"can_mod_post":false,"category":null,"content_categories":null,"contest_mode":false,"created_utc":1659312011,"discussion_type":null,"distinguished":"moderator","domain":"self.podcasts","edited":false,"event_end":1659326400.0,"event_is_live":false,"event_start":1659312000.0,"gilded":0,"gildings":{},"hidden":false,"hide_score":false,"id":"wcciq1","is_created_from_ads_ui":false,"is_crosspostable":true,"is_meta":false,"is_original_content":false,"is_reddit_media_domain":false,"is_robot_indexable":true,"is_self":true,"is_video":false,"link_flair_background_color":"#7e53c1","link_flair_css_class":"announce","link_flair_richtext":[],"link_flair_template_id":"efc6fb7e-75ae-11ea-baed-0e1ba7588035","link_flair_text":"Announcement","link_flair_text_color":"light","link_flair_type":"text","locked":false,"media":null,"media_embed":{},"media_only":false,"name":"t3_wcciq1","no_follow":false,"num_comments":8,"num_crossposts":1,"over_18":false,"parent_whitelist_status":"all_ads","permalink":"\/r\/podcasts\/comments\/wcciq1\/have_questions_about_recent_supreme_court\/","pinned":false,"post_hint":"self","preview":{"enabled":false,"images":[{"id":"Qalc2eReon-KD6FX7LpDmEuzwjap9_yobQbqDYd2RoQ","resolutions":[{"height":108,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=108&crop=smart&auto=webp&s=01924a24a46b6e764caf58f310678ac69a6c6c8b","width":108},{"height":216,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=216&crop=smart&auto=webp&s=ae46ea72110dc7c060d102b496cfd457ecd6b950","width":216},{"height":320,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=320&crop=smart&auto=webp&s=f6b853bbf5f384084a45c8dde00d5d7b97356d69","width":320}],"source":{"height":500,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?auto=webp&s=dc05866f0c0b48980e43549ceb0955176f30e3c6","width":500},"variants":{}}]},"pwls":6,"quarantine":false,"removed_by":null,"removed_by_category":null,"retrieved_on":1660025808,"score":16,"secure_media":null,"secure_media_embed":{},"selftext":"We're lucky to have an attorney who is *also* a veteran podcaster kind enough to join here for a live Reddit Talk about recent US Supreme Court decisions, Justices and perhaps some other federal law decisions you may (or may not have) heard about in the news lately.  He may also have some friends and guests join. \n\n\nCharles Star is an attorney in Brooklyn, the host of [Hostile Witness](https:\/\/www.soundcloud.com\/hwpod), a panelist on [ALAB Series](https:\/\/www.alabseries.com\/) and the guy who did most of the talking on the late [Mic Dicta](https:\/\/soundcloud.com\/micdicta). He tweets at [@ugarles](https:\/\/twitter.com\/Ugarles) (and has a reddit account he almost never uses). He writes about legal issues from time to time and has bylines in Vice, Slate and a bunch of places that no longer exist. (He'd probably appear on your podcast.)  \n\n\n**What is Reddit Talk?**\n\nTalk allows us to host interactive sessions with podcasters and users of the community can join in and interact with the speakers in real time from either new.reddit on desktop or though (an official) Reddit app. A Talk can handle up to 30 different speakers and up to 100k listeners in the audience can tune in at the same time.\n\nYou can learn more about Reddit Talk from the user guide [here](https:\/\/reddithelp.com\/hc\/en-us\/articles\/4404349195284-What-is-Reddit-Talk-) .\n\n**Ask Away!**\n\nIf you can't make it live at that time, you may leave any questions you might have  as a comment on this announcement post and we'll try to pick some to read during the Talk. A recording of the Talk should continue to be available here after it concludes.\n\nYou can tap on **Follow** on this post if you'd like to receive a reminder notification when the Talk will begin, if you'd like to come ask a question live.","send_replies":true,"spoiler":false,"stickied":false,"subreddit":"podcasts","subreddit_id":"t5_2qh1d","subreddit_name_prefixed":"r\/podcasts","subreddit_subscribers":2322456,"subreddit_type":"public","suggested_sort":null,"thumbnail":"self","thumbnail_height":null,"thumbnail_width":null,"title":"Have questions about recent Supreme Court decisions? So do we! Join us Sunday @ 8:00 p.m. EST with a lawyer\/podcaster who can help answer how we got here.","top_awarded_type":null,"total_awards_received":0,"treatment_tags":[],"upvote_ratio":0.71,"url":"https:\/\/www.reddit.com\/r\/podcasts\/comments\/wcciq1\/have_questions_about_recent_supreme_court\/","view_count":null,"whitelist_status":"all_ads","wls":6}
chapmanjacobd commented 1 year ago

That makes sense to me. Maybe instead of panic it should just be Error:

Error: typecheck error: invalid type for CSV: List

It works with no problems when specifying all columns except the list and object type columns. 38 million rows in 500 seconds.

unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | octosql "SELECT author_flair_css_class,allow_live_co
mments,archived,author,author_patreon_flair,author_created_utc,author_flair_background_color,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,author_fullnam
e,author_premium,banned_by,can_gild,can_mod_post,category,content_categories,contest_mode,created_utc,discussion_type,distinguished,domain,edited,event_end,event_is_live,event_start,gilded,hid
den,hide_score,is_reddit_media_domain,id,is_created_from_ads_ui,is_crosspostable,is_meta,is_original_content,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_css_clas
s,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media,media_only,name,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinn
ed,post_hint,pwls,quarantine,removed_by,removed_by_category,retrieved_on,score,secure_media,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_name_prefixed,subreddit_subscribers,s
ubreddit_type,suggested_sort,thumbnail_height,thumbnail,thumbnail_width,title,top_awarded_type,total_awards_received,upvote_ratio,url,view_count,whitelist_status,wls FROM stdin.json" -o csv > 
RS_2022-08.csv

Maybe the only thing to fix here is to bubble up the error somehow so that it does not show up as "invalid memory address or nil pointer dereference"

cube2222 commented 1 year ago

Thanks for the writeup!

I'll admit I have no idea why you got the nil dereference instead of the List panic there, but I'll look into it (and probably, as you suggested, turn it into a normal error, to avoid further similar issues).

Cheers!