egbertbouman / youtube-comment-downloader

Simple script for downloading Youtube comments without using the Youtube API
MIT License
884 stars 223 forks

Recommended program for viewing & editing comments #63

Closed terrypa closed 2 years ago

terrypa commented 3 years ago

Hi, newbie question: can you recommend a text editor (Windows) for viewing this JSON file? In Notepad++ it looks like a mess.

By the way, thank you for the script, it's very useful.

sinepgnol commented 3 years ago

I join the request. Something to view the comments in a well-arranged way would come in handy.

sinepgnol commented 3 years ago

I wrote a small script for this: https://github.com/sinepgnol/YTB-comments-from-JSON-to-a-well-arranged-way

minamotorin commented 3 years ago

I wrote a similar script with jq for the CLI (WTFPL):

cat "comment.json"                                      |
sed '$!s/$/,/; 1s/^/[/; $s/$/]/'                        |
jq -r '
  map( .cid |= sub("\\..*";""))                         |
  group_by(.cid)                                        |
  map(
    [
      (.[0:1]                                           |
        map(
          .author |= sub("^"; "---\n")                  |
          .text   |= gsub("\n"; "\n  ")                 |
          .text   |= sub("^"; "\n  ")
        )
      )[]
    ,
      (.[1:]                                            |
        map(
          .author |= sub("^"; "    ")                   |
          .text   |= gsub("\n"; "\n      ")             |
          .text   |= sub("^"; "\n      ")
        )
      )[]
    ]
  )
  [][]                                                  |
  .author + " (" + .votes + " : " + .time + "):" + .text
'                                                       |
sed '$G; $s/$/---/'

I overlooked that you are on Windows: cat, sed, and jq are not installed on Windows by default.
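Since cat, sed, and jq are not available on Windows by default, the same idea can be sketched in Python, which is already needed to run the downloader. This is only a rough equivalent of the jq script above, not the author's code. It assumes the file holds one JSON object per line with the fields `cid`, `author`, `text`, `votes`, and `time` (the fields the jq script reads), and that a reply's `cid` is the parent's `cid` followed by a dot and a suffix. The function names are made up for illustration. Unlike `group_by`, it keeps threads in file order rather than sorting them by `cid`.

```python
import json


def load_comments(lines):
    """Parse line-delimited JSON: one comment object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def format_threads(comments):
    """Group replies under their top-level comment and return printable text."""
    threads = {}  # root cid -> comments of that thread, in file order
    for comment in comments:
        # Assumption: a reply cid looks like "<parent cid>.<suffix>".
        root = comment["cid"].split(".", 1)[0]
        threads.setdefault(root, []).append(comment)

    out = []
    for group in threads.values():
        for position, c in enumerate(group):
            is_top_level = position == 0
            prefix = "---\n" if is_top_level else "    "
            indent = "  " if is_top_level else "      "
            body = "\n".join(indent + ln for ln in c["text"].split("\n"))
            out.append(f'{prefix}{c["author"]} ({c["votes"]} : {c["time"]}):\n{body}')
    out.append("---")
    return "\n".join(out)


# Hypothetical usage (file name taken from the thread):
# with open("comment.json", encoding="utf-8") as f:
#     print(format_threads(load_comments(f)))
```

Like the jq version, this treats the first comment in each group as the top-level comment, so it relies on a comment appearing in the file before its replies.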

Cyrix126 commented 3 years ago

For some reason your script printed each comment 15 times. Adding | sed 's/^ *//g' | awk '{$1=$1}1' | awk ' !x[$0]++' | sed 's/\(.*):\)/\n\1/g' solved the issue.

minamotorin commented 3 years ago

Thank you for replying! I can't reproduce the problem in my environment (jq-1.6, BSD sed and BSD cat). I suspect your original JSON file contains the same comments repeated 15 times. To check, run grep "some cid" /path/to/comment.json. If that is the case, adding sort -u at the start of my script will solve the issue.

I'm interested in your additional commands. Could you tell me how | sed 's/^ *//g' | awk '{$1=$1}1' | awk ' !x[$0]++' | sed 's/\(.*):\)/\n\1/g' works?

Cyrix126 commented 3 years ago

You are right, the original comment.json contained the same comments multiple times. I wonder why.

The first sed deletes the spaces at the start of each line, because identical comments had different numbers of leading spaces; after that, awk can print each duplicated line only once. The last sed adds a newline before each line containing the author's name, to separate comments. If I used only sort -u at the end, it would mess everything up, because every line would be sorted and the author's name would no longer be next to the right comment. I wasn't aware that the problem originated in the JSON file, so adding sort -u at the beginning is better, although it could be an issue because the comments would no longer be sorted by time.
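The awk idiom `!x[$0]++` prints a line only the first time it is seen, so order is preserved. A minimal Python sketch of the same idea, applied to the raw JSON lines rather than the formatted output, could look like the following. It assumes one JSON object per line and that `cid` uniquely identifies a comment; both are assumptions, and `dedupe_comments` is a hypothetical helper name.

```python
import json


def dedupe_comments(lines):
    """Keep the first line seen for each cid, preserving the original order."""
    seen = set()
    unique = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cid = json.loads(line)["cid"]  # assumption: cid identifies a comment
        if cid not in seen:
            seen.add(cid)
            unique.append(line)
    return unique
```

Because nothing is sorted, comments stay in the order the downloader wrote them, which avoids the ordering concern raised about sort -u.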

minamotorin commented 3 years ago

I learned some awk techniques. Thanks!

That was a bug, but it has been fixed. See #68.

The spaces at the start of the line distinguish comment types: normal comments are indented by 2 spaces and their replies by 6 spaces. This script does not sort by time because that would break the relation between normal comments and replies.

But you are right, sort -u will break the relation. Using sed -n '1h; 1!G; /^\(.*\)\n\1/q; P' instead of sort -u solves this issue.

minamotorin commented 3 years ago

On reflection, I think sort -u does not break the relation, because of how cids are structured, but using sed -n '1h; 1!G; /^\(.*\)\n\1/q; P' is still a good idea because sort -u is slow. Or perhaps your JSON file does not contain any replies because of the bug.
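As far as I can tell, the sed one-liner remembers the first line, prints lines as they come, and quits as soon as a later line matches the first one again. That only removes duplicates when the file is the same block of comments repeated in a loop. A rough Python sketch of that behaviour follows; the function name is invented. It compares lines for exact equality, which is a simplification, since the sed pattern as written has no end anchor and would also stop on a line that is merely a prefix of the first line.

```python
def take_until_first_repeats(lines):
    """Return the leading lines, stopping before the first line reappears."""
    result = []
    first = None
    for index, line in enumerate(lines):
        if index == 0:
            first = line  # like `1h`: remember the first line
        elif line == first:
            break  # like `q`: the loop has started over, stop here
        result.append(line)  # like `P`: emit the current line
    return result
```

This stops reading early, which is why it can be faster than sort -u, but it would not catch duplicates scattered irregularly through the file.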

sinepgnol commented 3 years ago

Wow, don’t tell me! TextEdit also works.

On 1 Mar 2021, at 19:07, archmord notifications@github.com wrote:

You can view the JSON file in the Firefox browser.

elibroftw commented 3 years ago

@terrypa Use VSCode. It can also auto format json files.

raelb commented 2 years ago

Perhaps the author could add a command line option to pretty-print the output, e.g. --pretty 1

egbertbouman commented 2 years ago

The newest version has a --pretty option, which will change the output format to indented JSON.
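For files that were already downloaded without --pretty, something similar can be done after the fact with Python's standard json module. The sketch below assumes the older output is one JSON object per line, as the sed step earlier in this thread implies. It does not claim to reproduce the exact layout that --pretty writes; `to_indented_json` and the indent width of 4 are arbitrary choices for illustration.

```python
import json


def to_indented_json(lines, indent=4):
    """Turn line-delimited JSON into one indented JSON array (as a string)."""
    comments = [json.loads(line) for line in lines if line.strip()]
    # ensure_ascii=False keeps non-ASCII comment text readable
    return json.dumps(comments, indent=indent, ensure_ascii=False)


# Hypothetical usage (file names are placeholders):
# with open("comment.json", encoding="utf-8") as src:
#     pretty = to_indented_json(src)
# with open("comment_pretty.json", "w", encoding="utf-8") as dst:
#     dst.write(pretty)
```

The result is a single valid JSON document, so editors and browsers that understand JSON can fold and navigate it.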