Mincka / DMArchiver

A tool to archive the direct messages, images and videos from your private conversations on Twitter
GNU General Public License v3.0
222 stars, 25 forks

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) #7

Closed: Mincka closed this issue 7 years ago

Mincka commented 7 years ago

Edited by Mincka on August 10th 2017:

In this case, the error message was caused by invalid JSON data. It appeared to be related to a connection issue, and it could not be reproduced. Other causes are described here: https://stackoverflow.com/a/18460958

Original post: New ticket created from https://github.com/Mincka/DMArchiver/issues/1#issuecomment-259240926

$ /Users/xxx/Downloads/dmarchiver -id "YYY" -di -dg
Enter your username or email: zzz
Enter your password (characters will not be displayed):
Authentication succeedeed.
Conversation ID specified (YYY). Retrieving only one thread.
Starting crawl of 'YYY'
Failed to execute script cmdline
Traceback (most recent call last):
  File "dmarchiver/cmdline.py", line 70, in <module>
  File "dmarchiver/cmdline.py", line 62, in main
  File "dmarchiver/core.py", line 468, in crawl
  File "requests/models.py", line 826, in json
  File "json/__init__.py", line 319, in loads
  File "json/decoder.py", line 339, in decode
  File "json/decoder.py", line 357, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
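The last line of this traceback is what Python's `json` module raises when it is given an empty body or something that is not JSON at all, such as an HTML error page. That fits the connection-issue explanation in the edit note above. The sketch below is a hypothetical helper, not DMArchiver's code: it checks the status code and body before decoding so the failure message says what was actually received. With `requests`, you would pass `response.text` and `response.status_code`.

```python
import json


def parse_json_body(body: str, status_code: int = 200):
    """Decode a response body as JSON, raising a clearer error when it is not JSON.

    Hypothetical helper for illustration; `body` and `status_code` stand in for
    what an HTTP client returns (e.g. requests' response.text / status_code).
    """
    if status_code != 200:
        raise RuntimeError(f"HTTP {status_code}: not attempting to decode body")
    if not body.strip():
        # json.loads("") is what produces "Expecting value: line 1 column 1 (char 0)"
        raise RuntimeError("Empty response body (possible connection or rate-limit issue)")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Include the start of the body, which is often an HTML error page
        snippet = body[:80].replace("\n", " ")
        raise RuntimeError(
            f"Response is not JSON ({exc}); body starts with: {snippet!r}"
        ) from exc


# Usage: valid JSON decodes normally, anything else gets a descriptive error.
print(parse_json_body('{"min_position": "123"}'))
try:
    parse_json_body("<html>Over capacity</html>")
except RuntimeError as err:
    print(err)
```

Checking before decoding does not fix a dropped connection, but it turns a bare "Expecting value" into a message that points at the real cause.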

Mincka commented 7 years ago

Ronnie,

I was not able to reproduce this issue yet. I believed thread "629006352329760768" was the one with 127,555 messages, but it seems it was another thread. Was thread "629006352329760768" completely downloaded the first time, with its 49,899 messages (the count from your previous post here)?

Could you try again with only the "-di" switch? Like this: /Downloads/dmarchiver -id "629006352329760768" -di

sussron commented 7 years ago

The original thread, 629006352329760768, came up with 49,899 messages dating back to May 2016 in the all-threads pull. When I ran it on its own with the command line you sent me this morning, it pulled the 127,555 messages dating back to August 2015, when the conversation started. So it's perfect. And the images and such were downloaded into all the separate folders as you explained. There are two specific conversations that I wanted to back up individually because they were not captured for some reason when I ran the full archive script, so I inserted the conversation ID into the command you gave me. It started compiling messages, got up to about 3000, and then produced the error above. The two are 629006352329760768 and conversation?id=655882638012518400.

Thanks Ronnie

sussron commented 7 years ago

I went into the developer resources, saw two alert icons, and took screenshots of them in case this helps.

sussron commented 7 years ago

Reminder: I'm tech illiterate about this coding, so please excuse my inexperience.

Ronnie

Mincka commented 7 years ago

Ok, so let me sum up to check if I'm correct:
1) First run, no thread specified: the tool stopped after 49,899 messages on thread 629006352329760768, so the thread was incomplete. Thread 655882638012518400 was not downloaded at all.
2) Second run, thread 629006352329760768 specified: the tool retrieved the complete thread with 127,555 messages.
3) Third run, thread 655882638012518400 specified: the error above after around 3000 tweets.

I think I may have an idea why not all the threads were downloaded the first time. I counted the threads in your previous message and found exactly 50 conversations. Currently, to find "all" the conversation IDs, the script loads the conversations available on the first "Messages" page but does not simulate scrolling to load more. I thought all the conversations were listed directly.

My guess is that when you scroll to the bottom of the conversation list, Twitter loads the next 50 conversations. I did not catch this case because I have far fewer than 50 conversations on Twitter! But it's an interesting case, and I'm going to open a new ticket to improve the "all threads" mode, which seems to be, in fact, a "latest 50 conversations" mode. Kudos for finding this bug!
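Loading "the next 50 conversations" on scroll is usually implemented as cursor-based pagination: each response carries a marker for the next page, and the client keeps requesting until no marker comes back. The sketch below assumes that shape; `fetch_page` is a hypothetical stand-in for whatever request Twitter actually makes when the list is scrolled, which this thread had not yet identified.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Assumed page shape: (conversation_ids, cursor_for_next_page); None means last page.
Page = Tuple[List[str], Optional[str]]


def iter_conversation_ids(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every conversation ID by following cursors until the last page.

    `fetch_page` is hypothetical: it stands in for the request made when the
    "Messages" list is scrolled to the bottom.
    """
    cursor: Optional[str] = None
    seen = set()
    while True:
        ids, next_cursor = fetch_page(cursor)
        for conv_id in ids:
            if conv_id not in seen:  # guard against overlap between pages
                seen.add(conv_id)
                yield conv_id
        # Stop on the last page, an empty page, or a cursor that does not advance
        if next_cursor is None or not ids or next_cursor == cursor:
            break
        cursor = next_cursor


def make_fake_fetch(total: int, page_size: int = 50) -> Callable[[Optional[str]], Page]:
    """In-memory fake that serves `total` IDs in pages of `page_size`."""
    all_ids = [str(i) for i in range(total)]

    def fetch_page(cursor: Optional[str]) -> Page:
        start = int(cursor) if cursor else 0
        end = start + page_size
        return all_ids[start:end], (str(end) if end < total else None)

    return fetch_page


# Usage: 170 conversations at 50 per page takes 4 requests instead of 1.
ids = list(iter_conversation_ids(make_fake_fetch(170)))
print(len(ids))
```

Reading only the first page, as the current "all threads" mode does, is equivalent to stopping this loop after one iteration, which would explain seeing exactly 50 conversations.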

As for the error in thread 655882638012518400, it seems to be a parsing problem in the thread itself. Have you already tried running the command again?

Since then, I've added a "debug" mode to find the tweet causing the issue, but it's not in the current version. However, I think it will be the only way to find the "special" tweet that makes the tool fail. I will keep you posted once the updated version is available. It's complicated for me to create Mac versions because I do not own a Mac.
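One common way to find a single "special" item that breaks a batch is to isolate failures per item: wrap each parse in its own try/except, record the failing ID, and keep going. The later log in this thread ("raw HTML will be used for the tweet") suggests a fallback along these lines, but the sketch below is only an illustration; `process_tweets` and `toy_parse` are hypothetical names, not DMArchiver's implementation.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("dm_debug_sketch")


def process_tweets(
    raw_tweets: List[Tuple[str, str]],
    parse: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Parse each (tweet_id, raw_html) pair, isolating failures per tweet.

    Returns (rendered, failed_ids). A tweet whose parse raises is kept as raw
    HTML and its ID is logged, so one bad tweet cannot abort the whole crawl.
    `parse` is a hypothetical parser supplied by the caller.
    """
    rendered: List[str] = []
    failed: List[str] = []
    for tweet_id, raw_html in raw_tweets:
        try:
            rendered.append(parse(raw_html))
        except Exception:
            # exc_info=True keeps the full traceback in debug logs
            logger.debug("Parsing failed for tweet %r", tweet_id, exc_info=True)
            failed.append(tweet_id)
            rendered.append(raw_html)
    return rendered, failed


def toy_parse(html: str) -> str:
    """Toy parser that fails on markup it does not recognise."""
    if "<broken" in html:
        raise ValueError("unexpected markup")
    return html.replace("<p>", "").replace("</p>", "")


# Usage: the second tweet fails, is kept as raw HTML, and its ID is reported.
lines, failed = process_tweets(
    [("1", "<p>hi</p>"), ("2", "<broken>"), ("3", "<p>bye</p>")], toy_parse
)
print(lines, failed)
```

Enabling `logging.basicConfig(level=logging.DEBUG)` before running would print the traceback for each failing tweet ID, which is the information needed to reproduce the parsing problem.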

sussron commented 7 years ago

Yes that is exactly correct. I will see if I can scroll through my DMs and see how many conversation threads I do in fact have. I will also try to run the command again on the one with error.

You are incredible, Julien, thanks so much!!

Ronnie

sussron commented 7 years ago

Going to paste the list as best I can, pulling from my DM list.
Obviously I don't need all these conversations; this is to see whether the number of conversations affects the program. Thanks!!

TeamErin

Zoey the Bulldog + 16

Nasty Women

Lucy @BulldawgLucy

Gus @stephbump

Atlanta Trip

Matilda @BulldogMatilda

Monroe da Handsum @MonroeBulldog

EmmyRose (G💗's sis) @gloveritchey

Ham, Earl of Sammich @HamBulldog

amc9000 @amc5848

Bubbles the Bulldog @ErinVanRyn

sam dellinger @ArtisticBulldog

Sammy @sammythebulldog

Stella Davis @PpawDavis

Princess Meyda @MeydaTheBoston

Allyson Morris @AllysonMorris

100%ChanceOfSexy @MamaTookOne

Lulu & Zeus + 11

London Bulldogs @thebumblers

Rugrat @nottaslimjim

Bruce @danilordbraga

Bunty Watson @thepugsmummy

LuLu & Joey @luvpug25

❤️ RIP Sweet Glover @warner_bear

Margot 🐶💖 @MargotTheBully

SammydaBulldog @SammydaBulldog

Barbara Tiesi @BTiesi

PudgeMan 4 Georgia❤️ @Pudgeman901

Lynyrd @lynyrdsbackyard

realSteveSchindler @ElectroGeezer

Melanie /Ex-#GOP @Lonestarmomcom

Bulldozer Bawss @BulldozerBawss

Something To Chew On @something2chew

Lord Nelson & Jones @LordNelson2007

Ozzy @OzzySharpe

Les @lpmoorerocks

deuce and tooie. @deuceandtooie

deuce @Deuce4Mayor

Mandy @MandyMcWilli

Beckett the Bulldog @LadyBeckett

Liz Bee Gee @LizBeeGee

Mr.Chips @dugan_fahey

Humphrey the Pug 🦃, Momz_and_Mooch

Momz_and_Mooch @adaalling816

Henry Ford @HenryFordBD

Sadie miss Denali @3phibotticelli

Fenway ❤️ Georgia @Lil_Fen

Cleopatra Eng Bully @Catsemail

SIDECAR SAVANNAH® @SavannahSidecar

Dawn Keller @Petunia4Disney

Bogart @LizS76

Zoey the Bulldog @ZoeytheBulldog

Storm #love4georgia, Mandy

RockyTheDogg Mommsie @RockyTheDogg

Whopper'sMom @besamemlr

Sugar @RileyDevilDog

Jiro @Tiny__boy

Jethro The Bulldog @BulldogJethro

Churchill, Otis

Cooper Fartacus @stinkyshmoo

Storm #love4georgia @ebucher1204

Benson'sMomma♡ @itsbensontime

Otis @otisbulldog

MAGOO @MagooCrew

Jodi Pepper @jpep530

Cara Hergenroether @clhergenroether

Gracie Sunshine @GracieSunshine1

Guido RIP Georgia ❤️ @GuidoLock

Lucy Belle @PrincessLucyB

~ LORRAINE ~ @ismaelricardo22

LudKee @lu_dkee

Jane Bloom @oceanjane13

Otis, Cooper Fartacus

Cara Hergenroether, Erin Lundmark

LOUIS.J @LOUISJPUG

Nancy Thornton

Churchill, Gracie Sunshine

Jane Bloom + 13

D @dougbulldog

Jeff Musk @JeffMusk

AUNTBEA53 @AuntBEA53

Toby Bulldog @TobyBully

Melanie /Ex-#GOP + 12

Melanie /Ex-#GOP + 3

Naughty Gossip @NaughtyNiceRob

Melanie /Ex-#GOP, EmmyRose (G💗's sis)

Brett Smiley @brettsmiley

Melanie /Ex-#GOP + 4

Jon McC @fatdaddybulldog

The Mighty Thor @MightyThor617

Sophie @SophieBully

Marley @BeautyBull

Delilah & Zoey @delilahgrace8

Coco + 9

Chaucer @BulldogChaucer

Zeus @zeusGreywind

Miss Dixie Dumplin' @FloridaPhyl

Melanie /Ex-#GOP and You

Melanie /Ex-#GOP and You

Griffen The Bulldog @griff_bulldog

Cooper Fartacus, Gracie Sunshine

Melanie /Ex-#GOP and You

Jane Bloom, Bogart

Gracie Sunshine and You

carrie @trooperMoo

Melanie /Ex-#GOP and You

Lord Roscoe @lord_roscoe

Melanie /Ex-#GOP and You

Melanie /Ex-#GOP, EmmyRose (G💗's sis)

Anna Willett @AnnaWillett

GEBR @gabullierescue

sam dellinger, MAGOO

Valerie Haines @ValerieHaines

Mandy + 4

sam dellinger + 12

Deanna Meadowcroft @DeannaMeady

Gracie Sunshine and You

Gunner @BullDogGunner

Melanie /Ex-#GOP, EmmyRose (G💗's sis)

Churchill @Churchiebulldog

Jenny @jdwalkerohmy

Melanie /Ex-#GOP, EmmyRose (G💗's sis)

Melanie /Ex-#GOP, EmmyRose (G💗's sis)

Boomer Sprinkle @BoomerSprinkle

Birthday Question??

FirstThereWasHarmony @FirstWasHarmony

Lady Maggie @Magsabully

Paloma J Undertooth™ @PalomaTheBoston

Lynsey Robinson @LoveBulldog

Ollie @olliegator57

Turkey Schmerky @casie190

Titus @jspring152

Stormy @EllaBulldogWI

My Cat Gang @archie_the

Rudozem Street Dogs @RSDR

Sway Bears & Jack + 9

Cleopatra Eng Bully, Storm #love4georgia

Dogs of Des Moines + 3

sam dellinger + 4

Scarlett ❤️ Georgia @PeachyKeen57

Pumpkin & Georgia @LadyPumpkinLove

Mia The Frenchie @ThePudgeMM

Tao of Dolly @TaoOfDolly

Pam Lang @cheersandcrowns

Jen @Bogelicious1

Goober @sincitybordeaux

Charlie and Renae @NaeNae_1204

Sara Keller @ILuv2nap

Snorf Industries @snorfindustries

Brave Bulldogs @CueFoils

~LORRAINE~ @serna_lorrie

Oswald'sPack @OswaldsPack

Sir Angus B. @SirPawbulous

Lola @poopachoo

MrBully, Mia & T.J. @BullyMiaTJ

Helen Yates @HelenYatesArt

mcgrubbs @mcgrubbs

🍀Emmett_dog 🍀 @Emmett_dog

Jackson the Bully @shan6872

Marlene The Frenchie @FrenchieMarlene

PEIA~AngelsGang~OTRB @peia_Bosterriah

Bella @BabyB_Bulldog

Lulu & Zeus @pam71579

Izzy & Louise @asamp213

Meesha The Maltese @sweetlilmeesha

Harry's Niece Purdey @PurdeyHollett

Edgar Eraspuss @EdgarEraspuss

Hooch @TicketyBooMedia

No Iowa Puppy Mills @IAFriends

sussron commented 7 years ago

Oops, that's 170 separate conversation threads I have! Yikes!!

Mincka commented 7 years ago

😃 Indeed, that's a lot. You may have noticed that Twitter loads additional conversations when you scroll down the "Messages" list.

What would help me is knowing the request Twitter makes, but I don't want to bother you with that because it is a bit complicated. It would require checking the web requests in the "Network" tab of the "Developer resources" you opened previously and seeing what happens when you scroll in the "Messages" tab (just after clicking the "Messages" button).

sussron commented 7 years ago

I really have nothing to hide. Do you want to just log in as me and scroll and see? I'm totally okay with that. Really. Truly am.

sussron commented 7 years ago

Otherwise I can try what you suggest if you give me some instructions on how to see that info.

Mincka commented 7 years ago

I've sent you a follow request on Twitter, if you follow me back, we will be able to talk in private via Twitter.

Mincka commented 7 years ago

The initial error message json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) could not be reproduced on the same thread. Closing this issue for now.

sussron commented 7 years ago

Hey Julien, not sure how to create a new message on your website, but these are the errors I got when loading the revised program today:

Last login: Thu Feb 16 09:49:10 on ttys000
Ronnies-MacBook-Pro:~ ronniesussman$ cd downloads
Ronnies-MacBook-Pro:downloads ronniesussman$ ./dmarchiver -di
Enter your username or email: beckbulldognj
Enter your password (characters will not be displayed):
Error: Your username or password was invalid.
Exiting.
Failed to execute script cmdline
Ronnies-MacBook-Pro:downloads ronniesussman$ ./dmarchiver -di
Enter your username or email: beckybulldognj
Enter your password (characters will not be displayed):
Authentication succeedeed.
Conversation ID not specified. Retrieving all the threads.
Starting crawl of '629006352329760768'
Unexpected error for tweet '824997990851080200', raw HTML will be used for the tweet.
Unexpected error for tweet '816135843626831875', raw HTML will be used for the tweet.
Unexpected error for tweet '815389148622241795', raw HTML will be used for the tweet.
Unexpected error for tweet '812079166459904003', raw HTML will be used for the tweet.
Unexpected error for tweet '811391729601486851', raw HTML will be used for the tweet.
Unexpected error for tweet '809200106889093123', raw HTML will be used for the tweet.
Begin of thread reached0
Total processed tweets: 166850
Writing conversation to 629006352329760768.txt
Starting crawl of '1214469206-3229265936'
Failed to execute script cmdline
Traceback (most recent call last):
  File "requests/packages/urllib3/connectionpool.py", line 595, in urlopen
  File "requests/packages/urllib3/connectionpool.py", line 393, in _make_request
  File "<string>", line 2, in raise_from
  File "requests/packages/urllib3/connectionpool.py", line 389, in _make_request
  File "http/client.py", line 1197, in getresponse
  File "http/client.py", line 297, in begin
  File "http/client.py", line 258, in _read_status
  File "socket.py", line 575, in readinto
  File "ssl.py", line 929, in recv_into
  File "ssl.py", line 791, in read
  File "ssl.py", line 575, in read
ConnectionResetError: [Errno 54] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "requests/adapters.py", line 423, in send
  File "requests/packages/urllib3/connectionpool.py", line 640, in urlopen
  File "requests/packages/urllib3/util/retry.py", line 261, in increment
  File "requests/packages/urllib3/packages/six.py", line 685, in reraise
  File "requests/packages/urllib3/connectionpool.py", line 595, in urlopen
  File "requests/packages/urllib3/connectionpool.py", line 393, in _make_request
  File "<string>", line 2, in raise_from
  File "requests/packages/urllib3/connectionpool.py", line 389, in _make_request
  File "http/client.py", line 1197, in getresponse
  File "http/client.py", line 297, in begin
  File "http/client.py", line 258, in _read_status
  File "socket.py", line 575, in readinto
  File "ssl.py", line 929, in recv_into
  File "ssl.py", line 791, in read
  File "ssl.py", line 575, in read
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dmarchiver/cmdline.py", line 70, in <module>
  File "dmarchiver/cmdline.py", line 67, in main
  File "dmarchiver/core.py", line 466, in crawl
  File "requests/sessions.py", line 488, in get
  File "requests/sessions.py", line 475, in request
  File "requests/sessions.py", line 596, in send
  File "requests/adapters.py", line 473, in send
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
Ronnies-MacBook-Pro:downloads ronniesussman$
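This second failure is a transient network error rather than a parsing bug: the server reset the connection in the middle of a long crawl, and the uncaught `requests.exceptions.ConnectionError` ended the run. A common mitigation is to retry the request with exponential backoff. The sketch below is a generic stdlib-only wrapper, not DMArchiver's code. Note that Python's built-in `ConnectionResetError` is a subclass of the built-in `ConnectionError`, but `requests.exceptions.ConnectionError` is not, so with `requests` you would pass it explicitly in `retry_on`. `requests` can also retry at the adapter level via `HTTPAdapter(max_retries=...)` with a urllib3 `Retry` object.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on transient errors with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` calls have all failed. Errors
    not listed in `retry_on` propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Hypothetical usage with requests (session and url are placeholders):
# response = with_retries(
#     lambda: session.get(url),
#     retry_on=(requests.exceptions.ConnectionError,),
# )
```

Retrying is reasonable here because the crawl only issues GET requests; injecting `sleep` keeps the wrapper testable without real delays.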
