lonegreyhat commented 6 years ago

I get this every time I try to archive, whether it's just one conversation or all of them. Looked through all the known open/closed issues, didn't see anything like this. dmarchiver Twitter error code reference

Mincka commented 6 years ago

Oh, never saw this one. Interesting.

Can you run dmarchiver with the -r option to get the raw HTML files? Just to be sure there are no other error messages elsewhere.

You can also try the latest pre-release (0.2.2), but I'm not sure the result will be different in your case.

lonegreyhat commented 6 years ago

Ha... but of course. Yesterday I couldn't get it to work with 3 different versions (0.2.2 included) no matter how many times I tried... and today I can't reproduce the error. I'm gonna blame Twitter. I'll close the issue for now but I'll be back if it happens again (with raw files and wireshark pcap). Thanks!

lonegreyhat commented 6 years ago

This script converts the raw output of dmarchiver into fully wrapped HTML for viewing in a browser, including link correction/images/cards. Just drop it in the same folder you have dmarchiver in.
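For readers who only want the gist of the "fully wrapped HTML" and "link correction" steps, here is a minimal sketch, not the script itself. The names `TWITTER_BASE`, `absolutize_links`, and `wrap_fragment` are hypothetical, and it assumes the raw fragment uses root-relative links such as `href="/someuser"`:

```python
import re

# Assumption: root-relative links in the raw fragment resolve against this host.
TWITTER_BASE = "https://twitter.com"


def absolutize_links(fragment: str) -> str:
    """Prefix root-relative href values (href="/user") with TWITTER_BASE.

    Protocol-relative (href="//...") and absolute links are left untouched.
    """
    return re.sub(r'href="/(?!/)', 'href="' + TWITTER_BASE + '/', fragment)


def wrap_fragment(fragment: str, title: str = "DM archive") -> str:
    """Wrap a raw HTML fragment in a minimal standalone HTML document."""
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n<body>\n"
        f"{absolutize_links(fragment)}\n"
        "</body>\n</html>"
    )
```

The real script also pulls in Twitter's stylesheets and rewrites image and card markup; the sketch leaves those out to keep the idea visible.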

It will automatically incrementally store the raw history as well, but since dmarchiver doesn't store raw history, you will have to do a full archive the first time around.
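The incremental part can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the script's exact logic: it assumes each message sits on a single line carrying a `data-message-id="..."` attribute, and that the new capture overlaps the stored history at the last message id found in the new capture. `message_ids` and `merge_history` are hypothetical helper names.

```python
import re

MESSAGE_ID = re.compile(r'data-message-id="(\d+)"')


def message_ids(lines):
    """Return every message id found in the given lines, in order."""
    return [m.group(1) for line in lines for m in MESSAGE_ID.finditer(line)]


def merge_history(new_lines, stored_lines):
    """Extend a new capture with the stored lines that follow the overlap point.

    The overlap point is the last message id seen in new_lines. If that id is
    not present in stored_lines, both lists are simply concatenated (assumption:
    a gap in the history is preferable to dropping the stored part).
    """
    ids = message_ids(new_lines)
    if not ids:
        return list(stored_lines)
    last_id = ids[-1]
    for index, line in enumerate(stored_lines):
        if last_id in message_ids([line]):
            # Skip the overlapping line itself; keep everything after it.
            return list(new_lines) + list(stored_lines[index + 1:])
    return list(new_lines) + list(stored_lines)
```

Real raw files spread a message over many lines, which is why the script below tracks `<li` and `DMConversationEntry` boundaries instead of working line by line.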


#!"c:\program files (x86)\python35-32\python.exe"
# This software is distributed under GPL v3
# Copyright 2017 @lonegreyhat

import argparse
import sys
import queue
import time
import os
from ctypes import *
from colorama import init
from os import walk

currentfile = []
currentfilename = []
STD_OUTPUT_HANDLE = -11
SCR_UPDATE_INTERVAL = 50000
RAW_BKUP_DIR = "rawbkup"

def createFolder(directory):
    try:
        if not os.path.exists(directory):
            os.makedirs(directory)
    except OSError:
        print('Error: Creating directory. ' + directory)

def updatescr(filecount):
    global currentfile
    global currentfilename
    screenposition = 1
    # Each status block occupies four rows; the offsets select the row within the block.
    print("\033[" + str((screenposition - 1) * 4 + 2) + ";2HSource: " + currentfilename[0] + "\033[K")
    print("\033[" + str((screenposition - 1) * 4 + 3) + ";2HOutput: " + currentfilename[1] + "\033[K")
    progress(currentfile[4], currentfile[5], (screenposition - 1) * 4 + 4, "...writing history file")
    progress(currentfile[0], currentfile[1], (screenposition - 1) * 4 + 5, ".......parsing raw file")
    progress(currentfile[2], currentfile[3], (screenposition - 1) * 4 + 6, "......writing HTML file")
    print("\033[" + str((screenposition - 1) * 4 + 7) + ";2HFiles processed: " + str(filecount) + "\033[K")

def main():
    os.system('cls')
    global currentfile
    global currentfilename
    global debug
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", "--night", action="store_true", help="Night/dark color scheme")
    parser.add_argument("-i", "--infile", help="Raw input filename")
    parser.add_argument("-o", "--outfile", help="HTML output filename [default infile + .html]")
    parser.add_argument("-d", "--debug", action="store_true", help="Print debugging info")
    parser.add_argument("-b", "--batch", action="store_true", help="Batch mode (bypass warnings)")
    args = parser.parse_args()
    if args.debug:
        debug = True

    j = 0
    mynumbers = [0, 1, 0, 1, 0, 1]
    createFolder(RAW_BKUP_DIR)
    if args.infile:
        if args.outfile:
            filenames = [args.infile, args.outfile]
        else:
            filenames = [args.infile, args.infile + ".html"]
        if args.debug:
            print(filenames)
        currentfilename = filenames
        parseHTML(filenames, args.night, 0)
        updatescr(1)
    else:
        batch = args.batch
        if not batch:
            response = input("\033[1;2HDo you want to convert all raw files in this directory? [y/n]: ")
            if response != "y":
                return
            else:
                response = input("\033[1;2H\033[41mWARNING: this will overwrite any existing HTML files. Continue? [y/n]:\033[0m ")
                if response != "y":
                    return
                else:
                    batch = True
        if batch:
            # Only the top-level directory is scanned; the loop breaks after the first walk() result.
            for (dirpath, dirnames, filenames) in walk("./"):
                i = 0
                filecount = 0
                if args.debug:
                    print(filenames)
                while i < len(filenames):
                    if filenames[i].find("-raw") != -1:
                        if filenames[i].find(".html") == -1:
                            if args.outfile:
                                tempfilenames = [filenames[i], args.outfile + str(filecount) + ".html"]
                            else:
                                tempfilenames = [filenames[i], filenames[i] + ".html"]
                            currentfilename = tempfilenames
                            parseHTML(tempfilenames, args.night, filecount)
                            filecount += 1
                            updatescr(filecount)
                    i += 1
                break
    return

def parseHTML(filenames, night, filecount):
    infile = filenames[0]
    outfile = filenames[1]
    global currentfile
    imgs = []
    files = []
    imgDir = []
    myData = []
    myTempString = []
    myTempString2 = []
    myStack = []
    myTempStack = []
    myString = []
    nomatch = True
    myString = """<!DOCTYPE html>
<html>
<head>"""
    if night:
        myString += """<link rel="stylesheet" href="https://abs.twimg.com/a/1513386918/css/t1/nightmode_twitter_core.bundle.css" class="coreCSSBundles">
<link rel="stylesheet" class="moreCSSBundles" href="https://abs.twimg.com/a/1513386918/css/t1/nightmode_twitter_more_1.bundle.css">
<link rel="stylesheet" class="moreCSSBundles" href="https://abs.twimg.com/a/1513386918/css/t1/nightmode_twitter_more_2.bundle.css">\n</head><body><div style="width:800px; margin:auto; padding:20px; background-color:#1B2836;">"""
    else:
        myString += """<link rel="stylesheet" href="https://abs.twimg.com/a/1513386918/css/t1/twitter_core.bundle.css" class="coreCSSBundles">
<link rel="stylesheet" href="https://abs.twimg.com/a/1513386918/css/t1/twitter_more_1.bundle.css" class="moreCSSBundles">
<link rel="stylesheet" href="https://abs.twimg.com/a/1513386918/css/t1/twitter_more_2.bundle.css" class="moreCSSBundles">\n</head><body><div style="width:800px; margin:auto; padding:20px; background-color:#FFFFFF;">"""

    f = open(infile, "r", encoding="utf-8")
    f2 = open(outfile, "w", encoding="utf-8")
    currentfile = [0, 1, 0, 1, 0, 1]
    j = infile.find("-raw")
    imgDir = infile[0:j]
    imgDir = "./" + imgDir + "/images/"
    lastmsgID = []
    for (dirpath, dirnames, filenames) in walk(imgDir):
        imgs = filenames
    for line in f:
        myData.append(line)
        j = line.find("data-message-id=")
        if j != -1:
            k = line.find("\"", j+len("data-message-id=")+1)
            lastmsgID = line[j+len("data-message-id=")+1:k]
    #print("Oldest message recaptured: " + lastmsgID)
    bkupinfile = "./" + RAW_BKUP_DIR + "/" + infile
    if os.path.isfile(bkupinfile):
        f3 = open(bkupinfile, "r", encoding="utf-8")
        lastmsgfound = False
        newblock = False
        i = 1
        for line in f3:
            if not lastmsgfound:
                j = line.find("data-message-id=")
            if j != -1 and not lastmsgfound:
                k = line.find("\"", j+len("data-message-id=")+1)
                if lastmsgID == line[j+len("data-message-id=")+1:k]:
                    lastmsgfound = True
            if lastmsgfound and not newblock:
                j = line.find("<li")
                if j != -1:
                    newblock = True
                if not newblock:
                    j = line.find("<div class=\"DMConversationEntry\"")
                    if j != -1:
                        j = line.find("data-message-id=")
                        if j != -1:
                            k = line.find("\"", j+len("data-message-id=")+1)
                            if lastmsgID != line[j+len("data-message-id=")+1:k]:
                                newblock = True
            if lastmsgfound and newblock:
                if j > 0:
                    myData.append(line[j:])
                else:
                    myData.append(line)
            i += 1
            j = 0
        f3.close()
        os.replace(bkupinfile, bkupinfile + ".bkup")
    f3 = open(bkupinfile, "w", encoding="utf-8")
    currentfile[5] = len(myData)
    for line in myData:
        f3.write(line)
        currentfile[4] += 1
        if currentfile[4] % (SCR_UPDATE_INTERVAL) == 0:
            updatescr(filecount)
    f3.close()
    f.close()
    i = 0
    msgrec = False
    rapidfire = False
    while i < len(myData):
        currentfile[0] = i
        currentfile[1] = len(myData)
        nomatch = True
        j = myData[i].find("<div class=\"DMConversationEntry\"")
        if j != -1:
            myTempStack.append("div")
            myTempString2 = myData[i][j:]
            i += 1
            while len(myTempStack):
                mynomatch = True
                j = myData[i].find("</div>")
                if j != -1:
                    myTempStack.pop()
                    myTempString2 += myData[i][0:j+6]
                    mynomatch = False
                    if j == len(myData[i]) - 7:
                        i += 1
                j = myData[i].find("<div")
                if j != -1:
                    myTempString2 += myData[i][j:]
                    myTempStack.append("div")
                    mynomatch = False
                    i += 1
                if myData[i].find("js-action-profile js-user-profile-link") != -1:
                    j = myData[i].find("href=")
                    myTempString2 += myData[i][0:j+6] + "https://twitter.com" + myData[i][j+6:]
                    mynomatch = False
                    i += 1
                if mynomatch:
                    myTempString2 += myData[i]
                    i += 1
            myStack.append(myTempString2)
            myTempString = ""
            nomatch = False
        j = myData[i].find("</li>")
        if j != -1:
            myTempString2 += myData[i][0:j+5]
            myStack.append(myTempString2)
            myTempString = ""
            nomatch = False
            if j == len(myData[i]) - 6:
                i += 1
        if i == len(myData):
            break

        j = myData[i].find("<li")
        if j != -1:
            k = -1
            l = i
            m = -1
            n = -1
            while k == -1:
                k = myData[l].find("</li>")
                if n == -1:
                    n = myData[l].find("is-rapidFire")
                    if n != -1:
                        rapidfire = True
                if m == -1:
                    m = myData[l].find("DirectMessage--received")
                    if m != -1:
                        msgrec = True
                l += 1
            if m == -1:
                msgrec = False
            if n == -1:
                rapidfire = False
            myTempString2 = myData[i][j:]
            nomatch = False
            i += 1
        if myData[i].find("Media-photo u-chromeOverflowFix") != -1:
            myTempString = myData[i+1].split("/")
            myTempString2 += myData[i]
            for y in myTempString:
                j = y.find("jpg")
                if j == -1:
                    j = y.find("png")
                if j != -1:
                    imgFound = False
                    for z in imgs:
                        if z.find(y[0:j+3]) != -1:
                            j = myData[i+1].find(" class=\"dm-attached-media\"")
                            myTempString2 += "<a href=\""
                            myTempString2 += imgDir
                            myTempString2 += z
                            myTempString2 += "\""
                            myTempString2 += myData[i+1][j:]
                            imgFound = True
                    if not imgFound:
                        myTempString2 += myData[i+1]

            i += 2
            nomatch = False
        elif myData[i].find("_timestamp") != -1:
            while myData[i].find("</span>") == -1:
                j = myData[i].find("data-time=")
                if j != -1:
                    absTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(int(myData[i][j+len("data-time=")+1:j+len("data-time=")+11])))
                j = myData[i].find(">")
                if j != -1:
                    myTempString2 += myData[i][0:j+1] + "\n" + absTime + "\n"
                j = myData[i].find("</span>")
                if j != -1:
                    myTempString2 += myData[i][j:]
                i += 1

        elif myData[i].find("FlexEmbed-item u-borderRadiusInherit") != -1:
            myTempString = myData[i+1].split("/")
            myTempString2 += myData[i]
            for y in myTempString:
                j = y.find("jpg")
                if j == -1:
                    j = y.find("png")
                if j != -1:
                    imgFound = False
                    for z in imgs:
                        if z.find(y[0:j+3]) != -1:
                            myTempString2 += "<img src=\""
                            myTempString2 += imgDir
                            myTempString2 += z
                            myTempString2 += """" alt="" data-full-img=\""""
                            myTempString2 += imgDir
                            myTempString2 += z
                            myTempString2 += "\">"
                            imgFound = True
                    if not imgFound:
                        myTempString2 += myData[i+1]
                    break
            i += 2
            nomatch = False
        elif myData[i].find("QuoteTweet-container") != -1:
            myTempString2 += myData[i]
            j = myData[i+1].find("href=")
            myTempString2 += myData[i+1][0:j+6] + "https://twitter.com" + myData[i+1][j+6:]
            i += 1
            nomatch = False
        elif myData[i].find("QuoteTweet-innerContainer") != -1:
            myTempString2 += myData[i]
            myTempString2 += myData[i+1]
            myTempString2 += myData[i+2]
            myTempString2 += myData[i+3]
            myTempString2 += myData[i+4]
            j = myData[i+5].find("href=")
            myTempString2 += myData[i+5][0:j+6] + "https://twitter.com" + myData[i+5][j+6:]
            i += 5
            nomatch = False
        elif myData[i].find("js-macaw-cards-iframe-container") != -1:
            j = myData[i].find("initial-card-height")
            if j != -1:
                myData[i] = myData[i][0:j] + myData[i][j+len("initial-card-height")+1:]
            while myData[i].find(">") == -1:
                j = myData[i].find("data-src")
                if j != -1:
                    myTempString2 += myData[i][0:j+10] + "http://twitter.com" + myData[i][j+10:]
                    myTempString = "http://twitter.com" + myData[i][j+10:-1]
                else:
                    myTempString2 += myData[i]
                i += 1
            myTempString2 += myData[i]

            if myTempString.find("cardname=player") != -1:
                myTempString2 += """<iframe id="xdm_default6210_provider" style="display: block; margin: 0px; padding: 0px; border: 0px none; width:100%; height: 126px;" scrolling="no" src=\""""
                myTempString2 += myTempString[:-1]
                if msgrec:
                    #print("msg received")
                    if rapidfire:
                        #print("rapidfire")
                        myTempString2 += """&amp;client=dm&amp;border_radius=false%2Ctrue%2Ctrue%2Cfalse&amp;edge=true&amp;card_height=125&amp;night_mode="""
                    else:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Ctrue%2Ctrue%2Cfalse&amp;edge=true&amp;card_height=125&amp;night_mode="""
                else:
                    if rapidfire:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Cfalse%2Cfalse%2Ctrue&amp;edge=true&amp;card_height=125&amp;night_mode="""
                    else:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Ctrue%2Cfalse%2Ctrue&amp;edge=true&amp;card_height=125&amp;night_mode="""
            elif myTempString.find("cardname=summary_large_image") != -1:

                myTempString2 += """<iframe id="xdm_default6210_provider" style="display: block; margin: 0px; padding: 0px; border: 0px none; width:100%; height: 327px;" scrolling="no" src=\""""
                myTempString2 += myTempString[:-1]
                if msgrec:
                    #print("msg received")
                    if rapidfire:
                        #print("rapidfire")
                        myTempString2 += """&amp;client=dm&amp;border_radius=false%2Ctrue%2Ctrue%2Cfalse&amp;edge=true&amp;card_height=327&amp;night_mode="""
                    else:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Ctrue%2Ctrue%2Cfalse&amp;edge=true&amp;card_height=327&amp;night_mode="""
                else:
                    if rapidfire:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Cfalse%2Cfalse%2Ctrue&amp;edge=true&amp;card_height=327&amp;night_mode="""
                    else:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Ctrue%2Cfalse%2Ctrue&amp;edge=true&amp;card_height=327&amp;night_mode="""
            else:
                myTempString2 += """<iframe id="xdm_default6210_provider" style="display: block; margin: 0px; padding: 0px; border: 0px none; width:100%; height:90px;" scrolling="no" src=\""""
                myTempString2 += myTempString[:-1]
                if msgrec:
                    if rapidfire:
                        myTempString2 += """&amp;client=dm&amp;border_radius=false%2Ctrue%2Ctrue%2Cfalse&amp;edge=true&amp;card_height=89&amp;night_mode="""
                    else:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Ctrue%2Ctrue%2Cfalse&amp;edge=true&amp;card_height=89&amp;night_mode="""
                else:
                    if rapidfire:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Cfalse%2Cfalse%2Ctrue&amp;edge=true&amp;card_height=89&amp;night_mode="""
                    else:
                        myTempString2 += """&amp;client=dm&amp;border_radius=true%2Ctrue%2Cfalse%2Ctrue&amp;edge=true&amp;card_height=89&amp;night_mode="""
            if night:
                myTempString2 += "true"
            else:
                myTempString2 += "false"
            myTempString2 += """&amp;scribe_context=%7B%22client%22%3A%22web%22%2C%22page%22%3A%22me%22%2C%22section%22%3A%22profile%22%2C%22component%22%3A%22dm_existing_conversation_dialog%22%7D&amp;bearer_token=AAAAAAAAAAAAAAAAAAAAAPYXBAAAAAAACLXUNDekMxqa8h%252F40K4moUkGsoc%253DTYfbDKbT3jJPCEVnMYqilB28NHfOPqkca3qaAxGfsyKCs0wRbw#xdm_e=https%3A%2F%2Ftwitter.com&amp;xdm_c=default6210&amp;xdm_p=1" allowfullscreen="" width="100%" height ="100%" frameborder="0"></iframe>"""
            nomatch = False
            i += 1
        elif myData[i].find("js-action-profile js-user-profile-link") != -1:
            j = myData[i].find("href=")
            myTempString2 += myData[i][0:j+6] + "https://twitter.com" + myData[i][j+6:]
            nomatch = False
            i += 1
        elif nomatch:
            myTempString2 += myData[i]
            i += 1
        if i % (SCR_UPDATE_INTERVAL) == 0:
            updatescr(filecount)

    updatescr(filecount)
    i = 0
    f2.write(myString)
    currentfile[3] = len(myStack)
    while myStack:
        f2.write(myStack.pop())
        if currentfile[2] % (SCR_UPDATE_INTERVAL) == 0:
            updatescr(filecount)
        currentfile[2] += 1
    updatescr(filecount)
    myString = "</div></body>\n</html>"
    f2.write(myString)
    f2.close()
    return

def progress(count, total, ypos, status=''):
    bar_len = 50
    filled_len = int(round(bar_len * count / float(total)))

    percents = int(round(100.0 * count / float(total), 0))
    percents = str(percents) + '%'

    # Center the percentage label inside the bar.
    percentspos = int(round((bar_len - len(percents))/2, 0))
    bar = str(' ' * percentspos) + percents + str(' ' * (bar_len - len(percents) - percentspos))
    if filled_len > 0:
        bar = "\033[37;44;1m" + bar[0:filled_len] + "\033[30;41m" + bar[filled_len:]
    else:
        bar = "\033[30;41m" + bar[0:filled_len] + "\033[30;41m" + bar[filled_len:]
    print('\033[' + str(ypos) + ';1H', '%s\033[0m %s\033[K' % (bar, status))

init()
main()

Mincka commented 6 years ago

Very nice, works great. 👍

Don't you want to post it on your GitHub account? I could reference it from this repo.

lonegreyhat commented 6 years ago

Glad that works for you. Github... lol... I'll see what I can do. I'm sorry if I'm telling you what you already know, but dmarchiver doesn't actually get gifs and mp4s. I have some thoughts on this if you ever have a moment to chat...


Mincka / DMArchiver

Internal Twitter error 131 #44
