TumblThreeApp / TumblThree

A Tumblr and Twitter Blog Backup Application
https://TumblThreeApp.github.io
MIT License

Tumblr Changed Privacy Consent Agreement to New (External) Service -- Breaks Login, Tumblr, SVC, Tumblr Search and Likes Downloading. #81

Closed johanneszab closed 3 years ago

johanneszab commented 4 years ago

Tumblr changed the GDPR privacy consent provider last week and now seems to use an external service for this. This results in new queries and cookies. As a result, for me, as an EU resident, the TumblThree Tumblr search and Tumblr liked/by crawlers are non-functional. Can someone from the US (or outside the EU) test if they still work for them?

For example, can you download anything from these? https://www.tumblr.com/liked/by/wallpaperfx https://www.tumblr.com/search/cars

ErikBrown2 commented 4 years ago

Hi,

I live in Asia so I tried the two links. The liked by link results in TumblThree asking me to log-on to Tumblr.com. So I entered my username and password as usual in the settings screen. Then under where I had entered my username and password in the settings screen, the following text and a textbox appeared: TFA detected, enter auth code

It is not clear where I should get this auth code from. I have checked my mobile phone to see if I received something by SMS but this was not the case. I also did not receive anything by email.

The search link did not result in the log-in request message. The entry in the queue became blue for about four seconds and then disappeared from the queue. But the resulting folder is empty (no files downloaded) and also the blog entry for it in the left side blog list does not show any figure for 'downloaded files' nor for 'number of downloads'.

ErikBrown2 commented 4 years ago

I noticed your comment about trying to obtain another IP address from the ISP to solve the log-on issue. But this does not solve it for me. Also after rebooting my router and getting another IP address, TumblThree is still asking me for the auth code.

I can log-in to Tumblr.com on my webbrowser. Also after doing this, still TumblThree is asking for the auth code.

johanneszab commented 4 years ago

That's the same as what I experience. Thanks for testing it outside of the EU! Then it's not just a "new thing" that EU people have to agree to.

Pretty sure however, that the Tumblr search- and Tumblr likes-downloading (still) worked 10 days ago when I fixed both searches in #75.

ErikBrown2 commented 4 years ago

About this new auth code, where should I get that from? Or does that also need a fix in the software?

ErikBrown2 commented 4 years ago

About this new auth code, I just noticed that I used, without realizing it, two different setups of the TumblThree application files on my laptop. To test the privacy issue I had unzipped the installation files into a certain folder and ran the application from there. Trying to log in with that set of files caused the auth code request. Later I realized that I had previously installed the TumblThree application in another folder, and that copy was already logged on from before when I checked it just now.

So can it be that the auth request is only a result of having two sets of application files on one PC ?

wacher74 commented 4 years ago

I have only one instance (I never had another one), but the TFA problem exists here, too.

johanneszab commented 4 years ago

The TFA question pops up now because if you perform a direct login in a fresh browser session, there will be a privacy consent agreement popup directly after the login, and TumblThree thinks Tumblr is asking for a TFA code. The authentication process itself, however, does not seem to have changed. Usually TumblThree performed the privacy consent agreement at startup in the background. This is what is broken now.

I've only very briefly looked into this, and I'm not sure when/if I'll have time to fix it, but if someone wants to look into this:

The first GET goes to https://www.tumblr.com/privacy/consent/begin. If you click the agree button, another GET to https://www.tumblr.com/privacy/consent/complete is performed with a shitload long query:

{
    "GET": {
        "scheme": "https",
        "host": "www.tumblr.com",
        "filename": "/privacy/consent/complete",
        "query": {
            "{\"cmpId\":10,\"cmpVersion\":11,\"gdprApplies\":true,\"tcfPolicyVersion\":2,\"eventStatus\":\"useractioncomplete\",\"cmpStatus\":\"loaded\",\"tcString\":\"CO4tdkLO4tdkLAKALAENA0CsAP_AAH_AACiQGYtd_X9fb2vj-_5999t0eY1f9_63v-wzjgeNs-8NyZ_X_L4Xr2MyvB34pq4KmR4Eu3LBAQVlHGHcTQmQwIkVqTLsak2Mq7NKJ7JEilMbM2dYGG1Pn8XTuZCY70_sf__z_3-_-___67YGXkEmGpfAQJCWMBJNmlUKIEIVxIVAOACihGFo0sNCRwU7K4CPUACABAYgIwIgQYgoxZBAAAAAElEQAkAwIBEARAIAAQArQEIACJAEFgBIGAQACoGhYARRBKBIQZHBUcogQFSLRQTzRgAA.cAAAAAAAAAAA\",\"isServiceSpecific\":true,\"useNonStandardStacks\":false,\"purposeOneTreatment\":false,\"publisherCC\":\"US\",\"purpose\":{\"consents\":\"1111111111\",\"legitimateInterests\":\"0111111111\"},\"vendor\":{\"consents\":\"1101011101111111010111111101011111011011110110101111100011111110111111111001111101111101111101101101110100011110011000110101011111111101111111111010110111101111111110110000110011100011100000011110001101101100111110111100001101110010011001111111010111111111001011111000010111101011110110001100110010101111000001110111111000101001101010111000001010100110010001111000000100101110110111001011000001000000010000010101100101000111000110000111011100010011010000100110010000110000001000100100010101101010010011001011101100011010100100110110001100101010111011001101001010001001111011001001000100100010100101001100011011001100110110011101011000000110000110110101001111100111111100010111010011101110011001000010011000111011110100111111101100011111111111111111110011111111110111111110111111111110111111111111111111111010111011011\",\"legitimateInterests},\"specialFeatureOptins\":\"11\",\"publisher\":{\"consents\":\"1\",\"legitimateInterests\":\"\",\"customPurpose\":{\"consents\":\"\",\"legitimateInterests\":\"\"},\"restrictions\":{}},\"x-tumblr-nonIabVendorConsents\":{\"1\":true,\"2\":false,\"3\":false,\"4\":false,\"5\":true}}": ""
        },
        "remote": {
            "Address": "152.199.21.147:443"
        }
    }
}

This tcString seems to change. In the response of the first call there is some js included which might generate this string (see the POST to api/v2/privacy/consent and the GET builder including consent/complete?' + encodeURIComponent( JSON.stringify( inAppTCData ))):

function saveConsent() {
  __tcfapi('getInAppTCData', 2, ( inAppTCData, success ) => {
    if ( ! success ) {
      console.error( inAppTCData );
      return;
    }

    __tcfapi('getNonIABVendorConsents', 2, function(consent, success) { 
      if ( success 
          && consent.gdprApplies
          && consent.nonIabVendorConsents ) {
          inAppTCData['x-tumblr-nonIabVendorConsents'] = consent.nonIabVendorConsents;
      }

      log('saving consent');
      var save = submitToApi ? 
        fetch('/api/v2/privacy/consent', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-CSRF': 'Si8dbo9UfHWj.1598383626'
          },
          body: JSON.stringify({
            "tcfv2_consent": JSON.stringify(inAppTCData)
          })
        })
        : Promise.resolve({ ok: true });

      save.then( function redir( response ) {
        if ( ! response.ok ) {
          console.error(response);
          throw response;
        }
        log('consent saved', response);
        if (shouldRedirectOnConsent) {
          if (redirectOnConsentUrl) { 
            log('redirecting to', redirectOnConsentUrl);
            window.location = redirectOnConsentUrl;
          } else {
            var nativeUrl = 'https://www.tumblr.com/privacy/consent/complete?' + encodeURIComponent( JSON.stringify( inAppTCData ));
            log('redirecting to', nativeUrl );
            window.location = nativeUrl;
          }
        } else {
          log('not redirecting');
        }
      });
[..]
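
For anyone who wants to inspect a captured request: judging from the nativeUrl line in the snippet above, the long query seems to be nothing more than the TCData object passed through JSON.stringify and encodeURIComponent. A minimal sketch of that round trip follows. The function names are made up for illustration, and the sample object is a shortened placeholder, not a real consent string.

```javascript
// Sketch only: mirrors the nativeUrl construction in the page script above.
// Function names are hypothetical; the sample data is a shortened placeholder.
const CONSENT_COMPLETE = 'https://www.tumblr.com/privacy/consent/complete';

// Build the URL the same way the page script does.
function buildConsentCompleteUrl(inAppTCData) {
  return CONSENT_COMPLETE + '?' + encodeURIComponent(JSON.stringify(inAppTCData));
}

// Reverse direction: turn a captured URL back into a readable object.
function parseConsentCompleteUrl(url) {
  const query = url.slice(url.indexOf('?') + 1);
  return JSON.parse(decodeURIComponent(query));
}

const sample = { cmpId: 10, gdprApplies: true, tcString: 'PLACEHOLDER' };
const url = buildConsentCompleteUrl(sample);
console.log(url);
console.log(parseConsentCompleteUrl(url));
```

This only covers the URL encoding. Generating a valid tcString is a separate problem that this sketch does not attempt.
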
JayBee1954 commented 4 years ago

I have been a user of Tumblthree since its inception, having used Tumbltwo previously. But it is now completely unusable, as I cannot log on. It wants a TFA authorisation code. I have never set up TFA, and so there is no code that I can enter!! I have tried all ways of getting around this problem, and also tried using different browsers as my default, but it still does not work. Even changing the user agent settings in Tumblthree does not make any difference. Pity, as it has been a brilliant program up until now.

ErikBrown2 commented 4 years ago

Did you use your webbrowser to log on to Tumblr? I used Chrome to do that after I got the TFA request in TumblThree and when I restarted TumblThree, it logged on without a problem. Maybe I was lucky but maybe this is the way around the issue.

wacher74 commented 4 years ago

> Did you use your webbrowser to log on to Tumblr? I used Chrome to do that after I got the TFA request in TumblThree and when I restarted TumblThree, it logged on without a problem. Maybe I was lucky but maybe this is the way around the issue.

I tried it a few minutes ago, it doesn't help (Chrome, new Edge). Chrome: new login. Edge: logout+login. I had to accept the blabla, and the site is working. After this I started TT, but it still detects TFA.

bloke90 commented 4 years ago

same for me, most blogs can't be crawled, error 1 or error 2 or no error just "evaluating" then nothing happens....?

I'm using latest 11.8 - any chance for a bug fix?

PHiZiK commented 4 years ago

Just when I found out about this amazing tool... it became unusable. Today I downloaded the software and tried to authenticate, but I always get the TFA message. Hopefully it's a solvable problem. Thank you developer for your hard work.

jasonhouston commented 4 years ago

> Did you use your webbrowser to log on to Tumblr? I used Chrome to do that after I got the TFA request in TumblThree and when I restarted TumblThree, it logged on without a problem. Maybe I was lucky but maybe this is the way around the issue.

> I tried it a few minutes ago, it doesn't help (Chrome, new Edge). Chrome: new login. Edge: logout+login. I had to accept the blabla, and the site is working. After this I started TT, but it still detects TFA.

This is exactly my issue as well.

bloke90 commented 4 years ago

update: works again with another internet connection: mobile isn't the best idea but downloads do start without Error 1 or 2

Edit: not for the hidden blogs!!!

bloke90 commented 4 years ago

edit new: once again the earlier working blogs jump out with api-settings error exceeded number of connections etc.?

bloke90 commented 4 years ago

looks like a daily maximum exceeded? Today it works again.... (non-hidden blogs)

bloke90 commented 4 years ago

When I close the application and restart, the blogs are still there (list from index folder) but the queue will disappear soon as the tags to download pictures, videos etc. are not set anymore - need to be set manually.... is this bug known?

ghost commented 4 years ago

TumblThree has been unusable for me too for some weeks. I also always get the TFA question when I try to log in.

I've tested some things:

:(

johanneszab commented 4 years ago

Like I said in post #8, someone has to implement the new Privacy Consent Agreement in TumblThree in order to get the proper cookies. Without them, the login will fail, and hence downloading anything except from the old Tumblr API will fail as well. The Privacy Consent Agreement looks way more complicated than it was before (2 simple XHR and a simple GET). Now they are doing async promises within some react/js in the background, with hashings, timestamps, requests to some external provider (quantcast.mgr.consensu.org/) and what not. Maybe it's not even possible without executing the actual js.

I've tried using the old Internet Explorer (because the C# .NET Framework has an API for handling it and reading its cookies) to agree to the Privacy Consent, but the JavaScript/React didn't execute because the JS is too new for IE.

Alternatively one could extract cookies from chrome, but then again you have to hassle around there, there is no real API for that, it looks like they even changed recently (version 80) how they stored their cookies, they are of course encrypted, and we force everyone to use chrome.

One could even try to use selenium (browser automation) to do the login, and then extract the cookies. I'm not sure about this. It's not really elegant, but maybe the easiest solution.

It looks like MS even released a new EDGE browser control for C#. I haven't used it, thus I cannot really say anything about it. And of course, it will not work for pre-Windows 10 users.

I'm really tired of this project and will not do it myself. If no one is interested in doing this within the next month or so, I'll probably archive this project so that the code is still available, but people cannot post anything here anymore.

ErikBrown2 commented 4 years ago

Sorry to read that it is so complex to solve. Hopefully there will be a volunteer who is able to fix it. TumblThree is a very nice program and it would be a pity if we will not be able to use it anymore.

Hi-ImKyle commented 4 years ago

What cookies are missing exactly? euconsent-v2 is the one that is mentioned here, and if that's the only one that is required then a simple basic fix, for the time being, is just getting the user to input their cookie before attempting to log in so the cookie is passed with the rest of the requests.

I'm not that great at implementing new things into projects I don't work on, so I'm not sure where to begin when it comes to even trying to implement what I'm thinking of as a temp fix. Could someone point me to where this cookie would be required in code please?
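
To make the idea a bit more concrete, here is a rough sketch of what "pass the pasted cookie along with the rest" could look like. It is plain JavaScript rather than TumblThree's C#, the function name and the `session` cookie are hypothetical, and whether euconsent-v2 alone is enough for the login is an untested assumption.

```javascript
// Sketch only: merges a user-supplied consent cookie into a Cookie header.
// "euconsent-v2" is the cookie name mentioned above; whether it alone is
// sufficient is an untested assumption. Values are assumed to need no escaping.
function buildCookieHeader(existingCookies, userConsentValue) {
  const cookies = { ...existingCookies, 'euconsent-v2': userConsentValue.trim() };
  return Object.entries(cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join('; ');
}

// "session" is a made-up cookie name standing in for whatever is already stored.
console.log(buildCookieHeader({ session: 'abc123' }, ' PASTED_VALUE '));
```

The resulting string would then be sent as the Cookie header on the login and crawl requests.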

bloke90 commented 4 years ago

I still frequently get the "exceed" error message after a couple of hundred downloads - it only works with non-hidden blogs. Using a VPN and connecting to a different country works for a while.....

Where do the "exceed" errors come from?

calves07 commented 4 years ago

@bloke90 with VPN, can you download posts from hidden blogs?

thomas694 commented 4 years ago

First of all, thanks a lot to Johannes for such a great app. I'm using TumblThree for a while and I was unhappy that it no longer worked. Despite not having much time, I've given it a shot and here are my results.

Notes:

Volunteers for correct integration are welcome.

How to use: The same as the old Authenticate window. After logging in and closing the browser window, its cookies are transferred to the internal cookie collection. Once you have successfully logged in, you can use TumblThree again as before. On my system the new app works as perfectly as before.

Comments and contributions are welcome.

Branch for review: https://github.com/Thomas694/TumblThree/tree/81-workaround-broken-login-with-changed-privacy-consent-agreement

johanneszab commented 4 years ago

Hi Thomas!

Thanks a lot for your contribution and willingness to work on this project! That's wonderful.

I'm currently on vacation and don't have my computer around, but if you want, I can take a look this Sunday. However, like I said, any contributions are greatly appreciated, and as long as it works it's great and better than no contributions at all ;)

calves07 commented 4 years ago

I tried the version with the changes implemented by Thomas and it worked as expected! And now, my account is also logged in the official version (with the broken authentication) so both versions are working properly now :)

RedneckEngineering commented 4 years ago

Can verify the solution works like a champ!

RawRanger commented 4 years ago

Would you guys maybe describe exactly how you implemented those changes? I cannot make it work and coding is not exactly my strong suit. I would really appreciate it.

RedneckEngineering commented 4 years ago

@RawRanger - If you grab Visual Studio and install the community version, you can build a debug version of the application. You will want to head to thomas694's GitHub (in his post) and grab a zip file of the code, then follow the directions (How To Build The Source Code To Help Further Developing). Make sure you build the X64 debug version. Once you get it built, you can run it from within Visual Studio. Run the app, go to settings, and use the new login button (below where it used to be). This will open a new window, where you will log into Tumblr using your userid and pwd. Once you log in, close this window and you'll see the app shows you logged in. If you performed this on your main machine, you can run the original tumblthree app, and it will grab your cookies. If you used a different machine, head to where tumblthree stores the config files (For me, it's: C:\Users\user\AppData\Local\TumblThree\Settings) and grab the cookies.json file. Move the cookies.json to the same location on your other machine, and it'll work.
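
If you prefer to script the last step (moving cookies.json between machines), a small Node.js sketch could look like this. The folder layout is the one reported in this thread; the function names are made up, and the copy is only meaningful when run on the Windows machine itself.

```javascript
const fs = require('fs');
const path = require('path');

// Location reported in this thread: %localappdata%\TumblThree\Settings
// path.win32 keeps the Windows separators regardless of where this runs.
function cookiesJsonPath(localAppData) {
  return path.win32.join(localAppData, 'TumblThree', 'Settings', 'cookies.json');
}

// Copies cookies.json to another folder (e.g. a USB stick) for transfer.
// Hypothetical helper; intended to be run on the Windows machine.
function exportCookies(localAppData, destinationDir) {
  const target = path.join(destinationDir, 'cookies.json');
  fs.copyFileSync(cookiesJsonPath(localAppData), target);
  return target;
}

console.log(cookiesJsonPath('C:\\Users\\user\\AppData\\Local'));
```

On the target machine, the file goes back into the same Settings folder while TumblThree is closed.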

ErikBrown2 commented 4 years ago

@RedneckEngineering

Thanks for your explanation. I tried to follow your steps but I did not find how to use the zip file of the code. A Youtube clip showed that I should select 'Get started' - 'Clone a repository' and then put the link to the project in Visual Studio as the repository location. I used 'https://github.com/thomas694/TumblThree' for the repository location. I could run the application in debug but this appeared to give me the original version from Johanneszab with the original log-in button and the same login error. Can you please let me know which link I should use as the repository location when adding the new project in Visual Studio to get the updated version from Thomas?

Hi-ImKyle commented 4 years ago

> Can you please let me know which link I should use as the repository location when adding the new project in Visual Studio to get the updated version from Thomas?

@ErikBrown2, your problem is that you are cloning the master branch of his fork, you need to clone the 81-workaround-broken-login-with-changed-privacy-consent-agreement branch instead. Try using this link instead of the one you used: https://github.com/thomas694/TumblThree/tree/81-workaround-broken-login-with-changed-privacy-consent-agreement

Git command for those who use the terminal for git instead of VS:

git clone -b 81-workaround-broken-login-with-changed-privacy-consent-agreement https://github.com/thomas694/TumblThree.git

johanneszab commented 4 years ago

You can also wait until Sunday when I'm back from my vacation, then I'll merge Thomas's changes and release a new version. Then you don't have to download and set up Visual Studio yourself.

RawRanger commented 4 years ago

That's even better. Yes, I will wait for it. Saves me a lot of frustration. Thanks!

ErikBrown2 commented 4 years ago

> That's even better. Yes, I will wait for it. Saves me a lot of frustration. Thanks!

Me too. Thanks for the information, though

bloke90 commented 4 years ago

> @bloke90 with VPN, can you download posts from hidden blogs?

No, with a VPN I can only get around the "exceeded" situation. Again after some 3000 (?) downloads it says "exceeded"; with a new VPN location it runs again. I suppose this is IP related.

calves07 commented 4 years ago

Totally unrelated, but does anyone know a way to backup chat messages (DMs)?

RawRanger commented 4 years ago

Not for sure, but I'd try Extreme Picture Finder 3. It works with templates that are at your disposal via an online database. Some scripts come with the application itself, but most are created by its users.

A user can create an entirely new template and add it to the database, or they can take an existing template and modify it for their specific needs. It already has many Tumblr templates for all sorts of purposes, so there might be one right for your purpose.

I've included a pdf which shows you part of such a library (a library I personally never make use of .... obviously).

The program really gives you a lot of flexibility by allowing you to edit existing scripts.

The best part for me personally is that it is relatively simple to edit scripts and then test them, even for me, whereas here on GitHub I depend on Johanneszab to do the "dirty work" for me. Something I am truly thankful for.

ExtremePictureFinder.pdf

johanneszab commented 4 years ago

Thanks Thomas for all the work! I've just merged your changes and added some additional code changes in order to be able to save the Cef-cookies.

Now when I wanted to release the changes, I remembered the problem with cef ;-). It's a chromium packed in a .dll with a file size of 80+ MB. Any ideas on how we should release this? Even zipped it's still 74 MB.

johanneszab commented 4 years ago

This new Wpf.UI.Controls.WebView from Microsoft sounds kinda nice!

> This library provides WebView XAML control for WPF by hosting web content in your WPF desktop application. It is part of the Windows Community Toolkit.
> This control uses the Microsoft Edge rendering engine (EdgeHTML) or the System.Windows.Controls.WebBrowser, for devices on older versions (WebViewCompatible), to embed a view that renders richly formatted HTML5 content from a remote web server, dynamically generated code, or content files.

It's probably tiny compared to the libcef.dll, but as I mentioned, I'm not sure if it will work on pre-Windows 10 systems. ... Maybe it's still worth a quick try at some point to see how it performs.

johanneszab commented 4 years ago

Well, in the meantime, I've uploaded a release with Thomas's changes on my website: TumblThree_f5fa8d759c5082580a8502de59410741c28def0a.zip.

To login, go to Settings->Connections, press the Authenticate button. A browser embedded window will open with the Tumblr login page opened. After logging in using the password method, also perform the new privacy consent agreement. When you're back on the Tumblr start page simply close the browser window.

wacher74 commented 4 years ago

TumblThree thinks my blog is offline, but this is not true. The popup login page is OK, I see my dashboard within it. But the crawling is not working, the app says: Status offline. (Open in browser and Open in tubex.com are working, and the new login window worked successfully.)

thomas694 commented 4 years ago

> This new Wpf.UI.Controls.WebView from Microsoft sounds kinda nice!

> This control uses the Microsoft Edge rendering engine (EdgeHTML) or the System.Windows.Controls.WebBrowser, for devices on older versions (WebViewCompatible), ...

> It's probably tiny compared to the libcef.dll, but as I mentioned, I'm not sure if it will work on pre-Windows 10 systems. ... Maybe it's still worth a quick try at some point to see how it performs.

"Unlike WebView, WebViewCompatible uses one of two rendering engines to support a broader set of Windows clients:

On Windows 10 devices, the newer Microsoft Edge rendering engine is used to embed a view that renders richly formatted HTML content from a remote web server, dynamically generated code, or content files. On devices running older versions of Windows, the System.Windows.Controls.WebBrowser is used, which provides Internet Explorer engine-based rendering." (WebViewCompatible control for Windows Forms and WPF)

If it falls back on older windows versions and uses WebBrowser control, we have the same as before. A control which cannot display the new Tumblr site.

" Note The Edge runtime does not at the moment work when the process is elevated as an administrator. Therefore WebViewCompatible will fall back to use the System.Windows.Controls.WebBrowser when it detects that the process is running as administrator." (same link above)

As stated before that was an exclusion criterion to me, because many people are using their system with an admin account and/or have the UAC set to a non-default level. At least I do.
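
Put as a decision rule, the two quoted passages boil down to the following. This is only a sketch of the documented behaviour, not the control's actual code, and the function and property names are made up for illustration.

```javascript
// Sketch of the fallback rule as quoted from the WebViewCompatible docs above.
// Not the control's real implementation; names are hypothetical.
function expectedEngine({ isWindows10, isElevated }) {
  // Edge is only used on Windows 10 and only when not running as administrator.
  if (isWindows10 && !isElevated) {
    return 'Edge (EdgeHTML)';
  }
  // Everything else falls back to the Internet Explorer based WebBrowser,
  // which cannot display the new Tumblr site.
  return 'WebBrowser (Internet Explorer)';
}

console.log(expectedEngine({ isWindows10: true, isElevated: true }));
```

So anyone on Windows 7/8, and anyone running elevated on Windows 10, would end up with the old control again.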

Maybe in WebView2 they'll have fixed this problem. But they haven't officially released it yet and I don't like to fiddle around with that preview stuff. So I'm looking forward to the final release of WebView2 ...

Cef can be recompiled with two different flags, but judging from a quick read of their readme and their forum, the difference is only about 20 MB (maybe 10 MB compressed). That wasn't worth trying.

ErikBrown2 commented 4 years ago

It works great again. Just a very minor cosmetic issue, the version number is not updated. It still shows 1.0.11.8 as the version. Thanks a lot.

ghost commented 4 years ago

I've tried the new release....

But after starting that release it immediately shows the following error message (in the light blue line above the list):

Error 1: Could not restore user interface settings.

Everything seems to be fine up to this point, all settings are as expected. But after clicking the Authenticate button in Settings->Connection, it takes two or three seconds and TumblThree closes completely without any information why.

That happened on a fresh Windows 10 Pro installation with all updates and the newest Edge + newest Chrome + newest Vivaldi installed.

thomas694 commented 4 years ago

You can try the following, maybe it helps. Go to your settings.json and delete the ColumnSettings section (of course not without a backup file). Then start your program again. Maybe you have to rearrange your columns if you used a different order and/or size.
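
If you would rather not edit the file by hand, the same steps (backup first, then remove the section) can be sketched in Node.js. This assumes ColumnSettings is a top-level key in settings.json, which I have not verified, and the function names are made up.

```javascript
const fs = require('fs');

// Pure helper: returns a copy of the settings object without ColumnSettings.
// Assumes ColumnSettings is a top-level key (an unverified assumption).
function withoutColumnSettings(settings) {
  const { ColumnSettings, ...rest } = settings;
  return rest;
}

// Writes a .bak copy next to the file, then rewrites it without the section.
function resetColumnSettings(settingsPath) {
  fs.copyFileSync(settingsPath, settingsPath + '.bak'); // backup first
  const raw = fs.readFileSync(settingsPath, 'utf8').replace(/^\uFEFF/, ''); // drop a BOM if present
  const cleaned = withoutColumnSettings(JSON.parse(raw));
  fs.writeFileSync(settingsPath, JSON.stringify(cleaned, null, 2));
}
```

Call `resetColumnSettings` with the full path to your settings.json while TumblThree is closed; the untouched original stays next to it as settings.json.bak.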

ghost commented 4 years ago

Thanks for the help... but just one second before your post I've tried to start TumblThree as Administrator, which solved the problem restoring the user interface settings.... ;)

But again no success with the Authenticate-button. Not as user, not as Administrator.

nomead42 commented 4 years ago

The prepackaged build works great, Windows 7 x64. Thank you very much to everyone involved.

In this age of ever increasing bloat and even printer drivers taking hundreds of MB, is 74 MB really so bad? A couple projects I've used (but haven't worked on) with sizable math libraries have three options for the download.

  1. Everything included, "no hassle" first installation, or for people who don't care about the download size.
  2. Everything else except the big libraries (in this case it would be easy, a single DLL file)
  3. The big libraries in a separate package, or in some cases a pointer to where to get it.

Now that MS owns GitHub, I'd like to think they have the bandwidth even if the full package is the only release option. Some video files on Tumblr are bigger than that. When the program is used for its intended purpose, I'd assume that a good fast network connection is available, so an occasional 74 MB doesn't make much of a difference.

wacher74 commented 4 years ago

Agreed, I think if you use a program for downloading tons of megabytes then you won't be angry about some extra 74 MB :) Does anybody have any idea why TumblThree reports every blog I add as OFFLINE? (I'm logged in, authentication was okay.)

The program doesn't exit properly. For example I can't delete the debug.log file. (The last downloaded file is not closed either, this is an old issue.) I have to kill a process which eats 7% CPU and 140M RAM until it is killed.

Some experiment/experience:

johanneszab commented 4 years ago

> TumblThree thinks my blog is offline, but this is not true. The popup login page is OK, I see my dashboard within it. But the crawling is not working, the app says: Status offline. (Open in browser and Open in tubex.com are working, and the new login window worked successfully.)

I don't have that much time currently during the weeks, hence I've only briefly looked at Thomas changes, they looked good, and tested the SVC Tumblr Blog Crawler implementation afterwards, which is fixed now. That was the one which was broken without the privacy agreement cookies.

However, after your comment yesterday I've briefly checked if the API Tumblr Blog crawler still works, I simply assumed it would after getting all cookies with chrome. However, for some reason it returns some HTML now with the changes instead of the API Request data as json.

I've briefly tested to just supply no cookies at all for the API requests which seems to be working. You can even open a fresh browser (incognito tab) and perform the request manually and get the data. It doesn't really make that much sense why this request wouldn't work with all the cookies, just as it used to be.

I think we just have to sort out the cookies now. I think we have everything we need, but just have to adjust all the requests and supply the proper cookies to each of them.
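
While sorting this out, a quick check of what a request actually returned can help to see which cookie combinations break which endpoint. A rough sketch in plain JavaScript (not TumblThree code, and the function name is made up):

```javascript
// Sketch only: classify a response body as JSON (the expected API data) or
// HTML (e.g. a consent or login page served instead of it).
function classifyApiResponse(body) {
  const text = body.trim();
  if (text.startsWith('<')) {
    return 'html';
  }
  try {
    JSON.parse(text);
    return 'json';
  } catch {
    // Neither markup nor valid JSON, for example a JSONP-style wrapper.
    return 'unknown';
  }
}

console.log(classifyApiResponse('<!DOCTYPE html><html></html>'));
console.log(classifyApiResponse('{"posts": []}'));
```

Running the same request once with all cookies and once with none, and comparing the two results, would confirm the behaviour described above.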

I'm guessing that the search requests are broken too, because there is no pfg cookie anymore. You can see your stored cookies in %localappdata%\TumblThree\Settings. Simply paste that into your explorer address bar and open the cookies.json.

So, maybe you just have to wait a little longer now for progress. I'm already thankful for Thomas's help! He is also new to this code base, hence he still has to figure things out.

RawRanger commented 4 years ago

Totally agree. Thanks for the work, Thomas! Works great again.