godbout / AlfredKat

the infamous alfred-kat but in Swift because macOS is getting rid of the PHP interpreter.
MIT License

Workflow doesn't work with certain URLs #17

Open yourmomdatestedcruz opened 1 year ago

yourmomdatestedcruz commented 1 year ago

Hi - I'd like to use https://kat.rip/ instead of the default, but all the results return as 404 - is there a step I'm missing here?

godbout commented 1 year ago

getting the same result as you here. will check. thanks.

godbout commented 1 year ago

the issue happens when scraping the site. for whatever reason kat.rip doesn't use the same CSS classes as the default site.

Alfred KAT gets the rows from the .frontPageWidget tr table. that table uses different CSS selectors on kat.rip. so it's a bit of a mess. not sure what's the best way to make KAT work on all sites if they don't follow the same structure.
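
One possible way to cope with mirrors that use different markup is a per-site selector table that falls back to the default. This is only a minimal sketch and assumes nothing about how Alfred KAT actually stores its settings: `rowSelectorOverrides`, `rowSelector(for:)`, and the `mirror.example` entry are hypothetical, and only `.frontPageWidget tr` comes from the comment above.

```swift
import Foundation

// Selector the workflow uses today, per the comment above.
let defaultRowSelector = ".frontPageWidget tr"

// Hypothetical overrides keyed by host. "mirror.example" and its selector
// are placeholders, not a real mirror or its real markup.
let rowSelectorOverrides: [String: String] = [
    "mirror.example": "table.results tr"
]

/// Returns the row selector to use for a site URL, falling back to the default
/// when the URL has no host or no override is registered for that host.
func rowSelector(for siteURL: String) -> String {
    guard let host = URL(string: siteURL)?.host else {
        return defaultRowSelector
    }
    return rowSelectorOverrides[host] ?? defaultRowSelector
}

print(rowSelector(for: "https://mirror.example/"))
print(rowSelector(for: "https://kat.rip/"))
```

A table like this only helps for sites that return their results in the initial HTML. It would not help with a page that loads its content later.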

godbout commented 1 year ago

ok so it's even worse with kat.rip.

you end up on a page that dynamically loads the stuff later. so you can't scrape it. this is the HTML received by the parser:

<html lang="en-US">
 <head> 
  <title>Just a moment...</title> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 
  <meta http-equiv="X-UA-Compatible" content="IE=Edge"> 
  <meta name="robots" content="noindex,nofollow"> 
  <meta name="viewport" content="width=device-width,initial-scale=1"> 
  <link href="/cdn-cgi/styles/challenges.css" rel="stylesheet"> 
  <meta http-equiv="refresh" content="35"> 
 </head> 
 <body class="no-js"> 
  <div class="main-wrapper" role="main"> 
   <div class="main-content"> 
    <noscript> 
     <div id="challenge-error-title"> 
      <div class="h2"> 
       <span class="icon-wrapper"> 
        <div class="heading-icon warning-icon"></div> </span> 
       <span id="challenge-error-text"> Enable JavaScript and cookies to continue </span> 
      </div> 
     </div> 
    </noscript> 
    <div id="trk_jschal_js" style="display:none;background-image:url('/cdn-cgi/images/trace/jsch/nojs/transparent.gif?ray=7dc6341f3db0b014')"></div> 
    <form id="challenge-form" action="/search/fi?__cf_chl_f_tk=iPfHMW7cMAPN6Lk0OU_712BHqug72vZcm8Yvry5l7ZA-1687622635-0-gaNycGzNCRA" method="POST" enctype="application/x-www-form-urlencoded"> 
     <input type="hidden" name="md" value="tMKgME7bdedN0s7pWXM1wSAgwQnfdPHc.DxMHvN359o-1687622635-0-ARpPIxAH-M5MOKqkPBY2iAAtyiV-Wf9VO0hUJuiPkI9h4LbLHCrgLJgtP5C_6eqzsJbTQwV-ofirjwLJPhxyEac-Fe7cJEN17C7S1Fgr5Egbt8kspmsD4_daPJACF5LpjDsIaEKpZihIcbrzLnSZoUyZxWGzJxWStDkC-tKIzgYxlCGBUS7F4Y100iLBW8E8Yo-BTTfjVEJktW7xa_QcoyOmj75R-b7md7Ueq3DjK8MxhobFxWJAoSpCsSH0sVJ8v7fIwLN4PloLRkpkgX-iL0lLW_Z6G22H5QvOX5WzuUC2Dm7-m8hqYyxTFL8KZfR8Gg_08TOqGO-zRz7pkB84nnBRFLcffHTyXY__MAxo4rZ6_blOqLUBxJ32I_MrGkyGqzspaI64cW66LgI5r1OjHvdS6Qd0fzFLMZPZHhKKfdSlTLbBhpCO3XaPkdIRwfhE8D0erUQb16gAZ_bUAX4p_vUmCjjNqflRlz0Navu2qmvdXL36emwZsJmQ5uQc1Zr7mAyZ6RAuan62zNa26y4gkmbC3WpKR0KSEgeeKhoBpvf_eTQmvzjbekViC-DSCGzh--z9K650ejJK1bBrSJUeNwjtj3EcGb0ZGQq1iABLX-RP9Jsf_YxlOxU6ZGvx2ts6moUhuONGQagTJeE5Fbb2DmpCMY4DTa3eeCcMNl-ElO0zy03mJcXhsvSK4biFqVaQCiLcvykrubxpv3OmddvIx3YqOlT00LzDh8ccFgZ2N85sw62SAC1J4-0uzJFgNwI23vHr2jCs4_uakRXWkG6MZjmefLX0IuE9a1pYZHZQ4yfCPGeFUgpKXBETQxSbm97YDu-3JKe70lGU9TrNCp8mSvUCH6moD7Y9LRtlv1jNuZDleBWRIpV9vofW2vkmr8gKJGkrSpKe-2f41yG371ow4flMH7phQ4S53rnaBE7Uv4EnNVwi9cB1u04hM3tDmmHbE00oyKrS7ULeHQVBSWPOFlPWLkNwp-NcBlDM-qLol-6ftpHxKaq992BwtqKggELf5gFZZx1JesOmD9TxggK-xHYlKob4m6UtuNM4lTWFQkMqwQWfXl1Sq_4ymhjZNdm2qmjF09tLqMT6gd7O8TZtA1ENxgPzhURiu5AIeV6COGTkRqGqlmvC74rwehGjwlcZ_jrmK4I5Ly92m8G5-zAEaICGOpNfZLCN3zRNA5rk_AnXu4O9-62uh_-alISp2tD2PGMhJI1hxHvP2YLRefrd9w7bRvtYIhG0SB47T0P5tNo7VpIXTOZadGOIFAFOfSsUuaBeAHtXLuPdOJ9fyjgGfiLeM9dvWLk6_cfOv-yW7euw_BqwTs3G1YKEWWRzi4xf_VFTQU0YNODdztCexPSuEE-vB1Qf3u-rhyQkGRkwFD47lsE3dW4Re7ZJp97zJHNvP_YOblaPf1qnE8HRuYfy-QbNeOk3bcrr814qvn9jovCi-T4IueT9UHarm_x69hMqI6IyxH5LwvSoeujAeVjWafQGDHMyV8E3iUlkB6meG_xgU8EknXb9L-9c-QUnz0UyxIC6nh0HVleK1Cm_8vE6Hzf0AYpEDACQLRFXyIF2zN7JUnKh_ET3asxMDXiUr1cf2hMEnlZaSE1vz38MeGZssY29Syo68Tb9Ld3RMWDag-rypAhKOdd6c5EsYIbOEfy8ca4lsvwtdMJDsQUpHDom1DvmeGk2FdwenlBY439ahvogMcx6evz5NhBHhKlJIxFfPW_7zm87a8Yxx7ETMiu3MxprsezC_5vXB1JyyLUYeJRSU_9tuICH0850IEX19wnbqZhnEFxmbkychvcAe9rlT7E4UP03isfG7NdmiagnP6ViFsWslSW5FP3NLZfo3cC7gqyBXg7m40ZCn7q0oJ-u4jcV0Fu5r67UgYDMKlf19dbu
I-Z6NfyIOMHylEzLlU3bzjMJe8ZlLTTOYzVOTwcReS3heB87fa-IzURM7nAFVX6EtQYGF_6_HISdmVV2bITe_BPHO0gCuHw4gEqKAOkCP9Tj9fsopfgFEsDyqao9PpvL2aiSMoXYjx55lVbgG563nRIf-Ra6qLDOfJqup0H6V7nowIr-jWRHh4I4mW-Boa83-s53UW6O9mTbpQUvuNRQGhRC8XMW_heQ2vP6h2gxgib68pmBIWaz0e08FYdoh2jzwKgVqA9yD80BdkeCRdFsdQil57zlOjGbBf6Rr0nSScIk5s5uscR78yFtbVbup2Myy33e28btSl_vLgvlJnmOUijTfFL1EDyhbmfwZ6ibaRw"> 
    </form> 
   </div> 
  </div> 
  <script>
    (function(){
        window._cf_chl_opt={
            cvId: '2',
            cZone: 'kat.rip',
            cType: 'non-interactive',
            cNounce: '20854',
            cRay: '7dc6341f3db0b014',
            cHash: 'c8717f3147e850a',
            cUPMDTk: "\/search\/fi?__cf_chl_tk=iPfHMW7cMAPN6Lk0OU_712BHqug72vZcm8Yvry5l7ZA-1687622635-0-gaNycGzNCRA",
            cFPWv: 'g',
            cTTimeMs: '1000',
            cMTimeMs: '60000',
            cTplV: 5,
            cTplB: 'cf',
            cK: "",
            cRq: {
                ru: 'aHR0cHM6Ly9rYXQucmlwL3NlYXJjaC9maQ==',
                ra: 'QWxmcmVkS2F0ICh1bmtub3duIHZlcnNpb24pIENGTmV0d29yay8xNDA4LjAuNCBEYXJ3aW4vMjIuNS4w',
                rm: 'R0VU',
                d: 'ahR/OuoHoEeoOa9K+m+m8YMcZgtRESJG+2dk90pJ8CXMVU45T3ayzIn9beuf22UpgAL7pMHNamFutP58wHjw/8PEr1sqGnMMQZNVEoXchMG987nOfGCElkDPeyzQNGGBE1wRA595qEIxqRFZzgHAMuxZNgGYnjLZa/4iXU3sER76/Svl980JH442He/0jtE9UNeOjRpv6WKeiV1PofV0UwznN/Qdhn3riNFoThhbnmt4PwtLdBXWcM//ZybyUEeCcdsNBVLaqIj0cQRtIozydTLiGo9EbluIflqTacmTe5RJrMfOFOH8TVKzIcZOouIqN/kn78/LYGDbVEbKgFZklLSsDjp3L0LiIhdD6ulNm5RbawY4p9r/FKxJ3PVhTZKtf6/rZ9vjtjzCVAj5pveVvO8lGVHOhqIw+cAoxON7id+jjXyeqQZvnoYBMCRTy/bBe1zGLSjK4Lzk5Hk2aIUi9jhfQ4aZj2sHb9ZnUIN49d49lb/qR9aTgrxpixShUAFAf/BaaVm25cWzSJ8IMiN7gmjtTjwhE7UjusVZITRUabLxYzAdv6DE6dVczdaWzz9Fy/NtEyBSBSnyYZCb1keGwQ==',
                t: 'MTY4NzYyMjYzNS4zOTcwMDA=',
                cT: Math.floor(Date.now() / 1000),
                m: 'SuY1iIEPPVHldMHehfmkbdbM+C8rlmZMl9lcragIXW0=',
                i1: 'CJ9ZBW58qMbcBmnqI3+oPg==',
                i2: '+TboreYAXWQ0OaFpTNTn+w==',
                zh: 'bxKa0UzoQT3/V39+OpEhLo9YFmKW3Whb/6+oKJucKUE=',
                uh: 'zafXVx2pSaQxZDAgCnEfhnt3eYGpDp3x61N+YAmdSTI=',
                hh: 'qgmSarje3B25agLuae7AnlqaImmXwkYGLU3MPAH2IJc=',
            }
        };
        var trkjs = document.createElement('img');
        trkjs.setAttribute('src', '/cdn-cgi/images/trace/jsch/js/transparent.gif?ray=7dc6341f3db0b014');
        trkjs.setAttribute('alt', '');
        trkjs.setAttribute('style', 'display: none');
        document.body.appendChild(trkjs);
        var cpo = document.createElement('script');
        cpo.src = '/cdn-cgi/challenge-platform/h/g/orchestrate/jsch/v1?ray=7dc6341f3db0b014';
        window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
        window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, location.href.length - window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;
        if (window.history && window.history.replaceState) {
            var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
            history.replaceState(null, null, "\/search\/fi?__cf_chl_rt_tk=iPfHMW7cMAPN6Lk0OU_712BHqug72vZcm8Yvry5l7ZA-1687622635-0-gaNycGzNCRA" + window._cf_chl_opt.cOgUHash);
            cpo.onload = function() {
                history.replaceState(null, null, ogU);
            };
        }
        document.getElementsByTagName('head')[0].appendChild(cpo);
    }());
</script>   
 </body>
</html>

then some JS does the bullshit of loading the real page. not worth working on that, sorry.
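
Even without supporting such pages, the workflow could at least recognise the interstitial and report it instead of returning nothing. The sketch below checks for substrings that all appear in the dump above. Treating them as reliable markers is an assumption, and `challengeMarkers` and `looksLikeChallengePage` are hypothetical names, not part of Alfred KAT.

```swift
import Foundation

// Substrings that all appear in the HTML dump above. Treating them as
// challenge-page markers is an assumption, not a documented contract.
let challengeMarkers = [
    "<title>Just a moment...</title>",
    "id=\"challenge-form\"",
    "/cdn-cgi/challenge-platform/",
    "window._cf_chl_opt"
]

/// Returns true when the HTML contains at least one of the markers.
func looksLikeChallengePage(_ html: String) -> Bool {
    return challengeMarkers.contains { html.contains($0) }
}

// Shortened stand-ins for the two kinds of response.
let challenge = "<html><head><title>Just a moment...</title></head><body></body></html>"
let results = "<html><body><table class=\"frontPageWidget\"><tr><td>row</td></tr></table></body></html>"

print(looksLikeChallengePage(challenge))  // true
print(looksLikeChallengePage(results))    // false
```

With a check like this, the workflow could show something like "this site needs JavaScript to load" rather than an empty list.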

yourmomdatestedcruz commented 1 year ago

Ah I see - fair enough, thanks for looking into it!

Do you have a list of known working sites somewhere?

godbout commented 1 year ago

> Do you have a list of known working sites somewhere?

honestly no. i built that variable for myself at first, coz sometimes the official KAT site was getting hammered. so i wanted to be able to switch the URL fast. but yeah, hit and miss. i haven't had to change it for ages tho. any reason why you don't wanna use the original URL?

yourmomdatestedcruz commented 1 year ago

I travel a fair bit and the default URL is sometimes blocked - I can VPN through and take all those steps but it'd be easier to just switch it out for the duration of the stay.

Additionally, I get more results on kat.rip - by some significant margin, and more trackers too it seems? that last part is strange to me, but there are more results that's for sure.

godbout commented 1 year ago

> I travel a fair bit and the default URL is sometimes blocked - I can VPN through and take all those steps but it'd be easier to just switch it out for the duration of the stay.

hmm. fair. although i have another Workflow to connect to a Brook VPN through Alfred 😂️

> Additionally, I get more results on kat.rip - by some significant margin, and more trackers too it seems? that last part is strange to me, but there are more results that's for sure.

oh. strange. i would have expected the mirrors to have the same data. i've mostly always been able to find what i was looking for on the original. but yeah, i surely can't do anything easily with kat.rip. not sure if there's a way to get scrapers to grab some kind of final render of the HTML. that'd require me to dig deep into how that works. which i'm def. not gonna do for now coz i'm busy with more important stuff, sorry!

yourmomdatestedcruz commented 1 year ago

> not sure if there's a way to get scrapers to grab some kind of final render of the HTML

surely there is, but it's above what I can understand - I smell a custom search to just get that results URL a bit quicker.

> which i'm def. not gonna do for now coz i'm busy with more important stuff, sorry!

no worries - thanks for looking into it as much as you did!
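
For the custom-search idea, the dump earlier in the thread suggests the results URL has the shape `<base>/search/<query>`, since the challenge was served for `/search/fi`. Assuming that pattern holds (it is inferred from the dump, not documented anywhere in this thread), the URL could be built as in this sketch, where `searchURL(base:query:)` is a hypothetical helper:

```swift
import Foundation

/// Builds "<base>/search/<query>" with the query percent-encoded as one path segment.
/// The "/search/<query>" pattern is an assumption inferred from the dump in this thread.
func searchURL(base: String, query: String) -> URL? {
    // Encode "/" as well, so the query stays a single path segment.
    var allowed = CharacterSet.urlPathAllowed
    allowed.remove(charactersIn: "/")
    guard let encoded = query.addingPercentEncoding(withAllowedCharacters: allowed) else {
        return nil
    }
    // Avoid a double slash when the base already ends with one.
    let trimmedBase = base.hasSuffix("/") ? String(base.dropLast()) : base
    return URL(string: "\(trimmedBase)/search/\(encoded)")
}

if let url = searchURL(base: "https://kat.rip/", query: "fi") {
    print(url.absoluteString)  // https://kat.rip/search/fi
}
```

Opening that URL in a browser lets the browser deal with the JavaScript step, which is the part the scraper can't do.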

godbout commented 1 year ago

> surely there is, but it's above what I can understand - I smell a custom search to just get that results URL a bit quicker.

yeah? like i'd be surprised. their stuff is being rendered dynamically. or at least after some Cloudflare check or whatever. anyways i haven't dug into it too much, but it's enough for me to know that's gonna be a headache and i don't have the knowledge needed to think of a solution quickly 😂️

> no worries - thanks for looking into it as much as you did!

sure.
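
The Cloudflare guess can be checked against the dump earlier in the thread. The `/cdn-cgi/` paths and the `_cf_chl_opt` object point that way, and three fields of its `cRq` object are plain base64. A small sketch for decoding them, with the values copied verbatim from the dump and `decodeBase64` as a hypothetical helper name:

```swift
import Foundation

/// Decodes a base64 string into UTF-8 text, or returns nil if either step fails.
func decodeBase64(_ value: String) -> String? {
    guard let data = Data(base64Encoded: value) else { return nil }
    return String(data: data, encoding: .utf8)
}

// Values copied verbatim from the `cRq` object in the dump.
let ru = "aHR0cHM6Ly9rYXQucmlwL3NlYXJjaC9maQ=="
let ra = "QWxmcmVkS2F0ICh1bmtub3duIHZlcnNpb24pIENGTmV0d29yay8xNDA4LjAuNCBEYXJ3aW4vMjIuNS4w"
let rm = "R0VU"

print(decodeBase64(ru) ?? "undecodable")  // https://kat.rip/search/fi
print(decodeBase64(ra) ?? "undecodable")  // AlfredKat (unknown version) CFNetwork/1408.0.4 Darwin/22.5.0
print(decodeBase64(rm) ?? "undecodable")  // GET
```

So the page records the requested URL, the workflow's user agent, and the HTTP method. That fits the workflow's plain GET being answered with the interstitial rather than the results table.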