dhimmel / hackjohn

Bot to monitor for southbound permit spaces on the John Muir Trail
https://hive.blog/@dhimmel/introducing-the-hackjohn-bot-for-southbound-john-muir-trail-permits
MIT License

New Yosemite Planning Your Wilderness Permit page prevents automated access to the trailhead report #9

Closed dhimmel closed 3 years ago

dhimmel commented 4 years ago

Currently, the trailhead report webpage that hackjohn uses at https://www.nps.gov/yose/planyourvisit/fulltrailheads.htm shows the following:

image

The "improved version" at https://yosemite.org/planning-your-wilderness-permit/ places the trailhead report behind a captcha:

image

So it looks like users will no longer be able to perform automated checks of the trailhead report, like Hackjohn used to enable. This is a bit of a bummer, but perhaps Hackjohn was a victim of its own success!

If anyone is aware of a workaround or finds the trailhead report available from a public API endpoint, let us know below and Hackjohn might be able to resurrect itself.

dhimmel commented 4 years ago

There is a potential API call that returns the results at https://yosemite.org/wp-content/plugins/wildtrails/query.php?resource=report&region=ww.

In an authenticated browser, it returns JSON like:

{"status":{"type":"message","value":"report found."},
"response":{"id":"ww","values":[
  {"date":"2020-06-29","w03a":7,"w03b":10,"w03c":0,"w05":4,"w30":2,"w31a":11,"w31b":0,"w32":14,"w33":2,"w34":18,"w35":1,"w36":2},
  {"date":"2020-06-30","w03a":10,"w03b":19,"w03c":0,"w05":0,"w30":0,"w31a":6,"w31b":0,"w32":4,"w33":9,"w34":10,"w35":0,"w36":3},
  {"date":"2020-07-01","w03a":10,"w03b":11,"w03c":6,"w05":2,"w30":0,"w31a":0,"w31b":0,"w32":8,"w33":10,"w34":15,"w35":0,"w36":22},
  {"date":"2020-07-02","w03a":8,"w03b":25,"w03c":13,"w05":10,"w30":0,"w31a":9,"w31b":0,"w32":15,"w33":22,"w34":14,"w35":0,"w36":38},

But without authentication (which I assume requires not being a robot), it returns:

{"status":{"type":"error","value":"unauthorized"},"response":null}
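A minimal sketch of how a client could tell these two responses apart, assuming the payload shapes shown above are representative. The function name `parse_report` is hypothetical, and the sample strings are trimmed copies of the responses quoted in this comment:

```python
import json


def parse_report(payload: str) -> list[dict]:
    """Return the per-date rows, or raise if the API refused the request."""
    body = json.loads(payload)
    status = body.get("status", {})
    # The unauthenticated response carries type "error" and a null response.
    if status.get("type") == "error":
        raise PermissionError(status.get("value", "unknown error"))
    return body["response"]["values"]


# Trimmed copies of the two responses quoted above.
ok = (
    '{"status":{"type":"message","value":"report found."},'
    '"response":{"id":"ww","values":['
    '{"date":"2020-06-29","w03a":7,"w03b":10}]}}'
)
denied = '{"status":{"type":"error","value":"unauthorized"},"response":null}'

rows = parse_report(ok)
print(rows[0]["date"], rows[0]["w03a"])

try:
    parse_report(denied)
except PermissionError as exc:
    print("refused:", exc)
```

Raising on the error status keeps a monitoring loop from silently treating a refused request as "no vacancies".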

SokolskyNikita commented 4 years ago

Couple of solutions:

  1. Integrate with a Recaptcha solving API (requiring people to add money to use the script): https://anti-captcha.com/
  2. Ask users to re-authenticate periodically to gain the cookie value

dhimmel commented 4 years ago

Ask users to re-authenticate periodically to gain the cookie value

Interesting idea. I wonder how long the authentication cookie works for. Unless it lasts for a span of days, this would add little benefit... At that point, you might as well just look at the table manually.

apanagar commented 4 years ago

There is a potential API call that returns the results at https://yosemite.org/wp-content/plugins/wildtrails/query.php?resource=report&region=ww.

In an authenticated browser, it returns JSON like:

{"status":{"type":"message","value":"report found."},
"response":{"id":"ww","values":[
  {"date":"2020-06-29","w03a":7,"w03b":10,"w03c":0,"w05":4,"w30":2,"w31a":11,"w31b":0,"w32":14,"w33":2,"w34":18,"w35":1,"w36":2},
  {"date":"2020-06-30","w03a":10,"w03b":19,"w03c":0,"w05":0,"w30":0,"w31a":6,"w31b":0,"w32":4,"w33":9,"w34":10,"w35":0,"w36":3},
  {"date":"2020-07-01","w03a":10,"w03b":11,"w03c":6,"w05":2,"w30":0,"w31a":0,"w31b":0,"w32":8,"w33":10,"w34":15,"w35":0,"w36":22},
  {"date":"2020-07-02","w03a":8,"w03b":25,"w03c":13,"w05":10,"w30":0,"w31a":9,"w31b":0,"w32":15,"w33":22,"w34":14,"w35":0,"w36":38},

But without authentication (which I assume requires not being a robot), it returns:

{"status":{"type":"error","value":"unauthorized"},"response":null}

Hi - I'm interested in seeing if I can help with this, but I don't understand what you mean by 'authenticated browser'? I don't see any sort of account creation mechanism or anything of the sort.

dhimmel commented 4 years ago

I don't understand what you mean by 'authenticated browser'

By authenticated I meant having completed the "I'm not a robot" captcha.

apanagar commented 4 years ago

That's what I thought, but I couldn't view your link even after doing the robot verification. I can't seem to find any trailhead report on the site. Was it removed from the site?

dhimmel commented 4 years ago

Go to https://yosemite.org/planning-your-wilderness-permit/, do the captcha, and then select trail and entry point options.

image

At this point a table shows up for me:

image

apanagar commented 4 years ago

Thanks for the screenshots. Got it to work. It looks like my ad blockers were preventing the site from loading the trailhead report.

apanagar commented 4 years ago

How would I read this?

image

August 19th is obviously the lottery quota available 15 days out. But what about the August 13 and August 24 dates? Are the dates before 15 days out still lottery? And what do the numbers after 15 days out mean?

dhimmel commented 4 years ago

How would I read this?

The lotteries for these days have already occurred. The lottery is drawn 168 days before the date. The spaces you see in your screenshot are vacancies after the lottery. This occurs if not enough people apply for a given date in the lottery or, more commonly, if those who won the lottery end up canceling.

In future, please open a new issue for questions that are not directly related to the topic.

dmca-faire commented 4 years ago

Why not build in paid captcha solving and allow users to provide their credentials? If you're checking every 5 minutes it would cost around $0.20 per day for someone to run.

https://anti-captcha.com/mainpage

apanagar commented 4 years ago

@dmca-faire It's a bit more involved than that. The JSON data in the API calls above isn't raw availability data, or at least I was unable to correlate it with the data shown on the site. Something on the frontend seems to be interpreting it for rendering (some extra protection). One would also have to automate the page, navigate the dropdowns, etc., and scrape all the data in addition to solving the captcha with a service. It would also be fragile if the page design changes.

bradtgmurray commented 4 years ago

Just did a bit of looking...

https://yosemite.org/wp-content/plugins/wildtrails/query.php?resource=report&region=jm is the API call in question. The final region parameter indicates which radio button is selected. The result has an array of values, one for each date, in some kind of obfuscated format.

{
  date: "2020-08-17",
  j01a: 9,
  j01b: 30,
  j03a: 9,
  j19: 15,
  j24b: 35,
  d01: 20,
  d02: 18
},

The source for this processing script is here: https://yosemite.org/wp-content/plugins/wildtrails/scripts/ui.js?ver=5.4.2

This API call is made in the source from a function called queryReport_, which then calls processReport_. This updates the trailhead options, and, if you have a trailhead selected, calls updateTrailheadInfo which calls appendReportList, which actually creates the #wt-report-list element that has the dates that you see on the site.

From looking through that code, it looks like the j01a keys are for the different trailheads that are available for a given region. There's a second API call https://yosemite.org/wp-content/plugins/wildtrails/query.php?resource=trailheads that gets the trailhead data and j01a looks like this:

j01a: {
id: "j01a",
name: "Happy Isles to Sunrise/Merced Lake Pass Thru",
wpsName: "Happy Isles->Sunrise/Merced Lake (pass through)",
region: "jm",
latitude: null,
longitude: null,
description: "If you do not plan on exiting the park via Donohue Pass, please <span><a href="?region=yv&th=y01a">click here</a>.</span>",
quota: 6,
capacity: 10,
alert: null,
notes: "<li>You must camp beyond Little Yosemite Valley and Moraine Dome.</li><li>Bears have obtained food from backpackers in this area. There are bear lockers at Merced Lake backpackers camp.</li>"
}

Note that this trailhead info also includes capacity and quota values. It looks like entering the John Muir Trail through the Happy Isles to Sunrise/Merced Lake Pass Thru trailhead allows for 10 people a day.

Looking at the code, there's some kind of modifier on the capacity by a lottery value, which I don't quite understand. Further complicating the code it looks like there's a special case for the jm region and the j24b trailhead which I don't quite understand yet.

In short, the original region API gives you how many spots have been claimed by date and by trailhead. The trailhead API gives you the quota+capacity values for that trailhead. The below calculation (once unwound) will give you the available value, which is the number of available spots on a given day.

                var inverted = (dt <= invDays + 1 && invStart != null & date >= invStart);
                var c = inverted ? capacity : quota;
                var available = 0;

                var lottery = (inverted && invDays > 0 && invDays <= dt && dt <= invDays + 1);
                lottery = lottery || (maxReserveDays <= dt && dt < maxReserveDays + 1);

                if (dt < minReserveDays || dt > maxReserveDays + 1) {
                    available = 0;
                }
                else if (found) {
                    var occupancy = data['values'][i][th];

                    if (occupancy != null) {
                        available = c - Math.min(occupancy, lottery ? quota : c);
                    }
                }

                if (dh != null) {
                    var dc = inverted ? dCap : dQuota;

                    if (dt < minReserveDays || dt > maxReserveDays + 1) {
                        dAvailable = 0;
                    }
                    else if (found) {
                        var dOccupancy = data['values'][i][dh];

                        if (occupancy != null) {
                            dAvailable = dc - Math.min(dOccupancy, lottery ? dQuota : dc);
                        }
                    }

                    available = Math.min(available, dAvailable);
                }
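For readers who find the JavaScript hard to follow, here is a rough Python port of the single-trailhead part of that calculation. It is a sketch under assumptions, not the site's code: the function and parameter names are made up for illustration, and `dt` is taken to mean the (possibly fractional) number of days between today and the trip date, which the snippet above does not confirm.

```python
from datetime import date
from typing import Optional


def available_spots(
    occupancy: Optional[int],
    quota: int,
    capacity: int,
    dt: float,                   # assumed: days from today until the trip date
    trip_date: date,
    inv_start: Optional[date],   # "inversion_start" in resParams
    inv_days: int,               # "inversion_days"
    min_reserve_days: int,
    max_reserve_days: int,
) -> int:
    # Inside the inversion window the limit is the capacity, not the quota.
    inverted = (
        dt <= inv_days + 1
        and inv_start is not None
        and trip_date >= inv_start
    )
    limit = capacity if inverted else quota

    # A date counts as a lottery day at the edge of either window.
    lottery = inverted and inv_days > 0 and inv_days <= dt <= inv_days + 1
    lottery = lottery or (max_reserve_days <= dt < max_reserve_days + 1)

    # Outside the reservable window, or with no data, nothing is available.
    if dt < min_reserve_days or dt > max_reserve_days + 1:
        return 0
    if occupancy is None:
        return 0
    return limit - min(occupancy, quota if lottery else limit)


# Values taken from the j01a examples in this comment (occupancy 9, quota 6,
# capacity 10) and the resParams shown below; the dt values are hypothetical.
common = dict(
    occupancy=9, quota=6, capacity=10, trip_date=date(2020, 8, 17),
    inv_start=date(2020, 6, 26), inv_days=14,
    min_reserve_days=8, max_reserve_days=168,
)
for days_out in (5, 10, 14.5, 30):
    print(days_out, available_spots(dt=days_out, **common))
```

The Donohue Pass branch in the original repeats the same computation with `dQuota` and `dCap`, then takes the minimum of the two results, so it could be expressed as two calls to this function followed by `min(...)`.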

Here invStart and invDays are derived from the resParams variable below; I'm not sure where that variable comes from.

"{
    "inversion_start": "2020-06-26",
    "inversion_days": 14,
    "max_reserve_days": 168,
    "min_reserve_days": 8,
    "base_url": "https://yosemite.org/wp-content/plugins/wildtrails/",
    "bg_image": "images/redpeak-sunrise.jpg",
    "contact_form_path": "/contact-us-wilderness/",
    "timezoneOffset": -25200,
    "monthNames": [
        "January",
        "February",
        "March",
        "April",
        "May",
        "June",
        "July",
        "August",
        "September",
        "October",
        "November",
        "December"
    ],
    "dayNames": [
        "Sun",
        "Mon",
        "Tue",
        "Wed",
        "Thu",
        "Fri",
        "Sat"
    ]
}"

Something about these "inversion" days makes it so it uses the capacity instead of the quota?

Anywho, I think we're close and this looks doable.

dhimmel commented 4 years ago

@bradtgmurray impressive digging. Since I've already done the hike and I'm a bit short on time these days, I'm not in a position to take the lead here. The reCAPTCHA combined with the new API make the scope of the project considerably greater.

So feel free to take the lead. You could even make a business that provides users with alerts about vacancies. How quickly they are notified of a vacancy could depend on their service tier. Just an idea...

djcunningham0 commented 3 years ago

The logic seems pretty simple to recreate. My problem is that I can't figure out how to use the anti-captcha service to solve the captcha and access the API. Can anyone help me with that part?


Here's the core logic. For each date, the API that @bradtgmurray mentioned returns the number of permits that have already been reserved for each date. The entry for each date looks something like this:

{
    "date": "2021-08-06",
    "j01a": 5,
    "j01b": 18,
    "j03a": 6,
    "j19": 9,
    "j24b": 18,
    "d01": 20,
    "d02": 12
},

The "jxxx" keys represent trailheads and the values represent the number of permits that have been reserved for that date and trailhead. In this example, 5 permits have been taken from trailhead "j01a". We can use the trailhead API to see that j01a is the Happy Isles to Sunrise/Merced Lake trailhead, which has a quota of 6 permits.

So there are 6 permits available and only 5 taken... BUT we have to take the Donohue Pass exit quota into account. The "d01" and "d02" values show the number of permits reserved that allow for an exit over Donohue Pass. The quotas for these are 20 and 15, respectively (found in the trailhead API). The first four trailheads (j01a, j01b, j03a, j19) use the d01 quota and the other one (j24b) uses the d02 quota. Since the d01 quota of 20 has already been met, there are 0 available permits for the j01a trailhead.

The j24b trailhead (Lyell Canyon) has a quota of 21. Since we are below the quota for the trailhead (18 < 21) AND for the d02 Donohue Pass quota (12 < 15), there are 3 permits available for the j24b trailhead.

To summarize, once you get the data from the API the calculation boils down to this:

  1. Calculate the available permits using the trailhead quota: trailhead_quota - trailhead_value
  2. Calculate the available Donohue Pass exit permits using the d01 or d02 depending on your trailhead: d0x_quota - d0x_value
  3. Take the minimum of 1 and 2.
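The three steps above can be sketched in Python as follows. This is an illustration only: the quotas are the ones quoted in this comment (other trailheads are omitted because their quotas are not given here), the names `QUOTAS`, `EXIT_QUOTA_FOR`, and `permits_available` are hypothetical, and clamping the result at zero is an added assumption.

```python
# Quotas quoted in this comment; in practice they would come from the trailhead API.
QUOTAS = {"j01a": 6, "j24b": 21, "d01": 20, "d02": 15}

# Which Donohue Pass exit quota each trailhead counts against.
EXIT_QUOTA_FOR = {
    "j01a": "d01", "j01b": "d01", "j03a": "d01", "j19": "d01",
    "j24b": "d02",
}


def permits_available(row: dict, trailhead: str) -> int:
    exit_key = EXIT_QUOTA_FOR[trailhead]
    by_trailhead = QUOTAS[trailhead] - row[trailhead]  # step 1
    by_exit = QUOTAS[exit_key] - row[exit_key]         # step 2
    # Step 3, clamped at zero (assumption) in case a count exceeds its quota.
    return max(0, min(by_trailhead, by_exit))


# The example entry for 2021-08-06 shown above.
row = {
    "date": "2021-08-06",
    "j01a": 5, "j01b": 18, "j03a": 6, "j19": 9, "j24b": 18,
    "d01": 20, "d02": 12,
}

print(permits_available(row, "j01a"))  # 0: the d01 exit quota is already full
print(permits_available(row, "j24b"))  # 3: 21 - 18 and 15 - 12 both leave 3
```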

Some additional details, which may or may not be necessary to know:

apanagar commented 3 years ago

@djcunningham0 Nice summary. I had an Airflow job crawling this data a couple of times a day a few months back. I took it down after COVID, though, because the whole permit situation changed and was breaking the crawler (and I didn't have the time to fix it). I was using this with 2captcha to bypass the captcha: https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-recaptcha

djcunningham0 commented 3 years ago

I figured out how to get it working using just the Python requests library and a captcha solving service. I'll push changes to my fork in a few days and then submit a PR.

As for the "inversion days" mentioned a few times above, this reddit thread seems to confirm that it was a COVID policy. Last year there was an additional lottery starting 14 days before and ending 8 days before the hike date (that's why inversion_days was 14 and min_reserve_days was 8 at the time of bradtgmurray's post). That policy isn't in effect now, which is why inversion_days is now 0 and min_reserve_days is now 2.

dhimmel commented 3 years ago

I'll push changes to my fork in a few days and then submit a PR.

Would be greatly appreciated!