Khan / khan-api

Documentation for (and examples of) using the Khan Academy API
http://www.khanacademy.org

topictree incomplete? #111

Closed MaxDz closed 8 years ago

MaxDz commented 8 years ago

As of the last few days, the topictree (from http://www.khanacademy.org/api/v1/topictree?kind=Exercise) is not returning the full hierarchy: some domains are missing (Arts/Hum, Econ, Partner).

Is this a recent bug, or is KA reorganizing the TT?

Also, http://www.khanacademy.org/api/v2/topics/topictree returns 500: "Error: Server Error / The server encountered an error and could not complete your request. ..." That endpoint is the only way to get the article/video info.

kdadmin commented 8 years ago

…and the full topictree (api/v1/topictree) returns only 26 of 46 MB, with some of it in a foreign language.

benjaminjkraft commented 8 years ago

We're aware of some issues with the v2 topictree, which we're working on mitigating. I'm not sure what would have caused the changes in the v1 topictree; there are some backend changes going on but I don't think they are intended to cause any frontend changes (cc @WChargin). I'll follow up internally and see what's going on.

wchargin commented 8 years ago

Correct; the v2 topictree may be slower or fail to complete. Sorry about that. It's hurting us, too, so we're trying to speed it up. (As we create more and more content, we'll probably have to redesign the API significantly so that you can page through it instead of grabbing all the data at once.)

I'm also not aware of any recent changes to the v1 topictree endpoint, nor do I know why some of it would be in a different language.

benjaminjkraft commented 8 years ago

@kdadmin having some of it in a different language is certainly strange; can you post the exact call you're making (including the request headers)? Is this happening consistently, or just some of the time? And what do you mean when you say it's only returning some of the content?

MaxDz commented 8 years ago

In the Math domain, there are now other subjects that appear with titles in French mixed with English, while other domains are skipped altogether. This happens with the standard query kind=Exercise.

The v2 call fails immediately with a 500 error; it does not time out.
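The difference between an immediate 500 and a timeout can be checked from a script. The following is a minimal Python sketch, not something posted in this thread: the helper names `classify_failure` and `probe` and the 30-second timeout are illustrative assumptions.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Label an exception from urllib as 'http_error', 'timeout', or 'other'."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "http_error", exc.code
    # A timeout while connecting is usually wrapped in URLError.reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.timeout):
        return "timeout", None
    # A timeout while reading the body can surface as a bare socket.timeout.
    if isinstance(exc, socket.timeout):
        return "timeout", None
    return "other", None


def probe(url, timeout_seconds=30):
    """Request the URL once and report ('ok', status) or a failure label."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return "ok", response.status
    except (urllib.error.URLError, socket.timeout) as exc:
        return classify_failure(exc)
```

Calling `probe("http://www.khanacademy.org/api/v2/topics/topictree")` would then return `("http_error", 500)` for the behavior described here, and `("timeout", None)` if the server were merely slow.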

@WChargin Grabbing either URL was somewhat slow, but manageable. For our translation-control tool, we only update every few days.

benjaminjkraft commented 8 years ago

@MaxDz can you also post the exact requests you're making?

MaxDz commented 8 years ago

@benjaminjkraft

To get the tt: http://www.khanacademy.org/api/v1/topictree?kind=Exercise

To get the article/video info: http://www.khanacademy.org/api/v2/topics/topictree

benjaminjkraft commented 8 years ago

Can you post the full HTTP request headers? That may help us identify any caching or language issues.

MaxDz commented 8 years ago

We just invoke those URLs with curl_exec to get the JSON.

Here is a fragment of the current topictree (v1 API), grabbed just now:

2 x96e1c864 T: CP fr-first-grade-math c=2
3 xf50e0840 T: Numération et dénombrement numeration-denombrement c=3
4 xff87f154 T: Savoir compter savoir-compter c=4
5 xa4413411 E: Numbers to 100 count-to-100
5 x91a89d47 E: Find 1 more or 1 less than a number one-more--one-less
5 x4debd8a3 E: Count with small numbers counting-out-1-20-objects
5 x3b1bceb9 E: Missing numbers count-from-any-number
5/ V:xb19b2406

It has only 6 domains and 49 subjects; there should be 11 domains and ~99 subjects.

benjaminjkraft commented 8 years ago

It would be really helpful if you can get the full request body with the headers -- looks like you can set some curl_exec options per here: http://stackoverflow.com/a/8483805/904174. Thanks!
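For anyone scripting this outside PHP, here is a rough Python sketch of capturing what is sent and what comes back. The helper names `build_request` and `fetch_with_headers`, the optional `Accept-Language` header, and the 60-second timeout are assumptions for illustration, not part of the Khan Academy API.

```python
import urllib.request


def build_request(url, accept_language=None):
    """Build a GET request; accept_language is an optional, illustrative header."""
    request = urllib.request.Request(url, method="GET")
    if accept_language is not None:
        request.add_header("Accept-Language", accept_language)
    return request


def fetch_with_headers(request, timeout_seconds=60):
    """Send the request and return the headers on both sides plus the body size."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        body = response.read()
        return {
            "url": request.full_url,
            # After sending, this also includes headers urllib added itself
            # (for example Host and User-agent).
            "request_headers": dict(request.header_items()),
            "status": response.status,
            "response_headers": dict(response.headers.items()),
            "body_bytes": len(body),
        }
```

Pasting the resulting dictionary into a report gives the request headers, the response headers (including any cache or language headers the server sets), and the payload size in one place.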

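MaxDz commented 8 years ago

// $KAurl holds one of the two URLs above
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $KAurl);
// 0 = do not include the response headers in the returned string
curl_setopt($ch, CURLOPT_HEADER, 0);
// return the response body from curl_exec instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$str = curl_exec($ch);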

But you can see the same thing happens if you just invoke the URL from a browser...

kdadmin commented 8 years ago

For the full topictree, a GET request to:

www.khanacademy.org/api/v1/topictree

via a browser returns 24.22 MB of JSON with a total of 1736 topics under these 6 parent nodes:

Computing, KA Educator, Math, New and noteworthy, Science, Test prep

A few weeks back, some 46 MB was returned, containing a total of about 2633 topics under these 11 parent nodes:

Arts and humanities, College admissions, Computing, Economics and finance, KA Educator, Math, New and noteworthy, Partner content, Science, Talks and interviews, Test prep

Some topics appear to be in French, such as: Math/CP, Math/CE1, Math/CE2 (and 7 other Math topics), Test Prep/Troisième, Test Prep/Terminale

Note I only count topics containing videos, and I am not familiar with a v2/topictree.
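A count like this can be reproduced from a downloaded copy of the JSON. The sketch below is one possible way to do it, under these assumptions: each node is a dict with `kind`, `title`, and `children` fields, and a topic "contains videos" when at least one of its direct children has kind `Video`. The function names are hypothetical.

```python
def count_topics_with_videos(node):
    """Count Topic nodes in this subtree that have at least one direct Video child."""
    children = node.get("children") or []
    has_video = any(child.get("kind") == "Video" for child in children)
    count = 1 if node.get("kind") == "Topic" and has_video else 0
    # Recurse only into sub-topics; videos and exercises are leaves.
    for child in children:
        if child.get("kind") == "Topic":
            count += count_topics_with_videos(child)
    return count


def summarize_parent_nodes(root):
    """Map each top-level topic title to its count of video-bearing topics."""
    return {
        child.get("title"): count_topics_with_videos(child)
        for child in root.get("children") or []
        if child.get("kind") == "Topic"
    }
```

Running `summarize_parent_nodes(json.load(f))` on two snapshots taken at different times makes it easy to see which parent nodes disappeared and how the per-domain totals changed.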

benjaminjkraft commented 8 years ago

Thanks for the additional detail -- I've reported this internally and someone will look into it.

csilvers commented 8 years ago

I'm an engineer at Khan Academy looking into this issue. I found a problem with how we share caches across languages that could account for the issues you were seeing. We should have a fix out sometime this week.
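The thread does not say how the caches were shared or how the fix works. Purely as a generic illustration of this class of bug, and not Khan Academy's actual code, the sketch below shows how a cache keyed only by endpoint can hand one language's data to a request for another language. The class `TopicTreeCache` and its parameters are hypothetical.

```python
class TopicTreeCache:
    """Toy cache; include_language_in_key toggles between the buggy and safe keying."""

    def __init__(self, include_language_in_key):
        self._include_language = include_language_in_key
        self._store = {}

    def _key(self, endpoint, language):
        # Without the language in the key, every locale shares one entry.
        return (endpoint, language) if self._include_language else (endpoint,)

    def get_or_build(self, endpoint, language, build):
        key = self._key(endpoint, language)
        if key not in self._store:
            self._store[key] = build(language)
        return self._store[key]


def build_title(language):
    # Stand-in for an expensive topictree build in the given language.
    return {"en": "First grade", "fr": "CP"}[language]


shared = TopicTreeCache(include_language_in_key=False)
shared.get_or_build("topictree", "fr", build_title)
leaked = shared.get_or_build("topictree", "en", build_title)  # French entry reused

separate = TopicTreeCache(include_language_in_key=True)
separate.get_or_build("topictree", "fr", build_title)
correct = separate.get_or_build("topictree", "en", build_title)  # built for English
```

With the shared key, the English request receives the French title that was cached first, which is the kind of mixed-language output reported above.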

csilvers commented 8 years ago

OK, the fix is out! We're back to 48 MB of data for the topictree URL, and it includes economics again. :-)

MaxDz commented 8 years ago

Thanks, Craig!

But the v2 endpoint is still returning a 500 error: http://www.khanacademy.org/api/v2/topics/topictree

csilvers commented 8 years ago

Ah, can you open a separate issue for that?