Khan / khan-api

Documentation for (and examples of) using the Khan Academy API
http://www.khanacademy.org

Hindi videos in topic tree even when using lang=en-US #136

Open HosseinAgha opened 5 years ago

HosseinAgha commented 5 years ago

Take a look at this video: https://www.khanacademy.org/api/v1/videos/D3N7Yd8aHvM
It looks like an English video, but both translated-youtube-url and youtube-url redirect to a Hindi video.
The only thing that distinguishes it is the title.
Even source_language is English.
Now I'm worried that the API may return videos in other languages and mark them as English.

kdadmin commented 5 years ago

I'm not familiar with the JSON for translated videos, but:

D3N7Yd8aHvM.mp4 is a Hindi video. MamrTJ7V_Vg.mp4 is the English version.

A call to https://www.khanacademy.org/api/v1/videos/MamrTJ7V_Vg returns data on the English version.

Where did the D3N7Yd8aHvM ID come from?

Gary

HosseinAgha commented 5 years ago

I found them using the topictree API.

danielhollas commented 5 years ago

I've noticed that the /topictree?kind=video endpoint sometimes contains faulty data for non-EN LTT. Specifically, I noticed that sometimes translated_youtube_id == youtube_id, even though the given video was dubbed/recreated. I'm not sure whether this is related to this ticket.

It is safer to use the /videos/ endpoint for each specific video.
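
A minimal sketch of that check, in case it helps anyone hitting the same thing. It walks an already-downloaded topic tree, flags videos where translated_youtube_id equals youtube_id, and builds the per-video URL to re-check each one. The node layout (a "kind" field and nested "children" lists) is my assumption about the JSON, not something confirmed in this thread; only the two ID field names and the URL pattern come from the comments above.

```python
from typing import Any, Dict, Iterator, List

# URL pattern taken from the per-video calls quoted earlier in this thread.
VIDEO_ENDPOINT = "https://www.khanacademy.org/api/v1/videos/{}"


def iter_videos(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every video node in a nested topic tree.

    Assumption: each node has a "kind" field and an optional "children" list.
    """
    if str(node.get("kind", "")).lower() == "video":
        yield node
    for child in node.get("children") or []:
        yield from iter_videos(child)


def suspicious_video_urls(tree: Dict[str, Any]) -> List[str]:
    """Return per-video URLs for entries whose two YouTube IDs are identical.

    In a non-English tree, identical IDs may mean the translation data is
    missing or wrong, so these are candidates for a per-video re-check.
    """
    urls = []
    for video in iter_videos(tree):
        youtube_id = video.get("youtube_id")
        if youtube_id and video.get("translated_youtube_id") == youtube_id:
            urls.append(VIDEO_ENDPOINT.format(youtube_id))
    return urls
```

In an English tree the two IDs are presumably always equal, so this check is only meaningful for localized trees.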

danielhollas commented 5 years ago

@HosseinAgha I've just realized what the problem is here. For certain technical reasons, the English TT does contain some content in Hindi. This is unfortunate, but you can work around the issue by filtering out certain courses. For example, look at where the video you mentioned is located:

https://www.khanacademy.org/math/in-in-class-8-math-india-hindi/in-in-8-mensuration-hindi/area-of-trapezoids-composite-figures-hindi/v/finding-area-by-rearranging-parts-hindi
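
One possible way to do that filtering, as a rough sketch: treat any URL with a path segment ending in "-hindi" as belonging to a Hindi course and drop it. The "-hindi" suffix rule is only a heuristic inferred from the URL above, and the function names are made up for illustration; other affected courses may follow a different naming pattern.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def looks_like_hindi_course(url: str) -> bool:
    """Heuristic: True if any path segment of the URL ends with "-hindi"."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return any(segment.endswith("-hindi") for segment in segments)


def drop_hindi_courses(urls: Iterable[str]) -> List[str]:
    """Keep only URLs that do not look like they belong to a Hindi course."""
    return [url for url in urls if not looks_like_hindi_course(url)]
```

Since the title and source_language did not help for the video in this ticket, the URL path seems like the most reliable signal available, but it should be spot-checked against real data before relying on it.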

I think this is expected behavior and this ticket can be closed. I should open a separate ticket for the issue in my last comment. One thing that could be improved is the "source_language" field, although I am not sure where it is coming from.