Closed mikehardy closed 6 years ago
Ok, I will do it. Is "[sound]" the only tag that should be added? I looked through the discussions about this issue, and it seems my addon also needs to unescape HTML in the URL before processing it. I ran a test, and the URLs it gets from cards do appear to be HTML-escaped. Thanks for pointing this out.
That's cool, great! I think just images and sound count as media. The URLs may also arrive unescaped, as AnkiDroid does not post-process edits in the fields.
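A minimal sketch of the unescaping step discussed above, assuming the add-on receives the URL as a plain string taken from a note field. The helper name `unescape_field_url` is hypothetical and not part of Media Internalizer or Anki; only the standard-library `html.unescape` is used.

```python
import html


def unescape_field_url(raw_url: str) -> str:
    """Turn an HTML-escaped URL taken from a note field back into a plain URL.

    Hypothetical helper for illustration; not the add-on's actual code.
    """
    return html.unescape(raw_url)


# A URL as it may appear inside a field that Anki Desktop has HTML-encoded.
escaped = "http://dict.youdao.com/dictvoice?audio=smoothly&amp;type=1"
plain = unescape_field_url(escaped)
print(plain)
```

Since the fields may or may not be escaped depending on where the note was edited, a real implementation would probably need to decide when unescaping is safe to apply.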
There is a bug in Anki that stops me from implementing this. Anki doesn't play an audio file if it has "&" (and maybe other special symbols) in its filename. Reproduction Steps:
Expected result: Anki plays sound
Actual result: Anki doesn't play sound
Media Internalizer relies on Anki method MediaManager.writeData for saving media internally. writeData returns a filename under which the file was stored. For the URL http://dict.youdao.com/dictvoice?audio=smoothly&type=1 it creates a file "dictvoiceaudio=smoothly&type=1". The addon replaces "[sound:http://dict.youdao.com/dictvoice?audio=smoothly&type=1]" by "[sound:dictvoiceaudio=smoothly&type=1]" that cannot be played because of this bug.
Man, that's irritating - I see what you mean now. My own personal edit of the regex had internalized the file, but I see now that I didn't test enough: it has the same problem you describe (it doesn't play), and it's because of this same HTML-encoding issue (it's a '&amp;' in the raw HTML if you look).
The only thing I can think of is to have media-internalizer call a URL-encode function prior to asking Anki to writeData - it looks to me like URL-encoded URLs (at least this one) survive a secondary HTML-encoding without change, so this may work?
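To illustrate the suggestion above, here is a rough sketch showing that a percent-encoded name passes through HTML-encoding unchanged, while the raw name does not. The helper name `html_safe_filename` is hypothetical, and whether Anki would actually play a file stored under such a name is an assumption that this thread does not confirm.

```python
import html
from urllib.parse import quote


def html_safe_filename(name: str) -> str:
    """Percent-encode every character except letters, digits and '_.-~'.

    Hypothetical helper for illustration; not the add-on's actual code.
    """
    return quote(name, safe="")


# The filename from the example in this thread.
name = "dictvoiceaudio=smoothly&type=1"
encoded = html_safe_filename(name)

# The raw name changes under HTML-encoding ('&' becomes '&amp;'),
# the percent-encoded one does not.
print(html.escape(name))
print(html.escape(encoded))
```

The encoded name contains no `&`, `<`, `>` or quote characters, which are the only ones `html.escape` rewrites, so a second encoding pass leaves it as it was.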
BTW - this is specifically NOT considered a bug in Anki, because the "field" in a "note" is an internal portion of an HTML document, they actually are required to HTML-encode fields and things that will go in fields. That was what I gathered from the bug I linked as "the related AnkiDesktop issue" above. While irritating to me only from the perspective of wanting it to work easily, I think their stance on the issue - that they must HTML-encode - is correct, thus the suggestion of pre-URL-encoding in order to generate filenames that can survive HTML-encoding
I did it. The addon now just strips off any query string from the filename. So, for http://dict.youdao.com/dictvoice?audio=smoothly&type=1 it would be just "dictvoice".
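A minimal sketch of how stripping the query string could look, using only the standard library. The function name `filename_without_query` is hypothetical; this is one way to get the described result, not necessarily how the add-on does it.

```python
import posixpath
from urllib.parse import urlsplit


def filename_without_query(url: str) -> str:
    """Return the last path segment of a URL, ignoring query and fragment.

    Hypothetical helper for illustration; not the add-on's actual code.
    """
    path = urlsplit(url).path  # e.g. "/dictvoice"
    return posixpath.basename(path)


print(filename_without_query("http://dict.youdao.com/dictvoice?audio=smoothly&type=1"))
print(filename_without_query("http://blah.blah.com/smoothly.mp3?type=1"))
```

One trade-off worth noting: different URLs that differ only in their query string map to the same base name, so the result depends on how the storing code handles name collisions.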
I think that is a bug. "&" is not a reserved character in Windows and Unix filesystems, so it can be used in filenames. A user can place such a file onto a card by using the standard Anki attach-media mechanism, but after that it's broken. Probably Anki should HTML-decode a file path before playing it.
I created pull request https://github.com/dae/anki/pull/218 that fixes this bug.
AnkiDroid had a user file an issue - ankidroid/Anki-Android/issues/4741 - where a URL like [sound:http://blah.blah.com/smoothly.mp3?type=1] didn't sync correctly, and the root cause was that AnkiDesktop HTML-encoded the URL (instead of URL-encoding it).
However it's possible to take these notes synced from AnkiDroid and run the media internalizer on them, but only if the regex matches them.
When changed to [sound:(http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)[^>]] (apologies if Markdown mangles that - but you can see it at the end of the related AnkiDesktop issue also), media internalizer worked just fine.
It seems that media internalizer could be even more useful than it is if it handled more classes of URLs such as this sound: one?
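As a rough illustration of matching this class of URL, the sketch below pulls remote URLs out of `[sound:...]` tags in a field. The pattern `SOUND_URL_RE` is a simplified, hypothetical one written for this example; it is not the regex used by media internalizer nor the one quoted above.

```python
import re
from typing import List

# Simplified, hypothetical pattern: capture an http(s) URL inside [sound:...],
# stopping at the closing bracket or any whitespace.
SOUND_URL_RE = re.compile(r"\[sound:(https?://[^\]\s]+)\]")


def find_sound_urls(field: str) -> List[str]:
    """Return every remote URL referenced by a [sound:...] tag in the field."""
    return SOUND_URL_RE.findall(field)


field = "smoothly [sound:http://blah.blah.com/smoothly.mp3?type=1] [sound:local.mp3]"
print(find_sound_urls(field))
```

Tags that already point at a local file, such as `[sound:local.mp3]`, are skipped because they do not start with `http://` or `https://`.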