ankidroid / Anki-Android

AnkiDroid: Anki flashcards on Android. Your secret trick to achieve superhuman information retention.
GNU General Public License v3.0
8.7k stars 2.24k forks source link

Images tags with attributes dropped in conditional templates #5166

Closed qsantos closed 4 years ago

qsantos commented 5 years ago
Reproduction Steps
  1. create a note with a field Illustration containing <img src="bla.png" style="">
  2. create a card with a template condition on this tempalte, e.g. {{#Illustration}}<hr>{{Illustration}}{{/Illustration}
  3. revise the card
Expected Result

See the image from the Illustration field (below a horizontal ruler).

Actual Result

No image, no horizontal ruler.

Debug info

-

Research

[ ✓ ] I have read the support page and am reporting a bug or enhancement request specific to AnkiDroid [ ✓ ] I have checked the manual and the FAQ and could not find a solution to my issue [ ✓ ] I have searched for similar existing issues here and on the user forum

Analysis

It seems to be caused by line 101 of AnkiDroid/src/main/java/com/ichi2/libanki/Utils.java :

    private static final Pattern imgPattern = Pattern.compile("<img src=[\\\"']?([^\\\"'>]+)[\\\"']? ?/?>");

As an aside, note that this will also fail with filenames containing single or double-quotes. The regex imgPattern is then used by stripHTMLMedia from line 230

    public static String stripHTMLMedia(String s) {
        Matcher imgMatcher = imgPattern.matcher(s);
        return stripHTML(imgMatcher.replaceAll(" $1 "));
    }

Note that this regex is also present in api/src/main/java/com/ichi2/anki/api/Utils.java. This method stripHTMLMedia() is used in the template engine to determine whether a field is empty or not. However, looking at AnkiDroid/src/main/java/com/ichi2/libanki/template/Template.java line 140:

    private String render_sections(String template, Map<String, String> context) {
        while (true) {
            Matcher match = sSection_re.matcher(template);
            if (!match.find()) {
                break;
            }

            String section = match.group(0);
            String section_name = match.group(1);
            String inner = match.group(2);
            section_name = section_name.trim();
            String it;

            // check for cloze
            Matcher m = fClozeSection.matcher(section_name);
            if (m.find()) {
                // ...
            } else {
                it = get_or_attr(context, section_name, null);
            }
            String replacer = "";
            if (!TextUtils.isEmpty(it)) {
                it = Utils.stripHTMLMedia(it).trim();  // here
            }
Proposed Solution

Make the regex more permissive by using the same value as PATH_REGEX from AnkiDroid/src/main/java/com/ichi2/anki/multimediacard/fields/ImageField.java line 37.

mikehardy commented 5 years ago

This is fantastic analysis - it should save me a bunch of time, thank you!

Only because this is so thorough, I'm curious if you also looked at how Anki Desktop handles the same item? Sometimes the code that should handle things the same way drifts apart over time (as desktop adds features or fixes issues and we haven't caught up), and they may have a different regex (or different method entirely) at this point. It's browsable at https://github.com/dae/anki

I'll look at incorporating a change for this either way but if you checked desktop handling it would maybe improve any fix here

qsantos commented 5 years ago

Considering how much I have been using AnkiDroid, it's the least I could do!

I had not realized that the code of libanki was so similar to that of Anki. Was it automatically converted from Python to Java? Anyway, here is the corresponding code in anki/template/template.py, line 76 (2.1.2):

    def render_sections(self, template, context):
        """Expands sections."""
        while 1:
            match = self.section_re.search(template)
            if match is None:
                break

            section, section_name, inner = match.group(0, 1, 2)
            section_name = section_name.strip()

            # check for cloze
            val = None
            m = re.match("c[qa]:(\d+):(.+)", section_name)
            if m:
                # get full field text
                txt = get_or_attr(context, m.group(2), None)
                m = re.search(clozeReg%m.group(1), txt)
                if m:
                    val = m.group(1)
            else:
                val = get_or_attr(context, section_name, None)

            replacer = ''
            inverted = section[2] == "^"
            if (val and not inverted) or (not val and inverted):
                replacer = inner

            template = template.replace(section, replacer)

        return template

There is no HTML stripping at this point. Trying to uniformize with Anki would thus imply moving this stripping out of render_sections() (and tracking any code effected by this change). However, stripHTMLMedia also exists in Anki, defined in anki/utils.py at line 143 :

def stripHTMLMedia(s):
    "Strip HTML but keep media filenames"
    s = reMedia.sub(" \\1 ", s)
    return stripHTML(s)

and reMedia at line 133:

reMedia = re.compile("(?i)<img[^>]+src=[\"']?([^\"'>]+)[\"']?[^>]*>")

In this regex [^>]* matches any character until the > (it's roughly equivalent to .*? here). So it might be simpler to just replace ? by [^>]* in the following line:

    private static final Pattern imgPattern = Pattern.compile("<img src=[\\\"']?([^\\\"'>]+)[\\\"']? ?/?>");

By the way, I have just noticed this infinite while loop, and it does not seems necessary. I am not familiar with the Java library but, in the Python's standard library, re.sub accepts a callback, and I expect it to be possible as well in Java. At any rate, this is not quite as important. I should take the time someday to contribute back to the Anki ecosystem (hopefully before the Heat Death of the universe, but I promise nothing).

mikehardy commented 5 years ago

I had not realized that the code of libanki was so similar to that of Anki. Was it automatically converted from Python to Java

I wasn't around at the time, but I believe it was manual - just with an eye towards keeping the fidelity as close to 100% as possible so that 'upstream' changes (even though they are Python) could be easily adopted.

Seems to me the quick fix is just to alter the imgPattern in the way you mention (? to [^>]*), the medium fix is to reference the same pattern in both areas we appear to be doing the same thing (Utils.java and ImageField.java) and the full fix would be to re-harmonize libanki in this area? Does that sound about right?

I should take the time someday to contribute back to the Anki ecosystem (hopefully before the Heat Death of the universe, but I promise nothing).

:-) - I had never done real android development until last year and picked it up to see if I could fix a personal nit with AnkiDroid, I found the codebase and the tools pretty easy to work with. Especially for a fix that might be as fast as the quick fix - but one where you have the test all lined up - you might find it to be a 60 minute or so task, where the majority is just waiting for tools + emulator+ + dependencies installs. Pretty satisfying. Understandable if not interested though - I'll give this a look later after finishing 2.9 blockers

qsantos commented 5 years ago

Seems to me the quick fix is just to alter the imgPattern in the way you mention (? to [^>]*), the medium fix is to reference the same pattern in both areas we appear to be doing the same thing (Utils.java and ImageField.java) and the full fix would be to re-harmonize libanki in this area? Does that sound about right?

That's how I see it.

:-) - I had never done real android development until last year and picked it up to see if I could fix a personal nit with AnkiDroid, I found the codebase and the tools pretty easy to work with. Especially for a fix that might be as fast as the quick fix - but one where you have the test all lined up - you might find it to be a 60 minute or so task, where the majority is just waiting for tools + emulator+ + dependencies installs. Pretty satisfying. Understandable if not interested though - I'll give this a look later after finishing 2.9 blockers

Thanks for the information!

mikehardy commented 5 years ago

Tagged with "good first issue", as the medium fix should be easy to do (the testing will be the longest part I bet) and that would be a successful fix for now