UCL-CREST / Siamese

Siamese: a scalable code clone search engine
GNU General Public License v3.0

Fix the lexer problem #2

Open cragkhit opened 6 years ago

cragkhit commented 6 years ago
line 221:9 token recognition error at: '"Vsi na na trg. Crni panter po novem 0,67� \n\n'
line 222:38 token recognition error at: '";\n'
line 221:9 token recognition error at: '"Vsi na na trg. Crni panter po novem 0,67� \n\n'
line 222:38 token recognition error at: '";\n'
line 20:8 token recognition error at: '#'
line 20:9 token recognition error at: '#'
line 20:10 token recognition error at: '#'
line 20:23 token recognition error at: '#'
line 20:24 token recognition error at: '#'
line 20:25 token recognition error at: '#'
line 20:8 token recognition error at: '#'
line 20:9 token recognition error at: '#'
line 20:10 token recognition error at: '#'
line 20:23 token recognition error at: '#'
cragkhit commented 6 years ago

I tried the file default/15408.java, and it contains a special character (€). What's the right encoding to use here?

    // miha g.

    public String resnica() {
        return "Vsi na na trg. Crni panter po novem 0,67€ \n
        Samo se Jack Bauer nam lahko pomaga!";
    }

    public void krnekej() {
        System.out.println("Ko bom velik, bom pilot. Mogoce kopilot.");
    }

    public static void main(String[] args) {
        Kalkulator2 kalkulator2 = new Kalkulator2();
        kalkulator2.setVisible(true);
    }
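
On the encoding question, one possible approach is to decode each file strictly as UTF-8 and fall back to a single-byte legacy charset only when that fails. The sketch below is illustrative and not Siamese's actual code: the class name `SourceDecoder` is hypothetical, and windows-1252 (where € is the single byte 0x80) is only a guess at the legacy encoding, since the real encoding of default/15408.java is not known from this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Siamese.
final class SourceDecoder {
    // Assumed fallback; swap in another charset if the corpus calls for it.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    /** Decodes bytes as strict UTF-8, falling back to FALLBACK on malformed input. */
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: reinterpret the same bytes with the legacy charset.
            return new String(bytes, FALLBACK);
        }
    }

    public static void main(String[] args) {
        // "0,67" followed by byte 0x80, which is invalid as UTF-8 on its own.
        byte[] legacy = {'0', ',', '6', '7', (byte) 0x80};
        System.out.println(decode(legacy).equals("0,67\u20AC"));
    }
}
```

Strict decoding matters here: the default `new String(bytes, StandardCharsets.UTF_8)` silently substitutes U+FFFD for bad bytes, which matches the `�` seen in the lexer log.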
cragkhit commented 6 years ago

It's stored in the ES index like this:

// miha g. 
public String resnica ( ) { return Samo se Jack Bauer nam lahko pomaga ! } 
public void krnekej ( ) { System . out . println ( \"Ko bom velik, bom pilot. Mogoce kopilot.\" ) ; } public static void main ( String [ ] args ) { Kalkulator2 kalkulator2 = new Kalkulator2 ( ) ; kalkulator2 . setVisible ( true ) ; }
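
One reading of the log and the indexed output is that the € character may not be the only factor: the string literal in `resnica()` contains a raw line break, and the two reported positions (lines 221 and 222) line up with the two halves of that literal. As a rough way to check a corpus for such files, the sketch below flags lines where a string literal is still open at the line break. The class name `UnterminatedStringFinder` is hypothetical, and the scanner is deliberately simplified (it does not handle Java text blocks, for example).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Siamese.
final class UnterminatedStringFinder {
    /** Returns 1-based line numbers where a string literal is still open at the line break. */
    static List<Integer> find(String source) {
        List<Integer> hits = new ArrayList<>();
        boolean inString = false, inChar = false;
        boolean inLineComment = false, inBlockComment = false;
        int line = 1;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            char next = i + 1 < source.length() ? source.charAt(i + 1) : '\0';
            if (c == '\n') {
                if (inString) hits.add(line);
                // Literals and line comments cannot continue past a line break.
                inString = false;
                inChar = false;
                inLineComment = false;
                line++;
            } else if (inLineComment) {
                // Ignore everything until the end of the line.
            } else if (inBlockComment) {
                if (c == '*' && next == '/') { inBlockComment = false; i++; }
            } else if (inString || inChar) {
                if (c == '\\' && next != '\n') i++;            // skip the escaped character
                else if (inString && c == '"') inString = false;
                else if (inChar && c == '\'') inChar = false;
            } else if (c == '/' && next == '/') { inLineComment = true; i++; }
            else if (c == '/' && next == '*') { inBlockComment = true; i++; }
            else if (c == '"') inString = true;
            else if (c == '\'') inChar = true;
        }
        return hits;
    }
}
```

Run against the excerpt above, a check like this would be expected to flag both the line that opens the literal and the line that holds its stray closing quote, which is where the lexer reports its errors.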