codeaudit / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Insufficient template cleaning #91

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The current version of io.jwpl.WikipediaRevisionReader uses WikiUtils.cleanText 
for its plain text output. However, templates can also be of the form 
{{template}}, which is not covered by this method. See 
http://en.wikipedia.org/wiki/Template:Advert for an example.

Please fix this ASAP, as this is a blocker for our project.

Original issue reported on code.google.com by kutschke...@googlemail.com on 6 Aug 2012 at 8:23

GoogleCodeExporter commented 9 years ago
Is this what you need?

plainText = plainText.replaceAll("\\{\\{.+?\\}\\}", " ");

Original comment by richard.eckart on 6 Aug 2012 at 5:52
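For readers trying the suggested pattern, here is a minimal, self-contained sketch of how it behaves on a few inputs. The class and method names (`TemplateStripDemo`, `stripSimple`, `stripMultiline`) are hypothetical and not part of DKPro Core or JWPL, and the `Pattern.DOTALL` variant is an assumption added for comparison, not something proposed in this thread. The sketch suggests two limits of the one-line pattern: `.` does not match line breaks by default, so a template spanning several lines is left untouched, and a nested template is cut at the first `}}`, leaving residue behind.

```java
import java.util.regex.Pattern;

public class TemplateStripDemo {
    // Pattern from the comment above: reluctant match from "{{" to the next "}}".
    private static final Pattern SIMPLE = Pattern.compile("\\{\\{.+?\\}\\}");

    // Assumed variant (not from the thread): DOTALL lets '.' match line breaks,
    // so templates that span several lines are matched as well.
    private static final Pattern MULTILINE =
            Pattern.compile("\\{\\{.+?\\}\\}", Pattern.DOTALL);

    // Replaces every match of the original pattern with a single space.
    static String stripSimple(String text) {
        return SIMPLE.matcher(text).replaceAll(" ");
    }

    // Same replacement, but using the DOTALL variant.
    static String stripMultiline(String text) {
        return MULTILINE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Single-line template: removed by the original pattern.
        System.out.println(stripSimple("Intro {{Advert}} text"));

        // Template spanning two lines: left unchanged by the original pattern,
        // removed by the DOTALL variant.
        String multi = "a{{Infobox\n|x=1}}b";
        System.out.println(stripSimple(multi).equals(multi));
        System.out.println(stripMultiline(multi));

        // Nested template: the match stops at the first "}}",
        // so " tail}}" is left in the output.
        System.out.println(stripSimple("a{{outer {{inner}} tail}}b"));
    }
}
```

Whether either limit matters depends on the revisions being read. Handling nested templates properly would need a small brace-counting pass rather than a single regular expression.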

GoogleCodeExporter commented 9 years ago
I think so, yes.

Original comment by kutschke...@googlemail.com on 7 Aug 2012 at 7:05

GoogleCodeExporter commented 9 years ago
I have added the pattern. Please check if that resolves your problem.

If you run into such problems, I recommend you check out DKPro Core ASL and make 
the changes yourself to see whether they work. Then you can provide a patch, or at 
least a reliable report on whether the fix works.

Original comment by richard.eckart on 7 Aug 2012 at 9:41

GoogleCodeExporter commented 9 years ago
So, did this help?

Original comment by richard.eckart on 18 Aug 2012 at 4:30

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 18 Aug 2012 at 4:31

GoogleCodeExporter commented 9 years ago
Yes, thank you, now it works.

Original comment by kutschke...@googlemail.com on 18 Aug 2012 at 4:51

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 18 Aug 2012 at 4:54