TechnionYP5777 / Bugquery

Bug query

Tags removal in stacktrace extraction #36

Closed: tonylekhtman closed this issue 7 years ago

tonylekhtman commented 7 years ago

I checked the StackTraceExtractor with some posts from the db, and I noticed it doesn't work well when the post contains HTML tags, like this one: http://stackoverflow.com/questions/3988788/what-is-a-stack-trace-and-how-can-i-use-it-to-debug-my-application-errors

In the db, it looks like this:

<p>Sometimes when I run my application it gives me an error that looks like:</p>&#xA;&#xA;<pre><code>Exception in thread "main" java.lang.NullPointerException&#xA; at com.example.myproject.Book.getTitle(Book.java:16)&#xA; at com.example.myproject.Author.getBookTitles(Author.java:25)&#xA; at com.example.myproject.Bootstrap.main(Bootstrap.java:14)&#xA;</code></pre>&#xA;&#xA;<p>People have referred to this as a "stack trace". <strong>What is a stack trace?</strong> What can it tell me about the error that's happening in my program?</p>&#xA;&#xA;<hr/>&#xA;&#xA;<p><em>About this question - quite often I see a question come through where a novice programmer is "getting an error", and they simply paste their stack trace and some random block of code without understanding what the stack trace is or how they can use it. This question is intended as a reference for novice programmers who might need help understanding the value of a stack trace.</em></p>&#xA;

rodedzats commented 7 years ago

For each input, I removed the HTML tags and treated "&#xA;" as a newline (\n). I tested the example you posted, and it works fine now. Closing.
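A minimal Java sketch of that cleanup, assuming the extractor works line by line once the post is plain text. StackTraceExtractor's real code is not shown in this thread, so the class and method names here (`PostCleanupSketch`, `clean`, `frames`) and the simplified frame regex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PostCleanupSketch {
    // Matches any HTML tag such as <p>, </code> or <hr/>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Hypothetical, simplified stack-frame line: "at pkg.Class.method(File.java:16)".
    private static final Pattern FRAME =
            Pattern.compile("^\\s*at\\s+[\\w$.<>]+\\([^)]*\\)\\s*$");

    /** Treats "&#xA;" as a newline, then strips all HTML tags. */
    static String clean(String post) {
        return TAG.matcher(post.replace("&#xA;", "\n")).replaceAll("");
    }

    /** Returns the lines of the cleaned post that look like stack frames. */
    static List<String> frames(String post) {
        List<String> result = new ArrayList<>();
        for (String line : clean(post).split("\n")) {
            if (FRAME.matcher(line).matches()) {
                result.add(line.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the post from the issue, as stored in the db.
        String post = "<pre><code>Exception in thread \"main\" java.lang.NullPointerException&#xA;"
                + " at com.example.myproject.Book.getTitle(Book.java:16)&#xA;"
                + " at com.example.myproject.Author.getBookTitles(Author.java:25)&#xA;"
                + "</code></pre>";
        frames(post).forEach(System.out::println);
    }
}
```

Without the "&#xA;" replacement the stored post has no real line breaks, so a line-oriented matcher like this one would see the whole trace as a single line and find no frames. The order of the two steps does not matter here, since "&#xA;" contains no angle brackets.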

yossigil commented 7 years ago

Check whether there is a library for HTML cleanup. You may need more than removing tags and handling "&#xA;".
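One example of what else may be needed: posts can contain other character references, such as `&lt;` and `&gt;` around `<init>` in a constructor frame. A dedicated HTML library (jsoup, for instance) would cover the full entity table; the thread does not say which library, if any, was adopted. As a stdlib-only sketch, the hypothetical `EntityDecodeSketch` below decodes numeric references and only the five XML predefined entities, and it decodes *after* stripping tags so that escaped text like `&lt;init&gt;` is not mistaken for a tag.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecodeSketch {
    // Numeric references (&#10; or &#xA;) and named ones (&lt;, &amp;, ...).
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);");

    // Only the five XML predefined entities; HTML defines many more.
    private static final Map<String, String> NAMED = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "quot", "\"", "apos", "'");

    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Converts a numeric reference to text, or returns the fallback if it is malformed. */
    private static String codePoint(String digits, int radix, String fallback) {
        try {
            return new String(Character.toChars(Integer.parseInt(digits, radix)));
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return fallback;
        }
    }

    /** Decodes character references in one pass; unknown named entities are left as-is. */
    static String decodeEntities(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = codePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = codePoint(body.substring(1), 10, m.group());
            } else {
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Strips tags first, then decodes, so "&lt;init&gt;" survives as "<init>". */
    static String toPlainText(String html) {
        return decodeEntities(TAG.matcher(html).replaceAll(""));
    }

    public static void main(String[] args) {
        String html = "<p>Frame: <code>at Foo.&lt;init&gt;(Foo.java:3)</code></p>&#xA;";
        // prints "Frame: at Foo.<init>(Foo.java:3)" followed by a newline
        System.out.print(toPlainText(html));
    }
}
```

Decoding in a single pass also keeps double-escaped text intact: `&amp;lt;` becomes `&lt;`, not `<`.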

rodedzats commented 7 years ago

@yossigil will do, thanks