HTML tag stopwords - Githubissues

chnm / serendipomatic

http://serendipomatic.org/

26 stars, 9 forks

HTML tag stopwords #78

Open moltude opened 11 years ago

moltude commented 11 years ago

The site accepts HTML input, but then it only searches for the tags. Either the tags need to be in the stopwords file, or we need some way of recognizing these tags and ignoring them.
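
As a rough illustration of the stopwords idea, here is a minimal Python sketch. The `HTML_TAG_STOPWORDS` set and the `filter_terms` helper are hypothetical names made up for this example; the project's actual stopwords file and tokenizer are not shown in this thread.

```python
import re

# Hypothetical, deliberately incomplete set of tag and attribute names.
# The real stopwords file used by the site is an unknown here.
HTML_TAG_STOPWORDS = {
    "html", "head", "body", "div", "span", "p", "a", "href", "br", "ul", "li",
}


def filter_terms(text, stopwords=HTML_TAG_STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(filter_terms("<p>Finnish <a href='x'>archives</a></p>"))
```

With the sample input above this keeps `finnish`, `x` and `archives`: the tag names are dropped, but attribute values still leak through, and a tag name that is also a real word (for example `body` or `head`) would be lost as a search term.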

rlskoeser commented 11 years ago

better approach: attempt to recognize HTML/XML and load it with something like BeautifulSoup to get text-only content

would be interesting to try with ead/tei
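
A minimal sketch of that idea, assuming Python: detect input that looks like markup and keep only its text content. To stay dependency-free it uses the standard library's `html.parser` in place of BeautifulSoup (whose `get_text()` method plays the same role). The names `TextExtractor`, `looks_like_markup` and `text_only` are hypothetical, the detection regex is a crude heuristic, and EAD/TEI input is untested here.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def looks_like_markup(text):
    """Heuristic: does the text contain something shaped like a tag?"""
    return bool(re.search(r"<[A-Za-z!/?][^>]*>", text))


def text_only(text):
    """Return plain text unchanged; strip tags from HTML/XML-like input."""
    if not looks_like_markup(text):
        return text
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    # Join the text nodes and collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    print(text_only("<p>Letters from <b>Finland</b></p>"))
```

Joining text nodes with a space keeps words from adjacent elements apart, at the cost of an occasional stray space before punctuation; for extracting search terms that trade-off seems acceptable.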

moltude commented 11 years ago

Ah ha! Python libraries that do what needs to be done // still learning

mialondon commented 11 years ago

@moltude do you have the test notes from when you and @amrys were trying out her RefWorks (? might have been EndNote etc) to use as a reference for this?

And any TEI examples would be useful too.

amrys commented 11 years ago

I'm attaching the screenshot of what happened when I initially gave the machine my BibTeX library, if that helps.

--a.

On Thu, Oct 17, 2013 at 8:47 PM, Mia <notifications@github.com> wrote:

@moltude do you have the test notes from when you and @amrys were trying out her RefWorks (? might have been EndNote etc) to use as a reference for this?

And any TEI examples would be useful too.


rlskoeser commented 11 years ago

@amrys I'm not seeing a screenshot. You might have to use the GitHub web interface (not sure if you can add attachments via email).

amrys commented 11 years ago

Roger that. I will try to remember my GitHub login after I take care of lecture tomorrow.

On Wed, Oct 23, 2013 at 5:11 PM, Rebecca Sutton Koeser <notifications@github.com> wrote:

@amrys I'm not seeing a screenshot. You might have to use the GitHub web interface (not sure if you can add attachments via email).


amrys commented 11 years ago

Hi folks,

Sorry for the delay -- finally got around to sorting out my GitHub password (something that apparently fell straight out of my brain when I was in Finland). I've attached the image here.

a.

[Attached image: bibtex-problems]