dentarg / hubot-url-title

:crocodile: Returns the title when a link is posted
https://www.npmjs.com/package/hubot-url-title

ignore pdf file links #12

Closed impca closed 8 years ago

impca commented 8 years ago

Added pdf to a list of ignored files. Hubot will no longer try to get a title of pdfs.
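For readers who have not seen the change, an extension-based ignore check could look roughly like the sketch below. The names `IGNORED_EXTENSIONS` and `isIgnoredUrl` and the contents of the list are hypothetical, chosen for illustration only; this is not the package's actual code.

```typescript
// Hypothetical ignore list; the real list in hubot-url-title may differ.
const IGNORED_EXTENSIONS: string[] = ["png", "jpg", "jpeg", "gif", "pdf"];

// Returns true when the URL's path ends in one of the ignored extensions.
function isIgnoredUrl(rawUrl: string, ignored: string[] = IGNORED_EXTENSIONS): boolean {
  let pathname: string;
  try {
    // Only the path is inspected, so query strings and fragments are skipped.
    pathname = new URL(rawUrl).pathname;
  } catch (err) {
    return false; // not a parseable URL; nothing to ignore
  }
  const lastSegment = pathname.split("/").pop() || "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot <= 0) {
    return false; // no extension, or a dotfile such as ".env"
  }
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  return ignored.indexOf(ext) !== -1;
}
```

With a check like this, a bot would skip the title lookup whenever `isIgnoredUrl(url)` is true, so a link such as `https://example.com/paper.pdf` would produce no reply.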

pchaigno commented 8 years ago

I'm not sure that's the expected behavior :confused: Contrary to other extensions in the white list, PDFs have a title; I'd expect hubot-url-title to be able to extract it.

dentarg commented 8 years ago

I'm still traveling but ignoring PDFs seems reasonable until we have the feature @pchaigno talks about

@pchaigno are you familiar with the PDF format? Do PDFs in most cases have something equivalent to the HTML <title> tag?

pchaigno commented 8 years ago

I'm not sure all PDF documents have a title. In any case, I think mozilla/pdf.js could extract it (via PDFDocument.documentInfo). I'd expect hubot-url-title to try to extract a title as much as possible but it might not be within the scope of this package ;)
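To make the point about PDF titles concrete: a PDF can carry a `/Title` entry in its document information dictionary. The sketch below is not the pdf.js approach suggested above; it is a deliberately naive byte scan, and `extractPdfTitle` is a hypothetical helper. It assumes the entry is stored uncompressed as a literal string, so it will miss titles kept in compressed object streams, hex strings, or UTF-16 encoded text. A real implementation would more likely rely on a proper parser such as pdf.js.

```typescript
// Naive sketch: looks for a literal-string /Title entry in raw PDF bytes.
// Returns null when no such entry is found or the title is empty.
function extractPdfTitle(bytes: Uint8Array): string | null {
  // Decode byte-for-byte so binary stream data cannot break the decoding.
  let text = "";
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  // Match "/Title (...)" while allowing backslash escapes inside the string.
  const match = /\/Title\s*\(((?:\\[\s\S]|[^\\()])*)\)/.exec(text);
  if (!match) {
    return null;
  }
  // Unescape \( \) and \\ only; other escape sequences are left untouched.
  const title = match[1].replace(/\\([()\\])/g, "$1").trim();
  return title.length > 0 ? title : null;
}
```

Because documents without a title return `null`, a caller could fall back to staying silent for those links.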

dentarg commented 8 years ago

I think it would be a cool feature :)

impca commented 8 years ago

That is an interesting idea.

I am kinda torn between implementing it here or creating a separate hubot-pdf-title.

pchaigno commented 8 years ago

> I am kinda torn between implementing it here or creating a separate hubot-pdf-title.

I think it would be better to implement it here, because if you create a separate package, you'll end up re-developing some of the same logic :/

dentarg commented 8 years ago

Let's ignore PDFs for now, as we can't get the title from them. If we do #14, we will just remove the ignore.